PEG
Revision as of 06:32, 4 February 2016
Parsing Expression Grammars (PEGs) allow unlimited lookahead and backtracking. When implemented as a memoizing ("packrat") parser, they still parse in time linear in the input size. This makes them more expressive than YACC or BNF, which are limited in how far they can look ahead and, as a consequence, how far they can backtrack. They also require more memory than either YACC or BNF to parse an input of the same size.
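To make the linear-time claim concrete, here is a minimal sketch of a memoizing (packrat) PEG parser for a hypothetical toy grammar, not a Lojban rule: S <- A 'c' / A 'd' and A <- 'a' A / 'a'. When the first alternative of S fails, the parser backtracks and tries A again at the same position, but that second call is answered from the memo table. The names make_parser, S and A are illustrative only. Because each (rule, position) pair is evaluated at most once, total work stays proportional to the input length; the memo table is also where the extra memory goes.

```python
from functools import lru_cache

def make_parser(text: str):
    """Build a packrat parser for the toy grammar
         S <- A 'c' / A 'd'
         A <- 'a' A / 'a'
    Each rule maps a start position to an end position, or None on failure."""
    stats = {"A_evaluations": 0}

    def literal(ch, pos):
        # Match a single literal character at pos.
        if pos < len(text) and text[pos] == ch:
            return pos + 1
        return None

    @lru_cache(maxsize=None)  # packrat memo: one entry per position for rule A
    def A(pos):
        stats["A_evaluations"] += 1
        # First alternative: 'a' A
        p = literal("a", pos)
        if p is not None:
            q = A(p)
            if q is not None:
                return q
        # Ordered choice: try 'a' alone only if the first alternative failed.
        return literal("a", pos)

    def S(pos):
        # First alternative: A 'c'
        p = A(pos)
        if p is not None:
            q = literal("c", p)
            if q is not None:
                return q
        # Backtrack to pos and try A 'd'; A(pos) now comes from the memo table.
        p = A(pos)
        if p is not None:
            return literal("d", p)
        return None

    return S, stats
```

For the input "aaad", S(0) returns 4 (the whole input), and A is evaluated only four times, once per position, even though S asks for A at position 0 twice.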
PEG grammars also do not have a separate lexing stage. Lexing and parsing are performed at the same time, using the same language for both.
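The following hypothetical sketch shows what the absence of a lexing stage means in practice: one set of rules (spaces, word, text) works directly on characters, with no tokenizer run first. The grammar is a simplification for illustration only (lowercase ASCII letters plus the apostrophe), not the camxes grammar, and parse_text is an invented name.

```python
WORD_CHARS = set("abcdefghijklmnopqrstuvwxyz'")

def parse_text(text: str):
    """Scannerless sketch of the hypothetical grammar
         text   <- spaces? word (spaces word)* spaces? !.
         word   <- [a-z']+
         spaces <- ' '+
    Returns the list of words, or None if the input does not match."""

    def spaces(pos):
        end = pos
        while end < len(text) and text[end] == " ":
            end += 1
        return end if end > pos else None        # '+' needs at least one space

    def word(pos):
        end = pos
        while end < len(text) and text[end] in WORD_CHARS:
            end += 1
        return (end, text[pos:end]) if end > pos else None

    s = spaces(0)                                # spaces?
    pos = s if s is not None else 0
    first = word(pos)
    if first is None:
        return None
    pos, w = first
    words = [w]
    while True:                                  # (spaces word)*
        s = spaces(pos)
        nxt = word(s) if s is not None else None
        if nxt is None:
            break                                # backtrack: pos is left unchanged
        pos, w = nxt
        words.append(w)
    s = spaces(pos)                              # spaces?
    if s is not None:
        pos = s
    return words if pos == len(text) else None   # !. : must be at end of input
```

Here parse_text("mi klama") yields ["mi", "klama"], while "mi Klama" is rejected, because the uppercase letter is handled by the same character-level rules rather than by a separate lexer.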
See Wikipedia for additional general information on Parsing Expression Grammars.
4th Baseline Machine Grammar Proposal
.alyn.post. is working on a proposal for a 4th Machine Grammar Baseline, replacing the 3rd baseline's YACC grammar with a PEG grammar. This work is scheduled for inclusion in CLL version 2.0 or CLL version 2.1.
Morphology
Since PEG does not have a separate lexing stage, any PEG Machine Grammar will also need to express the Lojban Morphology in PEG.
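As an illustration only, here is a hypothetical PEG-style rule for the shape of a gismu (five letters, CVCCV or CCVCV), written as ordered choice followed by a negative lookahead for a word boundary. A real morphology grammar such as camxes's also checks which consonant clusters are permissible; this sketch deliberately ignores those rules, and all names in it are invented.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")
LETTERS = CONSONANTS | VOWELS | {"y", "'"}

def match_class(chars, text, pos):
    """Match one character from `chars`; return the new position or None."""
    if pos < len(text) and text[pos] in chars:
        return pos + 1
    return None

def match_seq(pattern, text, pos):
    """Match a shape such as 'CVCCV', where C = consonant and V = vowel."""
    for kind in pattern:
        pos = match_class(CONSONANTS if kind == "C" else VOWELS, text, pos)
        if pos is None:
            return None
    return pos

def gismu_shape(text, pos=0):
    """Simplified sketch of  gismu <- (CVCCV / CCVCV) !letter
    Ignores Lojban's rules on permissible consonant clusters."""
    for pattern in ("CVCCV", "CCVCV"):          # PEG ordered choice
        end = match_seq(pattern, text, pos)
        # Negative lookahead: the word must not continue with another letter.
        if end is not None and match_class(LETTERS, text, end) is None:
            return end
    return None
```

For example, "gerku" matches the first alternative and "klama" the second, while "klamas" fails because the lookahead sees another letter after the fifth character.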
Technical Points of the PEG language
- The '.' operator matches any single character, in any character set. It is only used after fa'o, which unconditionally consumes the remaining characters.
- '!.' ("not followed by any character") is how end of input (EOF) is tested in PEG.
- Space is defined as the literal '.' (as opposed to the '.' operator of PEG), whitespace, and all punctuation other than ',' and the apostrophe (''').
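The three points above can be sketched as small matcher functions. This is a hypothetical illustration, not code from camxes: the function names are invented, and "punctuation" is approximated by Python's string.punctuation, which covers ASCII punctuation only.

```python
import string

# Assumption: "punctuation" is approximated by ASCII string.punctuation.
SPACE_EXCEPTIONS = {",", "'"}

def any_char(text: str, pos: int):
    """PEG '.': match any single character; fail (None) only at end of input."""
    return pos + 1 if pos < len(text) else None

def at_eof(text: str, pos: int):
    """PEG '!.': succeed without consuming input exactly where '.' would fail."""
    return pos if any_char(text, pos) is None else None

def is_space_char(ch: str) -> bool:
    """Space as described above: the literal '.', whitespace, and
    punctuation other than ',' and the apostrophe."""
    if ch == "." or ch.isspace():
        return True
    return ch in string.punctuation and ch not in SPACE_EXCEPTIONS

def rest_after_fao(text: str, pos: int) -> int:
    """Sketch of  fao_tail <- .* !.  : after fa'o, consume every remaining character."""
    while (nxt := any_char(text, pos)) is not None:
        pos = nxt
    # '.*' stops only at end of input, so '!.' always succeeds here.
    assert at_eof(text, pos) == pos
    return pos
```

Note that '!.' is defined purely in terms of '.', which is why PEG needs no separate EOF token.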
CLL
The CLL will need to be updated to account for changes resulting from the translation of the grammar to PEG.
Main Article: CLL PEG Errata
Uncategorized Material
- Practicable syntax changes
- Exploiting the preparser
  - rant about "exploiting the preparser"
Open Discussion Points
- jbogenturfa'i further transforms the PEG grammar into an idealized representation. This parse tree is suitable for programmatic manipulation. Why is this idealized parse tree not the way the PEG is written?
Lojban parsers that use PEG
Of the current official grammar
- camxes, the original PEG parser on which all others are based.
- jbogenturfa'i
- jbominji, John Leuner's PEG grammar. The lojban_grammar.peg grammar is derived from camxes.
- mhagiwara's camxes.js, a Lojban parser written in JavaScript based on camxes.
Of proposed grammars
- iocixes (dead link), written in Haskell by la.iocikun., based on zasni gerna with the MEX grammar proposal of la xorxes (zasni gerna peg (dead link)).
- ilmentufa: created by la.ilmen., based on mhagiwara's camxes.js, with many experimental proposals that differ from the zasni gerna of la xorxes (ilmentufa's peg).
- Another frontend of ilmentufa: created by la.uilym., based on ilmentufa. It is not necessarily based on the latest version of ilmentufa's peg.
- zantufa: "zabna" parsers based on experimental grammars with clear versioning (suitable for use with jo'au), with many variations for existing Lojban texts. Some texts conform to a previous official grammar (with CgV), such as la .teris. po'u lo tirxu cu vitke zi'o le barda tcadu (suitable parser variation: maltufa); others conform to unofficial grammars, such as la .teris. ku noi tigra cu stuvi'e lo barda tcadu (suitable parser variation: zantufa), lo se mànci te màkfa pe la .oz. (suitable parser variation: maftufa), ka càtra lo vèrba (suitable parser variation: zantufa cekitaujoibus), etc.
See Also
- Robin Powell's PEG Grammar Page. This document builds on the work camgusmis and xorxes have done, documented on this page.
- Grammar, for a discussion of Lojban's grammar beyond PEG.
- YACC, the language in which Lojban's official grammar is defined.
- BNF, widely considered easier to read than the YACC grammar.