annotated machine grammar

From Lojban
Revision as of 16:43, 4 November 2013 by Gleki


According to a new BPFK policy proposal, which is likely to be adopted, the BPFK has to review ALL the machine grammar rules.

The current machine grammar is written mostly in YACC, with a few post- and pre-processing rules written in English. Some of these are difficult to formalize; as a result, implementations of the grammar (such as the official parser and jbofi'e) give different results, and none of them implements the whole language.

Robin Lee Powell is currently working on a machine grammar written as a Parsing Expression Grammar (PEG; see his web page), which is expected to be blessed by the BPFK as the definitive grammar once it has been thoroughly debugged.

This collection of pages looks at the current (i.e. YACC) rules. Most of them are carried over to the PEG grammar, though.

tsali: I'm going to take the easy ones first; I don't expect there to be any issues with them. I'm working from grammar.300, which is the same as the one in the CLL, barring typos.

When I started this, I was unaware of the techfix comments. These might be of help in understanding the rationale of the rules.

Non-terminals (phrases)

Specific kinds of non-terminals

These are non-terminals that are so similar that it makes sense to discuss them collectively.

Terminals (tokens)

The machine grammar of Lojban is not purely LALR(1). Some constructs need to be modified by a program before they are passed on to the actual YACC parser. This program is referred to as the lexer in the literature about the machine grammar, but it does a lot more than simple lexing. There are two kinds of modifications the lexer can make to its input:

  • Replace it with a pseudo-token ("lexer token")
  • Insert a lexer token in front of it

One example is the utterance ordinal, which consists of a letteral or number string followed by mai. The lexer detects that such a string is followed by mai and inserts lexer_A_701 in front of it. Thus, the "real" parser sees the resulting construct in much the same way as it sees a parenthesis: an introducing particle, a contained phrase, and a terminator. Needless to say, the conceptual "terminator" of the utterance ordinal, mai, is not elidable, because that is the word the lexer has to detect in order to insert the lexer token in the first place.
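The insertion step for utterance ordinals can be illustrated with a minimal Python sketch. It is not the official lexer: the function name insert_utterance_ordinal_tokens is hypothetical, the input is assumed to be already split into words, and the word sets are a small hand-picked sample rather than the full lists of number and letteral words. Only the token name lexer_A_701 and the word mai come from the description above.

```python
# Toy illustration of the "insert a lexer token in front of it" step.
# Assumptions: input is a list of already-separated words; the sets below
# are small samples, not the complete sets of number/letteral words.
NUMBER_OR_LETTERAL = {"pa", "re", "ci", "vo", "mu", "by", "cy", "dy"}
MAI = {"mai"}
LEXER_A_701 = "lexer_A_701"


def insert_utterance_ordinal_tokens(words):
    """Return a new word list with lexer_A_701 inserted before every
    number/letteral string that is immediately followed by mai."""
    out = []
    i = 0
    n = len(words)
    while i < n:
        if words[i] in NUMBER_OR_LETTERAL:
            # Find the end of the maximal number/letteral string.
            j = i
            while j < n and words[j] in NUMBER_OR_LETTERAL:
                j += 1
            # Look ahead: only a following mai triggers the insertion.
            if j < n and words[j] in MAI:
                out.append(LEXER_A_701)
            out.extend(words[i:j])
            i = j
        else:
            out.append(words[i])
            i += 1
    return out


print(insert_utterance_ordinal_tokens(["pa", "re", "mai", "mi", "klama"]))
# ['lexer_A_701', 'pa', 're', 'mai', 'mi', 'klama']
```

Note that the lookahead past the whole string is of unbounded length, which is the kind of thing an LALR(1) parser cannot do by itself; this is presumably why the work is delegated to a preprocessing pass. A number string that is not followed by mai passes through unchanged.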

The lexer tokens and the preparsing process are not as well understood as the rest of the YACC grammar. In particular, it is not certain whether they interact with each other. This project is trying to remedy this.