The Crash Course: Dictionary

From Lojban
Revision as of 17:29, 29 March 2016 by Gleki (talk | contribs) (→‎Stage 2 info)
lo vlaste poi simlu lo ka jmive je klina

The dictionary of 'The Crash Course in Lojban' project aims to present word definitions in Simple English and in simplified dialects of local languages, together with usage examples.

In particular, in la jbovlaste it reuses the Simple English direction for English speakers.

Simple English

Simple English here has little in common with Wikipedia's Simple English, Ogden's Basic English or Simplified Technical English. It tries to present the dictionary in simple wording and thus make it more usable for beginners in Lojban.

Changes from official definitions

  • In the majority of cases the order of places is the same as in the official definitions, so you don't have to relearn it.
  • There are a few changes where the corpus of Lojbanic texts strongly suggests the need for a change, or where the official definition leads to misuse. Some of those changes are:
    • the third place of traji3 is taken from the fourth place
    • mabla, zabna, satci, konju change their place structure
  • Some rare places are ignored. Not using a place is equivalent to zi'o or zo'e, depending on context. If a place is used in a discourse, the next use of the same selbri without that place assumes that the place is filled with zo'e=lo co'e.
    • As an example, the dictionary doesn't mention "by standard" places for most (but not all) definitions where they are present. Fewer than 25 "by standard" places and 4 "under conditions" places are retained, and some of them are rephrased to clarify their meaning.
  • The dictionary strictly specifies which lo sumti (noun) types can be accepted in each lo te sumti (place of a verb).
    • roles of lo te sumti are specified when necessary
    • If several sumti types are possible for the same place, all of them are mentioned. No other sumti types are possible for this place. This allows better understanding of the place structure.
  • The dictionary clearly specifies lo te sumti interactions, as in the gua\spi language (e.g. a "property of x2" place).

Is the list of words complete?

No. Some rarely used words aren't mentioned. This applies especially to words used for lerfu shifts and for mekso. The task is to formalize the most used words, supplemented by other words from the same semantic or syntactic series (even if the latter are rarely used).


Types of words:

  1. basic brivla: fu'ivla, cimjvo, gismu, cmevla
    1. cimjvo are considered non-decomposable.
    2. fu'ivla might get regularized suffixes (thus they would belong to compound brivla) specifying their semantic classes. Thus ideologically they are like rafsi-suffixes (idea by la selpa'i)
  2. compound brivla: jvajvo.
    1. jvajvo are taught separately from cimjvo
    2. in jvajvo CVV rafsi can act only as suffixes.
    3. CCV rafsi can go anywhere.
    4. CVC rafsi are deleted (retsku is no different from fu'ivla, but retskugau has the -gau suffix).

Te sumti types

  • clause: event (traditionally uses nu) or proposition (traditionally uses du'u).
  • entity
  • property
  • text
  • number

Additional orthogonal types

  • set of other types
  • ordered set of other types
  • realis/irrealis of a place by default. See Global subjunctivity

For each place of the verb, the dictionary specifies whether a set or an ordered set of any sumti types, or of particular sumti types, is possible.

Raising, lowering rules. Te sumti types interactions

  1. no place can be both entity and clause (event/proposition/property)
  2. if we have the clause selbri and the clause sumti they match. No raising. mi djica lo se nitcu
  3. if we have the clause selbri and the entity sumti then raising is assumed. mi lerci, mi djica lo plise, lo mlatu cu melbi
  4. if we have the entity selbri and the clause sumti then lowering is assumed
  5. if we have the entity selbri and the entity sumti they match. No raising. lo mlatu cu citka
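
The five rules above can be read as a small decision procedure. The following is a minimal sketch, not part of the dictionary itself: the names `Kind`, `resolve` and `expand_raised` are hypothetical, and properties are folded into "clause" as in rule 1. The expansion pattern lo nu ... cu co'e is the one the Stage 2 notes give for mi djica lo plise.

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical two-way split used by the rules above."""
    ENTITY = "entity"
    CLAUSE = "clause"  # event, proposition, or property


def resolve(place: Kind, sumti: Kind) -> str:
    """Return how a sumti of one kind fits a place of another kind."""
    if place is sumti:
        return "match"      # rules 2 and 5: same kind, no raising
    if place is Kind.CLAUSE:
        return "raising"    # rule 3: entity sumti in a clause place
    return "lowering"       # rule 4: clause sumti in an entity place


def expand_raised(sumti: str) -> str:
    """Spell out a raised entity sumti as a clause with an unspecified selbri."""
    return f"lo nu {sumti} cu co'e"
```

For instance, `resolve(Kind.CLAUSE, Kind.ENTITY)` gives `"raising"`, and `expand_raised("lo plise")` gives `lo nu lo plise cu co'e`. The sketch only labels lowering; it does not attempt to spell out what a lowered sumti looks like.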

Stage 2 info

We assume that Stage 0 was the publication by LLG of the initial gismu.txt and cmavo.txt wordlists. The following Stage 1 of writing the Dictionary (link, most of the discussion is by xorxes and gleki) showed

A. which te sumti variable types should exist in Lojban

B. how they interact.

Here, at stage 2 we deal with the following tasks:

1. rethink the variable type system based on the drawbacks of the one from Stage 1. Find rules for resolving variable type conflicts (assigning a value of one type to a te sumti of another type; aka “sumti-raising/lowering” etc.)

2. polish the specification of te sumti interactions within every given brivla

3. rewrite definitions of the most important cmavo (ignoring less used cmavo). Ignore rarely used sumti based on the “omitted sumti is zo'e ja zi'o, not zo'e” assumption, and add useful place keywords (translating them as nouns or adjectives)

4. provide usage examples for EVERY SUMTI (not KOhA and not LA) of EVERY BRIVLA and for cmavo.

5. Using Google Spreadsheet formulae, implement autogeneration of a print-ready dictionary from the spreadsheet. Make the spreadsheet friendlier for future translators working into other languages.

Stage 2 result: one-page dictionary

Stage 2 explanation: The Crash Course: Dictionary

In detail:

1. The “object” vs. “event” distinction didn't prove to be useful in the brivla type system. It is gone. Instead, “entity” vs. “clause” is used, which isn't strictly semantic. “Apple” is an “entity”. “Waterfall” is an “entity” even if it is described as lo nu lo djacu cu farlu. Thus philosophical issues of the object/event/property distinction are avoided here.

1a. “Clause” is a te sumti type that can accept only an abstraction. Conflicts are resolved as described on “The Crash Course: dictionary” page. Even if lo plise can be described as a motion of elementary particles and thus as a process, mi djica lo plise can never mean “I want a process that we call 'apple' ”. This is because djica2 is of “clause” type, so autocorrection takes place according to the rule of “putting an entity sumti into a clause place”. Thus it is assumed to mean mi djica lo nu lo plise cu co'e (this is the most common example of resolving type conflicts; this particular rule is otherwise known as “dealing with sumti-raising”). In particular, this together with the entity/clause type system also solves the problem of implied raising in dunda2 as opposed to vecnu2.

1b. For pragmatic purposes other minor types are used. Among them are “proposition” (du'u-place), “property” (ka+ce'u place), “taxon”, “sound”, “text”, “number”, “cmavo class”. An orthogonal type is “plural”.

1c. No place can take more than one type. If a place appears to take more than one (e.g. both “property” and “entity”), it means it can take only “property”, and “entity” is the result of sumti-raising. Example: mi cirko lo ckiku vs. mi cirko lo ka ce'u kanro.

2. the dictionary now explains how te sumti interact within the te bridi array; this mostly happens via “ka+ce'u” places. If kau is assumed in a place, it is mentioned. “Nonce property” places are places whose ce'u refers to a sumti that is not part of the place structure. E.g. in mi pensi lo ka ce'u broda the link ce'u refers neither to pensi1, nor to any other known place of pensi.

3. cmavo definitions have been rewritten according to common sense, removing cryptic words (as well as JCB's pseudo-English legacy). BPFK definitions from the tiki have been taken into account.

4. Anti-hermeneutics mechanism. Lojban is a lost language, as shown by endless discussions in IRC and the mriste about what this or that word really means (a hermeneutics situation). Such discussions end either in “this is the most useful interpretation” or “this makes no sense”. What the authors of gismu places really meant can probably no longer be known. Here at Stage 2 a usage example has been provided for every place of every brivla. Usage examples of te bridi array elements missing at the time of Stage 2 were forced to appear. The Korpora Zei Sisku tool, FrameNet, the British National Corpus, and help from various Lojbanists in the three mriste and in the IRC channel have helped a lot to complete this task.

4a. No place in a usage example should be filled with a KOhA or LA sumti; this is a requirement for an example to be successful. If this requirement is not fulfilled, this might be an indication that such a place can't be filled with anything else.

4b. Exception: “taxon” te sumti don't have usage examples since they mark names of taxa and thus la is applicable there (Lojbanized Linnaean names).

5. As of now the source is in a Google spreadsheet. Definitions are assembled from such pieces as “x1”, “x2”, ..., the text between them, and the type declarations of each place (e.g. “(entity)”). Examples and place keywords, no matter how many of them there are, are joined with their translations and attached to the definitions. The same is done for cmavo. The result is then displayed on a separate sheet in a mediawiki-friendly format so that it can be easily pasted to a mediawiki page as shown in the link above.

5a. Luckily, no macros/scripts are needed. Built-in spreadsheet functions are enough. CONCATENATE, IF, OR, VLOOKUP, REGEXREPLACE are among the most frequent functions generating the Dictionary.

5b. A special URL can be generated showing the latest version of the Dictionary.
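
The assembly described in item 5 can be illustrated outside the spreadsheet. The sketch below is only a Python analogue of the idea, since the dictionary itself uses built-in spreadsheet functions and no scripts. The function names, the types assigned to the places of citka, and the exact output layout are assumptions made for illustration; the definition-list markup (`;` and `:`) is standard MediaWiki syntax.

```python
def assemble_definition(parts, types):
    """Join place markers, the text between them, and per-place type declarations.

    parts: place markers ("x1", "x2", ...) interleaved with connecting text.
    types: mapping from place marker to its type name, e.g. {"x1": "entity"}.
    """
    out = []
    for part in parts:
        if part in types:
            # Attach the type declaration right after the place marker.
            out.append(f"{part} ({types[part]})")
        else:
            out.append(part)
    return " ".join(out)


def wiki_entry(word, definition, examples):
    """Format one entry as a MediaWiki definition-list item with its examples."""
    lines = [f"; {word}", f": {definition}"]
    for lojban, english in examples:
        lines.append(f": ''{lojban}'' - {english}")
    return "\n".join(lines)


# Illustrative data only; the place types here are assumed, not quoted.
definition = assemble_definition(
    ["x1", "eats", "x2"], {"x1": "entity", "x2": "entity"}
)
entry = wiki_entry("citka", definition, [("lo mlatu cu citka", "The cat eats.")])
```

Here `definition` is `x1 (entity) eats x2 (entity)`, and `entry` is a three-line block that can be pasted into a wiki page. Any number of examples can be passed, mirroring the "no matter how many" remark in item 5.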

Future work

This dictionary isn't an official project; it is a trade-off between official wordlists, CLL, later BPFK work, live usage in the IRC community, and the level of coverage of the semantic space.

01. For 99% of the language we now have at least one opinion, so any clarification or rival opinion on a given usage example, te bridi array element, glossword, definition etc. can now be heard and incorporated into the dictionary.

02. New output formats apart from mediawiki can now be suggested, e.g. LaTeX. Improvements to the existing output can also be suggested.

See also