Difference between revisions of "dictionary"

 
(15 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
 
This page is a summary of information about the Lojban Dictionary: its history, current form, and future design.

===  Dictionary Software and Data ===
 
+
* [http://jbovlaste.lojban.org jbovlaste] is the Lojban dictionary.
* [http://jbovlaste.lojban.org jbovlaste] is the Lojban dictionary.
+
* [https://la-lojban.github.io/sutysisku/en/#sisku/mi_badna la sutysisku], a slick interface for la jbovlaste database with some algorithms improving search relevance.
* [http://www.lojban.org/cgi-bin/dict.pl Lojban Online Dictionary Query]. The jbovlaste search engine.
+
* [http://www.lojban.org/cgi-bin/dict.pl Lojban Online Dictionary Query]. The jbovlaste search engine.
 
+
* [[Dictionaries, Glossers and parsers|Dictionaries, Glossers and parsers]] contains links to software for using a Lojban Dictionary.
* [[jbocre: Dictionaries, Glossers and parsers|Dictionaries, Glossers and parsers]] contains links to software for using a Lojban Dictionary.
+
* [[Word Lists|Word Lists]] contain dictionary information in a variety of formats.
* [[jbocre: Word Lists|Word Lists]] contain dictionary information in a variety of formats.
 
  
 
===  Dictionary Discussion ===

Several pages on the wiki discuss the Lojban Dictionary.  Those are collected here:
 
+
* [[The Lojban Dictionary|The Lojban Dictionary]]
* [[jbocre: The Lojban Dictionary|The Lojban Dictionary]]
+
* [[Great Dictionary Problem|Great Dictionary Problem]]
* [[jbocre: Great Dictionary Problem|Great Dictionary Problem]]
+
* [[dictionary notes from 2001]]
 
 
* [[jbocre: dictionary notes from 2001]]
 
 
* [[BPFK Section: Dictionary Preface]]
 
+
* [[Old Draft Dictionary|Old Draft Dictionary]]
* [[jbocre: Old Draft Dictionary|Old Draft Dictionary]]
+
* [[Mini-dictionary|Mini-dictionary]]
* [[jbocre: Mini-dictionary|Mini-dictionary]]
+
* [[Mini-dictionary To-do|Mini-dictionary To-do]]
 
+
* [[Mini-dictionary paradigms|Mini-dictionary paradigms]]
* [[jbocre: Mini-dictionary To-do|Mini-dictionary To-do]]
 
* [[jbocre: Mini-dictionary paradigms|Mini-dictionary paradigms]]
 
 
 
 
===  Dictionary Design ===
 
+
This section is an exploration of the issue started in this discussion about the [http://groups.google.com/group/lojban/browse_thread/thread/ac00bb38bebef953 dictionary backend].
This section is an exploration of the issue started in this discussion about the [http://groups.google.com/group/lojban/browse_thread/thread/ac00bb38bebef953 dictionary backend].
 
  
 
====  What use cases will use Lojban dictionary data? ====

What end use will this data be put to?

* A print version of the Lojban dictionary
* jbovlaste, the online dictionary.  This includes editing and voting on dictionary entries
* Flash card definitions, which are more concise.
* Interactive tools, like glossers and parsers
** Ideally, jbofihe could use this data instead of having its own version.
** Similarly, the list of rafsi for jvocuhadju could be extracted from this data.
  
====  What sorts of things *should* a Lojbanic dictionary store, ideally? ====
+
====  What sorts of things <u>should</u> a Lojbanic dictionary store, ideally? ====
 
 
 
* Are any of these use cases unsuitable for a dictionary?  Are there use cases we haven't thought of that are suitable?
* Can we separate a definition from its grammatical context?  Are we treating Lojban gismu too much like a verb or noun in the way we handle them now?
 
+
* Should we use some manner of spatial visualization?  (e.g., [[cpacuvisualization]])
* Should we use some manner of spatial visualization?  (e.g., [[jbocre: cpacuvisualization|cpacuvisualization]])
 
 
* Versioning: it would be very useful for BPFK proposals to be able to reference "checkpoint 123 in the dictionary software".  One possibility: an "official word" checkbox, a button that says "checkpoint all the official words, give the checkpoint a number", and the ability to diff successive checkpoints.  Alternatively, the checkpoint button could cover everything, with the option of filtering the diffs down to the official words.  Checkpoints could then be of more general use: with a "this word is worth printing" button as well, any given official print run of the dictionary would consist of all those words as of checkpoint #123 or whatever.
** Related: a definition with the "officially approved" and "official word" checkboxes set can't be edited.  It can be forked, and a button then outputs a diff between the fork and the official version.  An admin could then move the "officially approved" checkbox around.
  
Line 59: Line 43:
  
 
====  What prior art is there? ====

* Dictionaries are not a new problem; how do other people deal with this?
* Is Lojban fundamentally different because of its formal grammar?  Is our thinking on this problem influenced by working with languages that don't have a formal grammar?  Is a dictionary a compromise for which we have a better solution?

===  Design Proposal ===

gismu, lujvo, and selma'o have different storage requirements.  The proposal below assumes they will be stored in a database, and describes the storage schema.

====  gismu ====
 
+
<tab class="wikitable">Field Description
{FANCYTABLE(head=" Field | Description ]]
+
gismu the gismu this row defines
 
+
lang the language this row is in
gismu     | the gismu this row defines
+
definition the definition of the gismu
 
+
x1 gloss for x1
lang       |  the language this row is in
+
x2 gloss for x2, NULL if there is no x2
 
+
x3 gloss for x3, NULL if there is no x3
definition | the definition of the gismu
+
x4 gloss for x4, NULL if there is no x4
 
+
x5 gloss for x5, NULL if there is no x5
x1         | gloss for x1
+
xN gloss for xN, NULL if there is no xN
 
+
</tab>
x2         | gloss for x2, NULL if there is no x2
 
 
 
x3         | gloss for x3, NULL if there is no x3
 
 
 
x4         | gloss for x4, NULL if there is no x4
 
 
 
x5         | gloss for x5, NULL if there is no x5
 
 
 
xN         | gloss for xN, NULL if there is no xN
 
 
 
{FANCYTABLE}
 
  
 
There is one entry in this table for each language/gismu.

====  lujvo ====
 
+
<tab class="wikitable">
{FANCYTABLE(head=" Field | Description ]]
+
Field Description
 
+
lujvo the lujvo this row defines
lujvo     | the lujvo this row defines
+
lang the language this row is in
 
+
definition the definition of the lujvo
lang       | the language this row is in
+
x1 gloss for x1
 
+
x2 gloss for x2, NULL if there is no x2
definition | the definition of the lujvo
+
x3 gloss for x3, NULL if there is no x3
 
+
x4 gloss for x4, NULL if there is no x4
x1         | gloss for x1
+
x5 gloss for x5, NULL if there is no x5
 
+
xN gloss for xN, NULL if there is no xN
x2         | gloss for x2, NULL if there is no x2
+
</tab>
 
 
x3         | gloss for x3, NULL if there is no x3
 
 
 
x4         | gloss for x4, NULL if there is no x4
 
 
 
x5         | gloss for x5, NULL if there is no x5
 
 
 
xN         | gloss for xN, NULL if there is no xN
 
 
 
{FANCYTABLE}
 
  
 
====  lujvo-component ====
 
+
<tab class="wikitable">Field Description
{FANCYTABLE(head=" Field | Description ]]
+
lujvo the lujvo this row defines
 
+
component the component gismu or lujvo that is one part of this lujvo
lujvo     | the lujvo this row defines
+
lang
 
+
x1 The position this component's x1 place appears in the lujvo.  NULL if it does not appear.
component | the component gismu or lujvo that is one part of this lujvo
+
x2 The position this component's x2 place appears in the lujvo.  NULL if it does not appear.
 
+
x3 The position this component's x3 place appears in the lujvo.  NULL if it does not appear.
lang       |
+
x4 The position this component's x4 place appears in the lujvo.  NULL if it does not appear.
 
+
x5 The position this component's x5 place appears in the lujvo.  NULL if it does not appear.
x1         | The position this component's x1 place appears in the lujvo.  NULL if it does not appear.
+
xN The position this component's xN place appears in the lujvo.  NULL if it does not appear.
 
+
</tab>
x2         | The position this component's x2 place appears in the lujvo.  NULL if it does not appear.
 
 
 
x3         | The position this component's x3 place appears in the lujvo.  NULL if it does not appear.
 
 
 
x4         | The position this component's x4 place appears in the lujvo.  NULL if it does not appear.
 
 
 
x5         | The position this component's x5 place appears in the lujvo.  NULL if it does not appear.
 
 
 
xN         | The position this component's xN place appears in the lujvo.  NULL if it does not appear.
 
 
 
{FANCYTABLE}
 
  
 
There is an entry in this table for each component of the lujvo.

====  selma'o ====
 
+
<tab class="wikitable">Field Description
{FANCYTABLE(head=" Field | Description ]]
+
selma'o the selma'o this row defines
 
+
lang the language this row is in
selma'o   | the selma'o this row defines
+
class the selma'o class of this selma'o
 
+
gloss the definition of this selma'o
lang       | the language this row is in
+
</tab>
 
 
class     | the selma'o class of this selma'o
 
 
 
gloss     | the definition of this selma'o
 
 
 
{FANCYTABLE}
 
  
 
====  Use Cases ====

How will this format be used in each of the use cases we've identified?
  
Line 165: Line 108:
 
Assuming we're printing the English language translation dictionary, the gismu table will be rendered as the string:
  
<verbatim>
+
<code>$gismu: $x1 $definition $x2 $x3 $x4 $x5 $xN</code>
 
 
$gismu: $x1 $definition $x2 $x3 $x4 $x5 $xN
 
 
 
</verbatim>
 
  
 
=====  jbovlaste, the online dictionary. =====
 
+
To be done.
<verbatim>
 
 
 
TBD.
 
 
 
</verbatim>
 
 
 
 
=====  Flash Cards =====
 +
To be done.
  
<verbatim>
+
[[Category:vlaste saske]][[Category:jboske]]
 
 
TBD.
 
 
 
</verbatim>
 

Latest revision as of 06:26, 21 May 2017

This page is a summary of information about the Lojban Dictionary: its history, current form, and future design.

Dictionary Software and Data

  • jbovlaste (http://jbovlaste.lojban.org) is the Lojban dictionary.
  • la sutysisku (https://la-lojban.github.io/sutysisku/en/#sisku/mi_badna), a slick interface for la jbovlaste database with some algorithms improving search relevance.
  • Lojban Online Dictionary Query (http://www.lojban.org/cgi-bin/dict.pl). The jbovlaste search engine.
  • Dictionaries, Glossers and parsers contains links to software for using a Lojban Dictionary.
  • Word Lists contain dictionary information in a variety of formats.

Dictionary Discussion

Several pages on the wiki discuss the Lojban Dictionary. Those are collected here:

  • The Lojban Dictionary
  • Great Dictionary Problem
  • dictionary notes from 2001
  • BPFK Section: Dictionary Preface
  • Old Draft Dictionary
  • Mini-dictionary
  • Mini-dictionary To-do
  • Mini-dictionary paradigms

Dictionary Design

This section is an exploration of the issue started in this discussion about the dictionary backend.

What use cases will use Lojban dictionary data?

What end use will this data be put to?

  • A print version of the Lojban dictionary
  • jbovlaste, the online dictionary. This includes editing and voting on dictionary entries
  • Flash card definitions, which are more concise.
  • Interactive tools, like glossers and parsers
    • Ideally, jbofihe could use this data instead of having its own version.
    • Similarly, the list of rafsi for jvocuhadju could be extracted from this data.

What sorts of things should a Lojbanic dictionary store, ideally?

  • Are any of these use cases unsuitable for a dictionary? Are there use cases we haven't thought of that are suitable?
  • Can we separate a definition from its grammatical context? Are we treating Lojban gismu too much like a verb or noun in the way we handle them now?
  • Should we use some manner of spatial visualization? (e.g., cpacuvisualization)
  • Versioning: it would be very useful for BPFK proposals to be able to reference "checkpoint 123 in the dictionary software". One possibility: an "official word" checkbox, a button that says "checkpoint all the official words, give the checkpoint a number", and the ability to diff successive checkpoints. Alternatively, the checkpoint button could cover everything, with the option of filtering the diffs down to the official words. Checkpoints could then be of more general use: with a "this word is worth printing" button as well, any given official print run of the dictionary would consist of all those words as of checkpoint #123 or whatever.
    • Related: a definition with the "officially approved" and "official word" checkboxes set can't be edited. It can be forked, and a button then outputs a diff between the fork and the official version. An admin could then move the "officially approved" checkbox around.
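
The checkpoint idea in the versioning bullet can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `entries` dictionary, its "definition" and "official" keys, and the function names are all hypothetical and not part of any existing dictionary software.

```python
# Hypothetical sketch: entries maps each word to a dict holding its
# definition text and an "official" flag. None of these names come from
# jbovlaste; they are assumptions made for illustration.

def checkpoint(entries, official_only=True):
    """Copy the current definitions into a snapshot (word -> definition)."""
    return {
        word: entry["definition"]
        for word, entry in entries.items()
        if entry.get("official") or not official_only
    }


def diff_checkpoints(old, new):
    """Return the words added, removed, and changed between two checkpoints."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(w for w in old.keys() & new.keys() if old[w] != new[w])
    return {"added": added, "removed": removed, "changed": changed}
```

Numbering the checkpoints would then amount to appending each snapshot to a list and citing its index, and passing `official_only=False` gives the "checkpoint everything, filter later" variant.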

What storage format is going to work for all of these use cases?

  • Each use of the dictionary data needs to view it in a different way. Can we design a format that can be shared between all users of the data?
  • Since the form of a dictionary entry is often unstructured text, what would a dictionary definition look like that supports all of our use cases without duplicating the definitions by changing their form (e.g., a brief definition for a study card, a full definition for a print dictionary, and an archive discussion for the online dictionary)?

What prior art is there?

  • Dictionaries are not a new problem; how do other people deal with this?
  • Is Lojban fundamentally different because of its formal grammar? Is our thinking on this problem influenced by working with languages that don't have a formal grammar? Is a dictionary a compromise for which we have a better solution?

Design Proposal

gismu, lujvo, and selma'o have different storage requirements. The proposal below assumes they will be stored in a database, and describes the storage schema.

gismu

<tab class="wikitable">
Field       Description
gismu       the gismu this row defines
lang        the language this row is in
definition  the definition of the gismu
x1          gloss for x1
x2          gloss for x2, NULL if there is no x2
x3          gloss for x3, NULL if there is no x3
x4          gloss for x4, NULL if there is no x4
x5          gloss for x5, NULL if there is no x5
xN          gloss for xN, NULL if there is no xN
</tab>

There is one entry in this table for each language/gismu.
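
A minimal sketch of how the gismu table above might look in practice, assuming SQLite through Python's standard sqlite3 module. The column types, the composite primary key, and the helper names are assumptions for illustration, not part of the proposal; the open-ended xN column is left out, so places stop at x5.

```python
import sqlite3

# Hypothetical DDL for the proposed gismu table; types and key are assumptions.
GISMU_SCHEMA = """
CREATE TABLE gismu (
    gismu      TEXT NOT NULL,   -- the gismu this row defines
    lang       TEXT NOT NULL,   -- the language this row is in
    definition TEXT NOT NULL,   -- the definition of the gismu
    x1         TEXT NOT NULL,   -- gloss for x1
    x2         TEXT,            -- NULL if there is no x2
    x3         TEXT,            -- NULL if there is no x3
    x4         TEXT,            -- NULL if there is no x4
    x5         TEXT,            -- NULL if there is no x5
    PRIMARY KEY (gismu, lang)   -- one entry per language/gismu
)
"""


def open_dictionary():
    """Create an in-memory database holding an empty gismu table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(GISMU_SCHEMA)
    return conn


def add_gismu(conn, gismu, lang, definition, glosses):
    """Insert one row; glosses is a list of one to five place glosses."""
    padded = list(glosses) + [None] * (5 - len(glosses))
    conn.execute(
        "INSERT INTO gismu VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [gismu, lang, definition] + padded,
    )


def lookup(conn, gismu, lang):
    """Return the row for one language/gismu pair, or None if absent."""
    return conn.execute(
        "SELECT * FROM gismu WHERE gismu = ? AND lang = ?", (gismu, lang)
    ).fetchone()
```

The primary key enforces the "one entry per language/gismu" rule: inserting a second English row for the same gismu raises `sqlite3.IntegrityError`.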

lujvo

<tab class="wikitable">
Field       Description
lujvo       the lujvo this row defines
lang        the language this row is in
definition  the definition of the lujvo
x1          gloss for x1
x2          gloss for x2, NULL if there is no x2
x3          gloss for x3, NULL if there is no x3
x4          gloss for x4, NULL if there is no x4
x5          gloss for x5, NULL if there is no x5
xN          gloss for xN, NULL if there is no xN
</tab>

lujvo-component

<tab class="wikitable">
Field      Description
lujvo      the lujvo this row defines
component  the component gismu or lujvo that is one part of this lujvo
lang
x1         The position this component's x1 place appears in the lujvo. NULL if it does not appear.
x2         The position this component's x2 place appears in the lujvo. NULL if it does not appear.
x3         The position this component's x3 place appears in the lujvo. NULL if it does not appear.
x4         The position this component's x4 place appears in the lujvo. NULL if it does not appear.
x5         The position this component's x5 place appears in the lujvo. NULL if it does not appear.
xN         The position this component's xN place appears in the lujvo. NULL if it does not appear.
</tab>

There is an entry in this table for each component of the lujvo.
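
To illustrate how lujvo-component rows could be read back, here is a rough sketch that rebuilds a lujvo's place structure from its components. The row layout (a "places" list standing in for the x1..xN columns) and the gerzda mapping are assumptions chosen for illustration, not data taken from jbovlaste.

```python
from collections import defaultdict

# Hypothetical lujvo-component rows. "places" lists, for the component's
# x1, x2, ..., the 1-based position each place takes in the lujvo, or None
# if the place does not appear. The gerzda mapping is an assumed example.
ROWS = [
    {"lujvo": "gerzda", "component": "zdani", "places": [1, 2]},
    {"lujvo": "gerzda", "component": "gerku", "places": [2, 3]},
]


def place_map(rows, lujvo):
    """Map each position of the lujvo to the component places that fill it."""
    filled = defaultdict(list)
    for row in rows:
        if row["lujvo"] != lujvo:
            continue
        for index, position in enumerate(row["places"], start=1):
            if position is not None:
                filled[position].append(f"{row['component']} x{index}")
    return dict(sorted(filled.items()))
```

With the sample rows, `place_map(ROWS, "gerzda")` reports that position 2 of the lujvo is shared by the x2 of zdani and the x1 of gerku, which is the kind of overlap the position columns are meant to record.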

selma'o

<tab class="wikitable">
Field    Description
selma'o  the selma'o this row defines
lang     the language this row is in
class    the selma'o class of this selma'o
gloss    the definition of this selma'o
</tab>

Use Cases

How will this format be used in each of the use cases we've identified?

A print version of the Lojban dictionary

Assuming we're printing the English language translation dictionary, each row of the gismu table will be rendered as the string:

$gismu: $x1 $definition $x2 $x3 $x4 $x5 $xN
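
The template above maps directly onto Python's string.Template, so a print renderer could be sketched as below. How NULL places should be printed is not specified in the proposal; this sketch assumes they are simply dropped, and the row dictionary and function name are hypothetical.

```python
from string import Template

# The print format from the proposal, used verbatim as a template.
ENTRY = Template("$gismu: $x1 $definition $x2 $x3 $x4 $x5 $xN")
PLACES = ("x1", "x2", "x3", "x4", "x5", "xN")


def render_entry(row):
    """Render one gismu row; NULL (None) or missing places become empty."""
    fields = {"gismu": row["gismu"], "definition": row["definition"]}
    for place in PLACES:
        fields[place] = row.get(place) or ""
    # Collapse the runs of spaces left behind by empty places.
    return " ".join(ENTRY.substitute(fields).split())
```

For example, a row with only an x1 gloss, such as `{"gismu": "blanu", "x1": "blue thing", "definition": "is blue"}`, renders as `blanu: blue thing is blue`.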

jbovlaste, the online dictionary.

To be done.

Flash Cards

To be done.