corpora: Difference between revisions

From Lojban
mNo edit summary
m (Text replace - "jbocre: b" to "b")
Line 1: Line 1:


Here is some info that we hope will be useful to [[jbocre: baupla fuzykamni PFK|baupla fuzykamni PFK]] commissioners, and other people doing research on the Lojban language.
Here is some info that we hope will be useful to [[baupla fuzykamni PFK|baupla fuzykamni PFK]] commissioners, and other people doing research on the Lojban language.


There is now a master [http://lojban.org/cgi-bin/corpus Corpus Application]!
There is now a master [http://lojban.org/cgi-bin/corpus Corpus Application]!

Revision as of 12:16, 23 March 2014

Here is some info that we hope will be useful to baupla fuzykamni PFK commissioners, and other people doing research on the Lojban language.

There is now a master Corpus Application!

Use it, and if it's missing something, please add it.

The two alternatives above often return so many false positives in English that they are useless. The main source we have for Lojban usage is the IRC logs:

These are filtered line by line to exclude lines containing too many words that are not possible Lojban word-forms, so the result is a very high-quality corpus of more than 360,000 words (as of February 12th, 2006).
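The line-by-line filtering described above can be sketched roughly as follows. This is only an illustration, not the script actually used on the IRC logs: the character-based test in `looks_like_lojban`, the function names, and the `max_bad_fraction` threshold are all assumptions made for the example. A real filter would check full Lojban morphology rather than just the alphabet.

```python
# Letters of the Lojban alphabet (no h, q, or w), plus the apostrophe,
# comma, and period, which can also appear inside written Lojban words.
LOJBAN_CHARS = set("abcdefgijklmnoprstuvxyz',.")


def looks_like_lojban(word: str) -> bool:
    """Crude stand-in for a real word-form check (assumption):
    accept a word if it contains at least one letter and every
    character belongs to the Lojban character set."""
    word = word.lower()
    return (
        any(ch.isalpha() for ch in word)
        and all(ch in LOJBAN_CHARS for ch in word)
    )


def keep_line(line: str, max_bad_fraction: float = 0.2) -> bool:
    """Keep a line only if the share of impossible word-forms is small.
    The 0.2 threshold is a hypothetical value chosen for illustration."""
    # Split on whitespace and trim surrounding punctuation from each token.
    words = [w.strip('!?"():;,.') for w in line.split()]
    words = [w for w in words if w]
    if not words:
        return False
    bad = sum(1 for w in words if not looks_like_lojban(w))
    return bad / len(words) <= max_bad_fraction


def filter_corpus(lines):
    """Return only the lines that pass the per-line check."""
    return [line for line in lines if keep_line(line)]


if __name__ == "__main__":
    sample = [
        "coi rodo mi klama le zarci",
        "hello world, how are you today",
    ]
    for line in filter_corpus(sample):
        print(line)
```

Because the check only looks at letters, English words such as "are" or "today" pass it individually. The per-line threshold is what rejects mostly-English lines, which is why the filter works on whole lines rather than single words.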

Lojbab's old archives are at [1]

We also have some contributed texts that were uploaded to the old Twiki and (to tsali's knowledge) are not available elsewhere:

  • birendra
  • bisli-viltcima
  • help-lojban.txt