lojbanMachineTranslation: Difference between revisions

Line 1: Line 1:


=== These are ''resolved'' errata in the Level 0 book, chapter 4 and onward. ===
* Nicholas, N. 1996a. [http://citeseer.nj.nec.com/27803.html Lojban as a Machine Translation Interlanguage in the Pacific]. Fourth Pacific Rim International Conference on Artificial Intelligence: Workshop on 'Future Issues for Multilingual Text Processing', Cairns, Australia, 27 August 1996. 31-39.
* pycyn pointed out that work has been performed on conversion from Logician's English to predicate logic. See the [http://groups.yahoo.com/group/lojban/message/12624 provided references].


* Chapter 4
* And suggested [http://www.google.com/search?q=discourse+representation+theory Discourse Representation Theory] as being relevant.
** Linguistic Issues *with* Lojban?  I think?  -Robin
* Bjorn Gohla pointed out [http://www.darmstadt.gmd.de/publish/komet/kpml-1-doc/kpml.html KPML], a natural language text generation system.


*** I'd have issues with that, :-) Other votes?
** [[jbocre: Jay Kominek|Jay]] thinks that translating Lojban into other languages is almost purely a [http://www.dynamicmultimedia.com.au/siggen/ NL text generation] problem. ([[jbocre: Jay Kominek|Jay]] also feels that translating natural languages into Lojban is an uninteresting problem, FWIW.)
**** Maybe "Linguistic Issues Pertaining to Lojban" -- Adam
***An uninteresting problem?? Well, let's get some skeleton code running then if it's so easy! Because it's a very useful project!


***** Works for me. -Robin
****Come on, xod, we're supposed to be thinking logically, aren't we? ''le'i ro cinri na cmima le'i frili'' The kind of reasoning you're demonstrating is the kind of thing Lojban ought to be wonderful at putting a stop to. :) --[[jbocre: Jay Kominek|Jay]]
***** zo'o .i vu'e ma'i lo'i skami nabmi le sizytolcinri ca'a smuni lo nalpluja


*** <hr> between the questions. -Robin
*** What quality of translation is uninteresting? A Babelfish-quality machine translator would be useful; is that what you consider dull? A Stefan George-quality translator would be incredible (and far beyond the state of the art!).
** Question 1
**** Any quality is uninteresting. I don't see taking massive amounts of natlang text and moving it into Lojban as useful. What I do see as useful is writing new things (patents, manuals, etc.) in Lojban, and then being able to get quality translations of the Lojban in ''n'' other languages. (Nobody will learn Lojban to read things which are already available in a natural language, but they might learn it so that they can write things that can be translated into ''n'' languages.) --[[jbocre: Jay Kominek|Jay]]


*** "Understanding the potential for Sapir-Whorf effects may lead to better inter-cultural understanding, promoting communication and peace."  -- Aaawww, what a cute fwuffy widdew bunny! -Robin
***** Nobody needs to learn Lojban if converting into and out of Lojban is so easy. Sorry, but if Lojban is used as an interlingua, it will be less like a Lingua Franca, spoken by many people to each other, and more like a hidden inter-translation code that few ever care to see. As far as I am concerned, Natlang --> Lojban is hard, Lojban --> Natlang is easy. So, you can see my surprise at the allegation that the former is easy too! If it's all so easy, let's just do it. --xod
**** I wanted this out, but John overruled me...
****** You're living in an alternate reality, because nobody has said that natlang -> lojban is easy, and asserting it more often won't make it true. --jay


** Question 2
******* Ignoring my direct addressing of this point doesn't help matters. Read the sentence I wrote above in Lojban. --xod
*** "Ambiguity can be judged on four levels: the phonological-graphical, the morphological, the syntactic, and the semantic." -- Sounds wrong.  syntactical and semantical?  -Robin
******** I did read it. That is a perfectly valid assumption to make, right until you're corrected. When you persist in holding a view which is valid in reference frame A, after it has been pointed out that you're not in reference frame A, well, see the above "alternate realities" comment. --jay


**** And yet, that's completely correct. semantical does not exist, and syntactical is rare. Likewise, morphologic and phonologic do not exist. As I've always maintained, English is a whore... (You certainly can't pin this on Greek, because the -al bit is Latin; it took English to smoosh them together like this.)
********* The reference frame is skami nabmi. We are discussing an issue of software complexity. Where is the disconnect, and why do you think it's been explained to me even once? --xod
*** "although pauses can be unambiguously identified in written text from the morphological rules alone."  -- although *required* pauses.  -Robin


**** Don't quite get it, but am putting in anyway.
* [http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html] - a 1996 overview of the state of the art in natural language processing
*** "produces a unique parse for every Lojban text." -- produces a unique parse for all strictly correct/valid Lojban text.  Or something like that. -Robin


**** for every Lojban text that follows its grammatical norms.
----
** Question 4


*** ''nanmu, meaning 'adult human male'. '' ''nanmu'' just means ''human male''. The gismu list says that it is not necessarily adult.
I'm pretty knowledgeable about artificial intelligence, though I've never worried much about natural language understanding specifically. In my opinion, the issue of whether translation to or from Lojban is "easier" is secondary. Before you can answer it, you have to ask: What quality of translation?
** Question 10


*** ''lenu do jacysabji da cu nibli lenu da ba banro''. That should probably be ''rinka'' instead of ''nibli'', and the scope of each ''da'' ends when the subsentence it is a part of ends, and so both ''da''s could legitimately refer to different things, without a ''(ro)da zo'u'' before the sentence.
* There are large chunks of the translation process which Lojban makes easier, and the only thing about Lojban which would make the process more difficult is the fact that you can't rely on natural language vagueness and iffiness to carry the day. (And really, you shouldn't let it ever do so, but people want results...) --jay
**** Well spotted. That's why we need a ''Lojban for Intermediates''...


** Question 15
For machine translation quality similar to the current state of the art--poor--I expect Lojban would be much easier to translate from and somewhat easier to translate to. Lojban provides unambiguous parses and often-unambiguous word meanings, which are basic abilities that today's machine translators have trouble with.
*** The question claims that Lojban has ''non-English diphthongs''. I don't think that Lojban actually has any of those.


**** Loglan holdover. Struck.
* "Trouble with"? They're incapable of it. For some languages, it's provably impossible (without true understanding of the context; see Swiss German). --jay
*** ''Medial consonant clusters are also restricted, to prevent ... consecutive stops.'' Consecutive stops are in no way prohibited in Lojban.
** "Incapable" is too strong a term. Machine translators can use statistical models to make guesses at word senses. It's not on the same planet as throwing darts. ''mi'e [[jbocre: jezrax|jezrax]]''


**** Loglan holdover. Struck.
*** I was referring to parsing the grammar, actually. Swiss German is provably context sensitive, so you'll need to understand it before you can even hope to parse it. As for word sense, well, if you've got an algorithmic process for even guessing at the meaning of words in natural language, I suggest you publish. :) Otherwise, you'll have to define for the system every word you want it to be able to translate. (Whereas in Lojban, that is limited to the so-far-little-used fu'ivla.) (The best statistical model I've seen which would be applied to determining word sense from nothing is latent semantic analysis, and that requires a very very big corpus, and acts in odd, unpredictable ways: hot and cold are "closer" to it than are cold and cool.)
*** "How can a language be appropriate as an international auxiliary language when it is difficult to pronounce?"  -- I notice that "Who the hell said lojban was an int aux lang?" doesn't appear in the answer. 8)  -Robin
**** ''zo'o'' No need to publish; there are already enough book chapters on it. Search Amazon for "machine translation" or "natural language processing". It's a decades-old research field; people know what the problems are; none of them are solved, but there's progress on every front. One specific suggestion: "Foundations of Statistical Natural Language Processing" [http://www.amazon.com/exec/obidos/ASIN/0262133601/qid=1009761097/sr=1-2/ref=sr_1_75_2/103-0198688-3778269


**** Again, holdover from the 1969 discussion of Loglan. Any mention of auxiliary language is joyfully struck (exorcising my esperantic demons), and supplanted with 'culturally neutral', which is the real point of any such questions.
****] The ambiguous-parse problem occurs in, probably, all natural languages. I gather that the most popular way to deal with it is by brute force: produce all possible parses (usually a lot more than you expect), and then rank them. As far as I'm concerned, this is an easy problem from among the problems of natural language understanding--but only relatively speaking!
** Question 16


*** "The Logical Language Group has proposed formal tests of the algorithm, and is instrumenting its software used for teaching vocabulary to allow data to be gathered that will confirm or refute this hypothesis" -- Long since has done, I think. -Robin
For machine translation quality similar to a quick-and-dirty human translation--moderate quality but still much better than the current state of the art--I doubt Lojban offers much advantage. The problems are so much more difficult that merely getting the syntax and individual words correct doesn't go that far toward solving them.
**** and has instrumented its software used for teaching vocabulary to allow data to be gathered that can confirm or refute this hypothesis.


** Question 22
* What problems (that don't already exist in dealing with natural language)? jbofi'e already performs "quick-and-dirty" translation, and the results would be decent with smoothing and some knowledge of the destination language applied (getting subject/verb count agreement and such things to match). --jay
*** "In a highly complex system (which any language, even an artificial one, is),"  -- Even a well-developed artificial one, anyways. -Robin
** ''[[jbocre: jbofi'e|jbofi'e]]'''s translation quality is way, way worse than a quick-and-dirty human translation.


**** Any language is a highly complex system -- even an artificial language, as long as it is non-trivial. (This certainly holds true for Lojban!) In such a system, the interaction of the design features displays properties that are more than the sum of its parts.
*** You'll need to define "quick and dirty", then, as I interpret it to mean dictionary lookup of each individual word, limited attempts to deal with conjugation, and simplistic reordering to match the order of subject, verb and object in the destination language. jbofi'e definitely beats that out.
*** "insights that would then be tested in the natural languages." -- s/in the/in/ -Robin
**** A "quick-and-dirty human" translation is, say, one done in real time by a simultaneous translator.


** Question 23
Obviously, the higher-quality the machine translation, the more uses it has. I doubt that a poor-quality translation would be adequate for the proposed patent application, though I could be wrong. Also, the patent application could rely on formalizing the source text according to special-purpose rules, which would make the job easier.
*** "we don't know what features of a language might be determining to a culture."  -- I fail to understand this sentence, and I don't know if it's the sentence or me. -Robin


**** Oddly enough, it looks to me like a calque from Greek, but I'm fairly sure it's John's. I'll make it "decisive".
* A poor translation that takes 2 seconds could be worth something compared to a good translation which would take a week or so and cost you a bit of money. (As for limiting the domain, see the METEO system used by Canada to translate weather reports; it works flawlessly.) --jay
** Question 24
** Of course; it all depends on the use.


*** Your math is dubious, as you never theorize how long it takes to learn a 4th language.
''mi'e [[jbocre: jezrax|jezrax]]''
**** Am querying John.
 
***** Saith John, it should probably be worded more like this:  "Assume that you can learn a second language in four years, and further languages in two more years each.  If you can learn an artificial language (to the same degree) in only one year due to its greater simplicity and regularity, then you can save a year by learning the artificial language first and then spending only two years on further natural languages, even if you never use the artificial language again."
***** But while this argument may have some merit for E-o, it's complete shite for Lojban, [[jbocre: John Cowan MO|John Cowan MO]].  I would be in favor of dropping this question altogether.
 
***** Heh.  Until someone does a study proving it wrong...  8)  --RobinLeePowell
***The editors have spoken. I've commented the question out, and John is right that Lojban is a little too alien to serve as a good 'propaedeutic' (pre-teaching tool). We'll wait on the stats; Lojban propaganda doesn't really need more Gedanken, and this is an angle the Esperantists have been long pushing, with no one visible in Lojban to push back; it's too risky a move politically. Out it goes. -- n
 
** Question 25
*** ''zdane'ikemcmafagyso'ikemprununje'a'': ''je'a'' is a rafsi for ''jecta'', not ''jelca''. -- Adam
 
**** Middle Lojban holdover. Struck.
 
** Question 5:  s/comparision/comparison/  -ScottW
** Question 13: s/necesssary/necessary/    -ScottW
 
** Question 16: s/occurence/occurrence/ (or is this a Commonwealth spelling?) -ScottW
** Question 19: s/sufficent/sufficient/    -ScottW
 
*** [[jbocre: Moved from below because I have a comment. Move it back afterwards.|Moved from below because I have a comment. Move it back afterwards.]]  "Many Amerindian languages use these type of words." -- Native American, please.  Indians are from India.  -Robin
**** Though Amerindian is utterly standard terminology in linguistics, this document is not intended for linguists; reluctantly changed.
 
***** "Native American" refers to anyone born in the Americas, no matter what their ancestry. Pragmatically, it refers to American aboriginals, but then pragmatically so does "Indian" and "American Indian". "Amerindian" is never used to refer to anything but American aboriginals, and such a contraction isn't a possible way to refer to an American of Bharat ancestry or origin in English, and so especially if that's the standard term in linguistics, I think that's what should be used. Certainly no non-linguist will be confused or even surprised by the usage of "Amerindian". (Also under Grammar->attitudinals) -- Adam
***** Overruled; the denotation of 'Native American' is also clear, and less obviously jargonish than 'Amerindian'. As proven by Robin. :-)
 
* Chapter 7
** The title should be ''me le cipnrkorvo .e le lorxu''
 
* Chapter 8
** ''.i sei le selfu cucusku se'u vi'o'' Space missing between ''cu'' and ''cusku''. -- [[jbocre: Adam|Adam]]
 
* Appendix A
**Spanish ''g  bato (plosive, not fricative)'' It should be '''g'''ato.
 
** (Note: these nits apply to the PDF version bearing file date 2002-12-03 15:56 +0100)
*** Talen bless you, filip. (for surely it is you).
 
**** Indeed it is (sorry for not signing my name to it) -- mi'e [[jbocre: pne|.filip.]]
** The Lojban letters in Chinese (the first column) inconsistently uses the "Lojban" (monospace) font -- most letters are in a Times-like font.
 
*** Word got grievously confused by the alternation of Chinese and monospace in the generated RTF, and after three sleepless nights of trying to get it to work (Apple 16/600, Substitute Device Fonts, Optimize for Speed, Outline Fonts, Postscript Level 1), I didn't feel like tidying the document any more. I'll more likely remember to next time. :-)
** The first letter for "Hindi & Urdu" ("a") is also in a Times-like font, rather than in the "Lojban" font.
 
*** Same problem.
** The example words for Arabic, Chinese, and Hindi/Urdu do not use boldface to indicate which sound of the example word exemplifies the pronunciation of the Lojban letter
 
*** They were intended to, but the font I've used for the transliterations (Gentium) doesn't come in bold; not many good freeware Unicode fonts do. The deal breaker is Latin Extended Additional, that I've been using for Arabic. I have two choices: underline Gentium (which will look revolting), or have a less pedantic Arabic transliteration. I will probably go with the latter.
** Some of the Chinese example words don't carry accents (for example, Datong, fei long wu feng, pinyin, Aolóng (first syllable), Tang, hen hao, Feizhou, Meiguo, Beijing, Yazhou, Yingguo, wu ye, Puyi, Luoma).
 
*** aulun gave them without accents here, so that's how I put them. If anyone knows the accents, let me know them, because it's hard enough for me to navigate Hindi dictionaries...
**** I don't think it's necessary with well-known proper names or phrases (nor, in general, providing pinyin tone marks along with the characters!), but okay: Da4tong2, fei1 long2 wu3 feng4 (the dragon's flying and the phoenix's dancing), Ao4long2, Tang2, hen3 hao3, Fei1zhou1, Mei3guo2, Bei3jing1, Ya4zhou1, Ying1guo2, wu3ye4 (if I remember it right: midnight?), Pu3yi2 (the last emperor!), Luo2ma3 (Rome) -- coi mi'e .aulun.
 
**** I'll see what [[jbocre: pne|pne]] can do. No promises, though.
** Does alif-waw-heh really spell "vah" in Urdu? I would have guessed that the initial alif should go.
 
*** Oops. Should have been waw alef heh.
**** That makes more sense to [[jbocre: pne|pne]].
 
** Is "el-kor'a:nu" supposed to be "the Koran"? I'm pretty sure that's "al-Qur'a:n", spelled alif-lam-qaf-ra-(alif with madda)-nun. That is, with qaf not kaf and with alif-with-madda (looks like a tilde above) rather than two alifs in a row.
*** The textbook must have spelt it like that to emphasise the hiatus. Spelling normalised per yours; no idea where the k came from...
 
** You use "d-with-underline" to transliterate both dhal (in "ha:da:") and tha (in "kadira"). Is the ''tha'' in the latter word a typo, or the transliteration ''d-with-underline''?
*** The latter
 
** Should the Arabic for "giddan/jiddan" use fathatan (two high slanted lines representing "an") before the final alif?
** Should "has.s.atan" use fathatan at the end?
 
*** Bugger. You mean, fathatan and the other -atan's are not optional like the other vowel points? That makes sense, actually, because I don't know how you'd disambiguate them. Inserted.
**** Um, I don't know -- and I don't really know Arabic. I ''think'' they're just as optional as the other vowel points. However, I would write them if "an" is represented by alif. I don't know what Arabic practice is; I can imagine that the alif by itself is sufficient clue to the reading (and in spoken language, the endings generally get lost anyway, don't they?). IOW, don't take my word as authority here. Oh, and the has.s.atan I just thought should have it for consistency with the others. See next comment about the "fully pointed" bit.
 
** Should the Arabic for "mas-'alatun" use dammatan (representing "un") at the end over the ''ta marbuta'', for consistency with "s.aifun", "haufun", and "buneiyun" in the following examples, both of which use dammatan? (And also "merkebun" and "motun" earlier.) And "sih.run" should probably also have dammatan.
***I'm blindly following, but you did notice the vowel example words are fully pointed, and the rest aren't, right?
 
**** No, I didn't notice. Hence some of my suggestions for consistency's sake are for a false consistency. I noticed some words were pointed and some not but didn't bother to look at the pattern. So I just figured "most words transliterated ''-un'' have dammatan, why not all?". I suppose the answer may be "only those which demonstrate vowel pronunciations have explicit dammatan", which is fair enough. Similar with has.s.atan, which is also not fully pointed and, by that token, should probably not have any -atan.
***All the exx you mention are fully pointed. If the dammatan is not normally used in written Arabic, it doesn't belong in mas-'alatun, because as a consonant example that word is not pointed.
 
***The tanween are not included in normal unpointed Arabic, just like the vowel letters. So out they go again...
** Should there be a space between the Arabic for "mata:r" and the following comma, for readability?
 
*** I don't like typographical tricks. The sensible thing to do instead is to make the comma part of the Arabic text, which it isn't currently.
**** Well, some Arabic words appear (on my screen) to have a space between word and comma and some don't. In some case, the no-space variant has the comma smudged right against the rightmost Arabic letter, which looked a bit ugly.
 
** The sukun in the final example "wufu:d" looks as if it's between the waw and the dal; can it be moved further to the left in order to be firmly over the dal or possibly even a bit to the left of it?
*** I've currently got the sukun modifying the waw; I take it it should be modifying the dal instead (so not wufuw0d, but wufuwd0). Done.
 
**** Oh, was that supposed to be there? *thinks* I suppose it has a point. I'm not sure whether waw or ya take sukun when they serve as long vowel marker; I suppose they might. I'd've written them without any point if they just mark long vowel, but don't take my word on that. I just went by the transliteration (which has no final vowel) to suggest the sukun on the dal.
**** My textbook has sukun on consonantal y,w, and considers the diphthongs to be aw, ay. So sukun on the w and the d.
 
''I obviously had no idea what I was doing; do keep checking on me!'' -- n.
 
** (Note: these nits are from [[jbocre: pne|.filip.]] and apply to the PDF version bearing file date 2002-12-10 15:50 +0100)
** Chinese -- "Nanking", should be "Nan'''n'''ing", underlined as "'''N'''a'''nn'''i'''ng'''"
 
** Chinese -- "nongren", should also underline the ''ng'' and the final ''n''.
** Chinese -- "gongfu", should also underline the ''o'' since it's pronounced /u/ (compare Wade-Giles spelling "kung fu")
 
** Chinese -- "shénme", could also underline the first ''e'' if you wish (IMO). (This is also how it's marked on [[jbocre: Pronunciation Guide Putonghua - with character code BIG5]], i.e. both vowels are boldface on that page.)
** Chinese -- "liefeng" has ''lie'' underlined, should only be ''ie''.

Revision as of 17:04, 4 November 2013

  • Nicholas, N. 1996a. Lojban as a Machine Translation Interlanguage in the Pacific. Fourth Pacific Rim International Conference on Artificial Intelligence: Workshop on 'Future Issues for Multilingual Text Processing', Cairns, Australia, 27 August 1996. 31-39.
  • pycyn pointed out that work has been performed on conversion from Logician's English to predicate logic. See the provided references.
    • Jay thinks that translating Lojban into other languages is almost purely a NL text generation problem. (Jay also feels that translating natural languages into Lojban is an uninteresting problem, FWIW.)
      • An uninteresting problem?? Well, let's get some skeleton code running then if it's so easy! Because it's a very useful project!
        • Come on, xod, we're supposed to be thinking logically, aren't we? le'i ro cinri na cmima le'i frili The kind of reasoning you're demonstrating is the kind of thing Lojban ought to be wonderful at putting a stop to. :) --Jay
          • zo'o .i vu'e ma'i lo'i skami nabmi le sizytolcinri ca'a smuni lo nalpluja
      • What quality of translation is uninteresting? A Babelfish-quality machine translator would be useful; is that what you consider dull? A Stefan George-quality translator would be incredible (and far beyond the state of the art!).
        • Any quality is uninteresting. I don't see taking massive amounts of natlang text and moving it into Lojban as useful. What I do see as useful is writing new things (patents, manuals, etc.) in Lojban, and then being able to get quality translations of the Lojban in n other languages. (Nobody will learn Lojban to read things which are already available in a natural language, but they might learn it so that they can write things that can be translated into n languages.) --Jay
          • Nobody needs to learn Lojban if converting into and out of Lojban is so easy. Sorry, but if Lojban is used as an interlingua, it will be less like a Lingua Franca, spoken by many people to each other, and more like a hidden inter-translation code that few ever care to see. As far as I am concerned, Natlang --> Lojban is hard, Lojban --> Natlang is easy. So, you can see my surprise at the allegation that the former is easy too! If it's all so easy, let's just do it. --xod
            • You're living in an alternate reality, because nobody has said that natlang -> lojban is easy, and asserting it more often won't make it true. --jay
              • Ignoring my direct addressing of this point doesn't help matters. Read the sentence I wrote above in Lojban. --xod
                • I did read it. That is a perfectly valid assumption to make, right until you're corrected. When you persist in holding a view which is valid in reference frame A, after it has been pointed out that you're not in reference frame A, well, see the above "alternate realities" comment. --jay
                  • The reference frame is skami nabmi. We are discussing an issue of software complexity. Where is the disconnect, and why do you think it's been explained to me even once? --xod
  • [1] - a 1996 overview of the state of the art in natural language processing

I'm pretty knowledgeable about artificial intelligence, though I've never worried much about natural language understanding specifically. In my opinion, the issue of whether translation to or from Lojban is "easier" is secondary. Before you can answer it, you have to ask: What quality of translation?

  • There are large chunks of the translation process which Lojban makes easier, and the only thing about Lojban which would make the process more difficult is the fact that you can't rely on natural language vagueness and iffiness to carry the day. (And really, you shouldn't let it ever do so, but people want results...) --jay

For machine translation quality similar to the current state of the art--poor--I expect Lojban would be much easier to translate from and somewhat easier to translate to. Lojban provides unambiguous parses and often-unambiguous word meanings, which are basic abilities that today's machine translators have trouble with.

  • "Trouble with"? They're incapable of it. For some languages, it's provably impossible (without true understanding of the context; see Swiss German). --jay
    • "Incapable" is too strong a term. Machine translators can use statistical models to make guesses at word senses. It's not on the same planet as throwing darts. mi'e jezrax
      • I was referring to parsing the grammar, actually. Swiss German is provably context sensitive, so you'll need to understand it before you can even hope to parse it. As for word sense, well, if you've got an algorithmic process for even guessing at the meaning of words in natural language, I suggest you publish. :) Otherwise, you'll have to define for the system every word you want it to be able to translate. (Whereas in Lojban, that is limited to the so-far-little-used fu'ivla.) (The best statistical model I've seen which would be applied to determining word sense from nothing is latent semantic analysis, and that requires a very very big corpus, and acts in odd, unpredictable ways: hot and cold are "closer" to it than are cold and cool.)
        • The ambiguous-parse problem occurs in, probably, all natural languages. I gather that the most popular way to deal with it is by brute force: produce all possible parses (usually a lot more than you expect), and then rank them. As far as I'm concerned, this is an easy problem from among the problems of natural language understanding--but only relatively speaking!
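
The brute-force approach mentioned in this thread (produce every parse, then rank them) can be sketched roughly as follows. This is only an illustration under loud assumptions: the toy grammar, the lexicon, the rule weights, and the function names `all_parses` and `ranked_parses` are all invented here and come from no real parser or from anyone in the discussion. The example sentence is the classic English prepositional-phrase attachment ambiguity.

```python
from functools import lru_cache

# Hypothetical toy grammar in Chomsky normal form.
# The weights are made up purely so that there is something to rank by.
BINARY = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("VP", "PP")): 0.4,
    ("NP", ("NP", "PP")): 0.3,
    ("NP", ("Det", "N")): 0.7,
    ("PP", ("P", "NP")): 1.0,
}
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "man": ["N"],
    "with": ["P"], "telescope": ["N"],
}


def all_parses(words):
    """Enumerate every parse tree of `words` rooted in S, with its score."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def span(i, j):
        # Every (label, tree, score) that can cover words[i:j].
        found = []
        if j - i == 1:
            for label in LEXICON.get(words[i], []):
                found.append((label, (label, words[i]), 1.0))
            return tuple(found)
        for k in range(i + 1, j):  # try every split point
            for l_label, l_tree, l_score in span(i, k):
                for r_label, r_tree, r_score in span(k, j):
                    for (parent, children), weight in BINARY.items():
                        if children == (l_label, r_label):
                            tree = (parent, l_tree, r_tree)
                            found.append((parent, tree, weight * l_score * r_score))
        return tuple(found)

    return [(tree, score)
            for label, tree, score in span(0, len(words))
            if label == "S"]


def ranked_parses(words):
    """Brute force: generate all parses, then sort by score, best first."""
    return sorted(all_parses(words), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for tree, score in ranked_parses("I saw the man with the telescope".split()):
        print(round(score, 4), tree)
```

Under this toy grammar the sentence gets two parses, one attaching "with the telescope" to the verb phrase and one attaching it to "the man". A language with a guaranteed unique parse would skip the ranking step entirely, which is the advantage being claimed for Lojban above.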

For machine translation quality similar to a quick-and-dirty human translation--moderate quality but still much better than the current state of the art--I doubt Lojban offers much advantage. The problems are so much more difficult that merely getting the syntax and individual words correct doesn't go that far toward solving them.

  • What problems (that don't already exist in dealing with natural language)? jbofi'e already performs "quick-and-dirty" translation, and the results would be decent with smoothing and some knowledge of the destination language applied (getting subject/verb count agreement and such things to match). --jay
    • jbofi'e's translation quality is way, way worse than a quick-and-dirty human translation.
      • You'll need to define "quick and dirty", then, as I interpret it to mean dictionary lookup of each individual word, limited attempts to deal with conjugation, and simplistic reordering to match the order of subject, verb and object in the destination language. jbofi'e definitely beats that out.
        • A "quick-and-dirty human" translation is, say, one done in real time by a simultaneous translator.
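
For reference, the narrower reading of "quick and dirty" given in this thread (dictionary lookup of each word plus simplistic reordering) amounts to something like the sketch below. It is a hypothetical illustration only: the `GLOSSES` table and the `gloss` function are invented here, the reordering rule assumes a bare three-word subject-verb-object sentence, and none of it reflects how jbofi'e actually works.

```python
# Hypothetical one-word glosses for a few Lojban words.
GLOSSES = {
    "mi": "I",
    "do": "you",
    "prami": "love",
    "viska": "see",
}


def gloss(sentence, order="SVO"):
    """Word-by-word lookup with an optional, very naive reordering step."""
    words = sentence.split()
    # Unknown words are passed through in angle brackets rather than guessed.
    out = [GLOSSES.get(word, f"<{word}>") for word in words]
    # Simplistic reordering: only handles a bare three-word S-V-O sentence.
    if order == "SOV" and len(out) == 3:
        subject, verb, obj = out
        out = [subject, obj, verb]
    return " ".join(out)


if __name__ == "__main__":
    print(gloss("mi prami do"))
    print(gloss("mi viska do", order="SOV"))
```

Anything beyond this (agreement, tense, place structure, choice among senses) is exactly where the disagreement about translation quality in this thread begins.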

Obviously, the higher-quality the machine translation, the more uses it has. I doubt that a poor-quality translation would be adequate for the proposed patent application, though I could be wrong. Also, the patent application could rely on formalizing the source text according to special-purpose rules, which would make the job easier.

  • A poor translation that takes 2 seconds could be worth something compared to a good translation which would take a week or so and cost you a bit of money. (As for limiting the domain, see the METEO system used by Canada to translate weather reports; it works flawlessly.) --jay
    • Of course; it all depends on the use.

mi'e jezrax