BPFK: Old Morphology

 forum_name: Morphology
topic_title: valfendi tarball corrected
    subject: valfendi tarball corrected
   username: phma
  post_time: 2003-03-28 12:42:35
  post_text:

The current version of the morphology algorithm in valfendi 0.1.1 is 5.1, but I published a tarball with the wrong version number. If you have valfendi 0.1.1 and the algorithm claims to be version 5.0, please redownload from http://phma.hn.org/Language/valfendi.html .


 forum_name: Morphology
topic_title: Pending morphological problems
    subject: Pending morphological problems
   username: nessus
  post_time: 2003-05-15 09:58:18
  post_text:

Here is a list of collected problems or ambiguities waiting to be solved. This list is by no means complete: feel free to add items or, better, to propose solutions for existing ones :)

Uses of the comma: the Book's statement that two Lojban words may not differ only by commas is at the very least inaccurate. Should we restrict comma use to between vowels? Are commas useful otherwise?

Pronunciation of comma: what is the recommended one? Is 'h' (') really an option when it could break the audio-isomorphism?

Stress on 'y': is it allowed? If yes, in which cases? The Book says it is not stressed in "normal lojban context". While zoi quotes are certainly not a normal Lojban context, are cmene? cmavo?

Stress on cmavo: should we relax the rule that says the last cmavo preceding a brivla should not be stressed? Is that useful even when no ambiguity arises?

Cmavo .y. : is it always between two pauses? What about ybu? The Book says it is a valid compound cmavo. Should it be y.bu?

Experimental cmavo: what is allowed in the vowel string? Only V and VV as defined by the Book? But then ku'a'e will not be valid? Are other VV (including y for instance) allowed?

Fu'ivla: can it have the format of a (non allocated) gismu?

.... to be completed!


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject: cmene with la/lai/doi
   username: clsn
  post_time: 2003-06-03 23:40:02
  post_text:

It's likely that there's zilch to be said or done about this, but it might as well be mentioned.

Nobody, nobody actually keeps the rule about no la/lai/doi in cmene straight, certainly not always. I've caught some of the top Lojbanists (you know who you are) screwing it up after years of Lojbanizing.

See http://www.lojban.org/wiki/index.php/if%20the%20%22no%20la/lai/doi%20in%20cmene%22%20rule%20didn%27t%20exist for discussion there.

I think it would be much easier just to teach that la/lai/doi (the words) all "end in a glottal stop" (I mean teach people to pretend that's true, not that it really is) and remove the restriction from cmene construction. I really doubt we'll be able to do it, but hey, might as well get it in the open.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: xod
  post_time: 2003-06-04 11:54:45
  post_text:

The proposal is to demand the glottal stop before and after every cmene, and relax the forbiddance of {la, lai, doi} inside cmene. What happens with glottal stops that are inside cmene?

It breaks existing usage, but eliminates arbitrariness and makes the language much easier to learn and actually speak.

Are there any objections?


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: arj
  post_time: 2003-06-04 12:20:08
  post_text:

I'm ambivalent.

It would make Lojban easier to learn - I for one wouldn't hesitate to say {do selcme zoxod.} instead of {do selcme zo .xod}, and I have been doing Lojban for nigh on six years.

But the downside of this is that it is going to make the language more difficult to speak for the vast majority of the uses of cmene. Imagine having to say things like: "Oh, hello - John! Did you see the cartoon that - Kate - put on her office door yesterday?"

I don't think I'll veto this, but I'm not enthusiastic about it, either.

Maybe we should dispense with obligatory pauses altogether? The coming Lojban speech recognizers will probably have to do statistical processing to disambiguate pauses from the onsets of voiceless plosives, anyway. But that's almost as sacred as the gismu forms, and off topic in this forum anyway. I'll stop now.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: xod
  post_time: 2003-06-04 12:24:22
  post_text:

Deutsch has obligatory pauses all throughout it. "An orange" is "einy.oranj".


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: clsn
  post_time: 2003-06-04 14:12:39
  post_text:

What happens with glottal stops that are inside cmene?

Technically, a cmene-word (cmevla) cannot have a pause within it. But a name (cmene) may consist of more than one cmevla. That is, it is perfectly reasonable to talk of la mark. clsn. or la bil. klintn. so the presence of a pause in mid-name causes no problem, so long as the words around it are valid cmevla (i.e., *la .andi. kaufman. would not be valid, since .andi. doesn't end on a consonant. But .andikaufman. would be okay). It would also mean that you have to quote it with lo'u/le'u instead of zo (yes, you could use lu/li'u but I think people should get out of the habit of reaching for those whenever they need a multiword Lojban quote, since they require that the text inside be grammatical).
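The consonant-final test described above can be sketched in a few lines. This is only an illustration, not a full cmevla validator: the function names are hypothetical, the consonant inventory is assumed to be the standard Lojban letter set, and the sketch checks nothing except that each pause-separated part of a written name ends in a consonant.

```python
# Assumption: the Lojban consonant letters. Vowels, "y" and "'" are not included.
CONSONANTS = set("bcdfgjklmnprstvxz")


def ends_in_consonant(cmevla):
    """Return True if the word, ignoring surrounding pause dots, ends in a consonant."""
    word = cmevla.strip(".")
    return bool(word) and word[-1] in CONSONANTS


def invalid_parts(name):
    """Split a written name on pauses and spaces; return the parts that fail the test.

    The name is passed without the preceding la/lai/doi.
    """
    parts = name.replace(".", " ").split()
    return [p for p in parts if not ends_in_consonant(p)]


if __name__ == "__main__":
    for name in ("mark. clsn.", "bil. klintn.", ".andi. kaufman.", ".andikaufman."):
        print(name, invalid_parts(name))
```

With these assumptions, `.andi. kaufman.` is the only one of the four names that reports a failing part (`andi`), matching the examples in the post.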

But the downside of this is that it is going to make the language more difficult to speak, for the vast majority of the uses of cmene. Imagine having to say things like: "Oh, hello - John! Did you see the cartoon that - Kate - put on her office door yesterday?"

Bah. We already have to remember it in many cases. For example, your example

I for one wouldn't hesitate to say {do selcme zoxod.} instead of {do selcme zo .xod}

is wrong as it stands: even with the current rules you must say do selcme zo .xod. and the pauses are both mandatory. The rule only saves you from having to pause after those three magic words: la/lai/doi. With a little cleverness you can often use those to help out in other situations, but not everywhere, and certainly people already don't use them even when they could. Do you always remember to say mi'e .xod.? (I had to use xod's name, because .arj has an easy-to-remember pause in it, as it begins with a vowel) If we're serious about keeping this kind of unambiguity, we might as well bite the bullet and teach people to do it, and that means that they should learn to pause after all COI (for example). They might as well get used to those pauses, they're going to be there, being a pain, whether or not we make this change.

And of course there are "internal" ways of making this less intrusive. If you don't imagine them as pauses, but pretend that "la. happens to end on a glottal stop" (speaking Klingon helps), it becomes more natural.

I'm not positive I'm "enthusiastic" about this one either, even me; but you can't deny that the current system isn't working.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: xod
  post_time: 2003-06-04 14:23:22
  post_text:

I await further substantive criticisms, but to me, this seems like a trial balloon detecting whether the BPFK is able to agree on actual changes that are good, or whether there is enough knee-jerk conservatism here to totally constipate all activity.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: rlpowell
  post_time: 2003-06-04 15:55:31
  post_text:

I await further substantive criticisms, but to me, this seems like a trial balloon detecting whether the BPFK is able to agree on actual changes that are good, or whether there is enough knee-jerk conservatism here to totally constipate all activity.

Assuming that there *are* no substantial objections, I'm inclined to agree.

-Robin


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: And
  post_time: 2003-06-04 16:57:16
  post_text:

I am completely in favour. It makes it a hell of a lot easier to avoid making "mistakes", at the comparatively trivial cost of learning {la}, {lai}, {doi} as /la./, /lai./, /doi./ = (la?), (lai?), (doi?). Plus lots of people dislike having names screwed up by the no-la-in-cmevla rule. I just thank Jeeg my name's not Larry!


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: And
  post_time: 2003-06-04 17:30:20
  post_text:

Imagine having to say things like: "Oh, hello - John! Did you see the cartoon that - Kate - put on her office door yesterday?"

Imagine you speak any of most varieties of British English. Then you just have to say "Oh, hellot John".

Maybe we should dispense with obligatory pauses altogether? The coming Lojban speech recognizers will probably have to do statistical processing to disambiguate pauses from the onsets of voiceless plosives, anyway. But that's almost as sacred as the gismu forms, and off topic in this forum anyway. I'll stop now.

It's a relevant point, because it's perfectly true. The rules for word-segmentation have no realworld effect on segmentation. They're there just to make the design unambiguous (which is a good thing). But the no-la-in-cmene rule does have real consequences, of course. So mark's proposal has a real upside but no real downside (except in as much as it requires significant changes to the published prescription).


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: nitcion
  post_time: 2003-06-05 02:51:55
  post_text:

I await further substantive criticisms, but to me, this seems like a trial balloon detecting whether the BPFK is able to agree on actual changes that are good, or whether there is enough knee-jerk conservatism here to totally constipate all activity.

This is, as you know, tough.

On the one hand, I am here to preserve backward compatibility, and the existing baseline. That is my job as BPFKJ.

On the other hand, I (not as BPFKJ, but as voting commissioner) am willing to compromise how strongly I defend baseline. I am inclined to defend it less strongly if a proposal:

does not substantially invalidate existing text (as would happen with a reassignment of Y in the grammar), if it involves cmavo that no one has learned (MEX?) or used (lau), if it is poorly documented in the current baseline (fa'a, gadri), if usage has overwhelmingly gone the other way for reasons consistent with the design principles rather than laziness/malglico (vo'a, fa'a), or if it has proved clearly unlearnable (little usage) *and not well motivated by logic*. (cmene?) And if that proposal brings a clear benefit.

And a case can be made for that here. Not blindingly obvious, but I am willing to hear discussion.

I think this is distinct from optimisations ('tinkering') where the benefit is comparatively small with regard to the design principles in particular, the current pattern is not demonstrably difficult to learn or use, and there has already been substantial usage. Thus (sorry, Jorge, but you would expect this) I think Jorge's suggestion to reassign members of COI to UI is unacceptable. (I know he was just saying; but I think my job is going to start being saying what is in or out of bounds; I have no desire to shut reformers down, but I agree with Jay that there are limits to what can even be entertained in discussion.)

On the third hand, Bob is threatening blanket vetoes over this kind of thing; and the conservative constituency has reasons to be concerned that this inaugurates open season. If what I've just said contravenes the published guidelines, then I am out of line and should be whacked. If people don't mind that kind of flexibility, and it is outside the guidelines, I now think we should have a 2/3 absolute majority poll before I change them.

What I'd like is stats on who is remembering the initial pauses; and proof that an alternative codification would not compromise the audiovisual isomorphism.

Even then it might not succeed; but I personally doubt Lojbanists are going to follow this rule in any but the most anal-retentive of contexts --- especially in speech. It may end up staying in the standard as a dead letter, rather than compromise on-paper audiovisual isomorphism, at least.

But (as a general statement), I refer to John's qualification on board that conservatism in itself is not an adequate defense, for either baseline or usage. *If* an alternative is produced that preserves audiovisual blahblah (design criterion of Lojban), *and* is consistent with usage (near obligatory final pause at least in writing, but little if any initially), then a conservative has to present to us reasons why the existing system is better. Easier to learn is a point for; the fact that people aren't using it normally wouldn't be a point against, but this isn't logically abstruse gadri, this is everyday names.

The issue of initial pause is distinct from the issue of nested la/do/doi (which has already been relaxed by allowing la/do/doi in clusters, and which is a rule people do know even if they violate it, unlike apparently the initial pause; so the existing knowledge of the language in the community prejudices against such a change, as Mark recognises). They should be considered and argued on separately.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: clsn
  post_time: 2003-06-26 17:34:26
  post_text:

On the third hand, Bob is threatening blanket vetoes over this kind of thing; and the conservative constituency has reasons to be concerned that this inaugurates open season. If what I've just said contravenes the published guidelines, then I am out of line and should be whacked. If people don't mind that kind of flexibility, and it is outside the guidelines, I now think we should have a 2/3 absolute majority poll before I change them.

Whatever I write here is of course subject to hearing out Bob's objections when he can formulate them. I'm writing now because now is when I'm thinking about it.

What I'd like is stats on who is remembering the initial pauses; and proof that an alternative codification would not compromise the audiovisual isomorphism.

.....

The issue of initial pause is distinct from the issue of nested la/do/doi (which has already been relaxed by allowing la/do/doi in clusters, and which is a rule people do know even if they violate it, unlike apparently the initial pause; so the existing knowledge of the language in the community prejudices against such a change, as Mark recognises). They should be considered and argued on separately.

Stats may be misleading. There are very few "illegal" cmene that survived to major works, but there are scads that have been shot down. My point is that allowing la/lai/doi is in line with usage, and in fact the baseline is not.

I'm not sure I see that the issue of the initial pause is distinct from nested la/lai/doi. The only reason initial pause becomes necessary after la/lai/doi is if the cmene itself may contain those syllables. Past usage probably has lots of violation of the initial pause after COI and everything else that can precede a cmene, but removing that requirement would make a mess out of autosegmentation. In fact, I think most people don't even properly put pauses in front of vowel-initial words, especially in constructions like .ua.ui which usually comes out as /wawi/ (I invoke the glottis, and I say /?wa?wi/, yes, with an initial glottal stop even if there's a pause before it). We can't do much about these: making those pauses no longer required would mess up our algorithm. Best we can do is teach people better and hope.

But nested la/lai/doi is a mistake that is still happening (I caught tsali on it recently on IRC) and is a much more "permanent" problem than missing pauses. A missing pause happens in speech. You say it (or don't) and then it's gone, and either it caused confusion or it didn't, but it's past. Very rarely are conversations recorded, so for the most part speech is evanescent. Missing dots in written versions are no problem, since the dots are optional anyway. But la/lai/doi shows up in written documents, and those last and are disseminated. It gets into word-lists of cmene for countries or celebrities or ice-cream flavors, and people start using them. This is a case where usage forms an incorrect example and it can and often does *stick*. Usage has pretty much voted against forbidding la/lai/doi; we might as well own up to it.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: noras
  post_time: 2003-07-17 22:32:19
  post_text:

No la/lai/doi in names is the speech-stream-resolution equivalent of "cu before selbri after a 'le broda' sumti". Each is needed for unique resolution in its sphere. I no more want to give up speech-stream unique resolvability than I wish to give up grammar unique resolvability.

If I have to come up with a compromise, I'll just suggest reworking it altogether such that the sound of "th" is allowed only at the beginning of a name, and is required there. Then a name is everything from the "th" through to the trailing pause.

"th-noras"


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: kd
  post_time: 2003-07-17 23:18:55
  post_text:

1. The "No la/lai/doi" rule is completely unnecessary if all names begin with pauses. Which, since except after those three cmavo they already do, is not at all an unreasonable change to make. It is an easy rule to remove.

2. This th idea is very silly. Even if you don't want to admit it, th is already a legal sound, and it is impermissible except between vowels. So it can't come at the beginnings of names, and deleting a version of a current phoneme requires some people (at least one...) to relearn their pronunciation. I will vote against anything that changes what sounds are allophonic to what other sounds.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: clsn
  post_time: 2003-07-18 00:16:46
  post_text:

No la/lai/doi in names is the speech-stream-resolution equivalent of "cu before selbri after a 'le broda' sumti". Each is needed for unique resolution in its sphere. I no more want to give up speech-stream unique resolvability than I wish to give up grammar unique resolvability.

No, it's not the same as cu. I think it is qualitatively different, and a much more subtle and pervasive annoying rule, but even aside from my opinion, the long and short of it is that people with experience screw up with forgetting cu pretty infrequently, but forgetting the "la" rule fairly regularly--counting only times when the situation arises, that is (i.e. cmenifying names that had one of those syllables in the source language). cu has proven itself to be learnable; the "la" rule has not.

If I have to come up with a compromise, I'll just suggest reworking it altogether such that the sound of "th" is allowed only at the beginning of a name, and is required there. Then a name is everything from the "th" through to the trailing pause.

This is thoroughly ridiculous. Your "compromise" is to introduce another sound that all cmene must start with? My "compromise" is very similar: all cmene must start with a pause. In what way would starting cmene with "th" be more helpful/less disruptive than starting them with a pause, which is already (a) permissible and (b) required nearly all the time anyway?


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: And
  post_time: 2003-07-18 05:28:08
  post_text:

No la/lai/doi in names is the speech-stream-resolution equivalent of "cu before selbri after a 'le broda' sumti". Each is needed for unique resolution in its sphere.

I had thought that "cu" is not strictly necessary, and is primarily an abbreviatory device (bcs it allows lots of other terminators to be omitted).

If I have to come up with a compromise, I'll just suggest reworking it altogether such that the sound of "th" is allowed only at the beginning of a name, and is required there. Then a name is everything from the "th" through to the trailing pause.

"th-noras"

I think we pretty much all agree with this compromise, except for its details. "th" = Lojban /'/, so it would be a radical change to allow /'/ only at the beginning of a name. But /./ is required at the beginning of names, so if you change "th" to "glottal stop", then we are fully in agreement.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: noras
  post_time: 2003-07-22 20:01:48
  post_text:

This is thoroughly ridiculous. Your "compromise" is to introduce another sound that all cmene must start with? My "compromise" is very similar: all cmene must start with a pause. In what way would starting cmene with "th" be more helpful/less disruptive than starting them with a pause, which is already (a) permissible and (b) required nearly all the time anyway?

u'usai. The "th-noras" suggestion was meant to be a reductio ad absurdum; I didn't really mean it. Let me try to explain this a bit better.

When someone says "le prenu klama" when they really mean "le prenu cu klama", they have made a mistake. It may be a fairly common mistake, but it is still a mistake. We don't talk about getting rid of the need for "cu" (or other closer of the leading sumti place) just because people persist in this error.

Similarly, I don't believe that we should get rid of the "no la/lai/doi" rule just because people haven't been able to get it right.

I'll elaborate another time, when I finally get my mission statement made, but here's a quick rationale: When a name (for something/someone) is chosen, the chooser has plenty of time to come up with something acceptable; he/she can even check it using a program for that purpose. It should not, in the long term, turn out to be a big deal.

If, however, someone wants to choose a name with la/lai/doi (or mistakenly makes one and others need to use it), there is a mechanism provided: la'o. In fact, we have also allowed that, if they just put a consonant (any consonant - not just "th" zo'o) before the la/lai/doi, it's OK.
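The kind of checking program mentioned above could look roughly like the following. It is a hedged sketch, not an existing tool: `forbidden_positions` is a hypothetical name, the test works on letters rather than on a real syllabification, and it encodes only the rule as stated in this thread (la/lai/doi is a problem unless a consonant comes directly before it). A match at the very start of the word is treated as forbidden, which is an assumption.

```python
import re

# Assumption: the Lojban consonant letters.
CONSONANTS = "bcdfgjklmnprstvxz"

# "la" also covers "lai". The negative lookbehind rejects a match only when
# the character directly before it is a consonant.
_FORBIDDEN = re.compile(r"(?<![%s])(?:la|doi)" % CONSONANTS)


def forbidden_positions(cmevla):
    """Return the letter offsets of la/lai/doi not shielded by a preceding consonant."""
    word = cmevla.strip(".").lower()
    return [m.start() for m in _FORBIDDEN.finditer(word)]


if __name__ == "__main__":
    for word in ("stivn", "laitl", "lojban", "nikolas", "klintn"):
        print(word, forbidden_positions(word))
```

An empty list means the word passes this one test; it says nothing about the other constraints on cmevla, such as invalid consonant clusters.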

I think lojban speech is choppy enough with the required pause after the name that I don't want to add another required pause before it.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: And
  post_time: 2003-07-23 06:08:02
  post_text:

I think lojban speech is choppy enough with the required pause after the name that I don't want to add another required pause before it.

There is no such thing in Lojban as a "required pause". There is a required /./. One of the ways in which /./ may be realized phonetically is by a phonetic consonant, (?). Another way is by a pause. If you dislike the choppiness of pauses, then just pronounce /./ consonantally.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: And
  post_time: 2003-07-23 06:11:23
  post_text:

When someone says "le prenu klama" when they really mean "le prenu cu klama", they have made a mistake. It may be a fairly common mistake, but it is still a mistake. We don't talk about getting rid of the need for "cu" (or other closer of the leading sumti place) just because people persist in this error.

If there was an obvious remedy to the cu problem, and if the current cu rule were generally ill-motivated and more hindrance than help, then we *would* be talking about getting rid of "cu".


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: noras
  post_time: 2003-07-23 10:08:03
  post_text:

When someone says "le prenu klama" when they really mean "le prenu cu klama", they have made a mistake. It may be a fairly common mistake, but it is still a mistake. We don't talk about getting rid of the need for "cu" (or other closer of the leading sumti place) just because people persist in this error.

If there was an obvious remedy to the cu problem, and if the current cu rule were generally ill-motivated and more hindrance than help, then we *would* be talking about getting rid of "cu".

There are two things people have trouble with in names (I decline to call them "problems" at this point). One is the "no la/lai/doi", and the other is the required pause before when there is no "la/lai/doi" preceding. It's a personal opinion as to which is worse. I choose to support the status quo.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: clsn
  post_time: 2003-07-23 10:32:27
  post_text:

When someone says "le prenu klama" when they really mean "le prenu cu klama", they have made a mistake. It may be a fairly common mistake, but it is still a mistake. We don't talk about getting rid of the need for "cu" (or other closer of the leading sumti place) just because people persist in this error.

If there was an obvious remedy to the cu problem, and if the current cu rule were generally ill-motivated and more hindrance than help, then we *would* be talking about getting rid of "cu".

Actually, if you want to take this parallel, the proposal at hand is equivalent to requiring the cu in all cases, and not just when it is really necessary.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: And
  post_time: 2003-07-23 10:59:01
  post_text:

When someone says "le prenu klama" when they really mean "le prenu cu klama", they have made a mistake. It may be a fairly common mistake, but it is still a mistake. We don't talk about getting rid of the need for "cu" (or other closer of the leading sumti place) just because people persist in this error.

If there was an obvious remedy to the cu problem, and if the current cu rule were generally ill-motivated and more hindrance than help, then we *would* be talking about getting rid of "cu".

There are two things people have trouble with in names (I decline to call them "problems" at this point). One is the "no la/lai/doi", and the other is the required pause before when there is no "la/lai/doi" preceding. It's a personal opinion as to which is worse. I choose to support the status quo.

I.e. preserve two 'problems' instead of reducing them to one.

The 'problem' of forgetting the "required pause" is comparatively trivial, firstly because it occurs at the inherently grungey phonetic level, where things are already going to be rather messed up (e.g. I do not believe Lojbanists consistently distinguish, say /nc/ from /ntc/), and secondly because it can rather easily be remedied by learning {la} as /la./ (la?), and so on for other words that take cmevla complements, as Mark has been suggesting for more years than I can remember.

I can understand supporting the status quo because it is the status quo, but I can't understand supporting it as better than the alternatives when on a level playing field.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: noras
  post_time: 2003-07-23 12:33:33
  post_text:

The 'problem' of forgetting the "required pause" is comparatively trivial, firstly because it occurs at the inherently grungey phonetic level, where things are already going to be rather messed up (e.g. I do not believe Lojbanists consistently distinguish, say /nc/ from /ntc/), and secondly because it can rather easily be remedied by learning {la} as /la./ (la?),

and so on for other words that take cmevla complements, as Mark has been suggesting for more years than I can remember.

I can understand supporting the status quo because it is the status quo, but I can't understand supporting it as better than the alternatives when on a level playing field.

That is why /ntc/ is disallowed as a medial (CLL page 37). And other things are disallowed in names besides la/lai/doi. /ntc/, as an invalid triplet, is also disallowed. So is /mz/, as people found out trying to do James as /djeimz/; they had to make it instead /djeimyz/. Yes there are a number of things that have to change from the original when making a name. My preference comes partly from personal experience. Yes, I have used "la" mistakenly in a name. I've also forgotten pauses. I've found the latter harder to extinguish.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: clsn
  post_time: 2003-07-24 15:54:46
  post_text:

Yes, I have used "la" mistakenly in a name. I've also forgotten pauses. I've found the latter harder to extinguish.

I can see how that could be, and different people will differ on what's harder and what isn't. But the situation isn't completely symmetric.

If you mess up on a pause--and nobody got confused--then in a matter of seconds the problem is history and no longer a problem, but something that happened. This presupposes that you're not being recorded, but the vast majority of speech isn't recorded, and speech that is specifically recorded for posterity is usually spoken with more care. But in general, the problem is fleeting, if it's there at all.

If you use "la" in a name by mistake--and nobody catches it--well, the implications could be worse. Say what you will that people will be meticulous and double-check their cmene when writing; the record is against that. I've caught bad cmene in formal translation projects by skilled Lojbananas, not just in off-the-cuff letters. These things get picked up and used; it's very frequent that nobody catches them. They make their way into speech, they make their way into writing. In fact, they are often written down right at the outset. This is a problem that affects the written language, which has much more lasting effects than the spoken language. Then someday it comes before something that really needs the autosegmentation (some voice-recognition system) and all of a sudden, "oops, the word we've been using all along is wrong." (Yes, this could happen on the other side too, "oops, the way I've been saying laLOJban all along is wrong." But the pause rule needs to be learnt once and applied uniformly everywhere. The la/lai/doi rule has to be individually checked for each cmene).

Basically, pauses are ephemeral, and mistakes with them are similarly short-lived, except in the (poor) habits learned by the speaker and some unusual situations. Syllables are written (the period for pause is always optional), and mistakes with them can linger for much longer.


 forum_name: Morphology
topic_title: A compromise solution?
    subject: A compromise solution?
   username: And
  post_time: 2003-07-24 17:18:41
  post_text:

I wonder whether the following compromise would work:

la/lai/doi are permitted in cmevla iff the cmevla is preceded by /./. The /./ is omissible iff the cmevla does not contain la/lai/doi.


 forum_name: Morphology
topic_title: A compromise solution?
    subject: Re: A compromise solution?
   username: xorxes
  post_time: 2003-07-24 17:43:24
  post_text:

I wonder whether the following compromise would work:

la/lai/doi are permitted in cmevla iff the cmevla is preceded by /./.

The /./ is omissible iff the cmevla does not contain la/lai/doi.

How would {.lanoras} parse?


 forum_name: Morphology
topic_title: A compromise solution?
    subject: Re: A compromise solution?
   username: clsn
  post_time: 2003-07-24 18:08:52
  post_text:

I wonder whether the following compromise would work:

la/lai/doi are permitted in cmevla iff the cmevla is preceded by /./.

The /./ is omissible iff the cmevla does not contain la/lai/doi.

It's been suggested, several times. It doesn't work. The two criteria are mutually exclusive. To find the ends of a cmene, currently, you start at the pause and work backwards until EITHER a pause OR la/lai/doi. If a cmene can contain a la/lai/doi, you can't stop when you hit one of those syllables.
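The backward scan described here, and the reason the two criteria collide, can be sketched as follows. This is an illustration under stated assumptions, not the official morphology algorithm: the speech stream is modelled as a plain string with "." for each pause, the function names are hypothetical, and a la/lai/doi directly preceded by a consonant is assumed not to stop the scan.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")  # assumption: Lojban consonant letters
MARKERS = ("lai", "doi", "la")


def _marker_ends_at(stream, i):
    """True if la/lai/doi ends just before index i and no consonant precedes it."""
    for marker in MARKERS:
        start = i - len(marker)
        if start >= 0 and stream[start:i] == marker:
            if start == 0 or stream[start - 1] not in CONSONANTS:
                return True
    return False


def cmevla_current_rule(stream, end):
    """Scan backwards from the closing pause until a pause OR la/lai/doi."""
    if stream[end] != ".":
        raise ValueError("end must index a pause")
    i = end
    while i > 0 and stream[i - 1] != "." and not _marker_ends_at(stream, i):
        i -= 1
    return stream[i:end]


def cmevla_pause_only(stream, end):
    """Scan backwards from the closing pause until the previous pause only."""
    if stream[end] != ".":
        raise ValueError("end must index a pause")
    return stream[stream.rfind(".", 0, end) + 1:end]


if __name__ == "__main__":
    stream = ".lanoras."
    end = len(stream) - 1
    print(cmevla_current_rule(stream, end))  # stops at "la"
    print(cmevla_pause_only(stream, end))    # stops at the pause
```

On `.lanoras.` the first scan stops at "la" and yields `noras`, while the second runs to the pause and yields `lanoras`. A listener cannot know in advance which scan applies, which is the objection raised above.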


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject: Back from LogFest
   username: clsn
  post_time: 2003-07-28 00:35:05
  post_text:

Well, I'm back from LogFest, with some new datapoints and opinions.

Within 30 seconds of opening the brand-spanking new '.i la lojban. mo.' book I found at LogFest (my copy of which, like my camera, apparently got left behind--arggh), guess what I found? Right there, plain as day on page iii (in the only true paragraph on the page, end of I think the second line), an invalid cmene, invoking *la stivn. laitl. as the recipient of the authors' thanks. Note that this book was proofread by Nick, John, and Robin, all of whom are cautious and experienced Lojbanists. And yet it still made it into this seminal text. Can we say that this rule is just Not Working?

No la/lai/doi in names is the speech-stream-resolution equivalent of "cu before selbri after a 'le broda' sumti". Each is needed for unique resolution in its sphere. I no more want to give up speech-stream unique resolvability than I wish to give up grammar unique resolvability.

When someone says "le prenu klama" when they really mean "le prenu cu klama", they have made a mistake. It may be a fairly common mistake, but it is still a mistake. We don't talk about getting rid of the need for "cu" (or other closer of the leading sumti place) just because people persist in this error.

If you want to discuss whether or not cu should be made mandatory, that's a discussion for another forum (probably Cmavo Miscellaneous). Here we're working on the la/lai/doi rule in cmene. I'm not going to defend or attack cu here.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject: Re: Back from LogFest
   username: nitcion
  post_time: 2003-07-30 12:04:30
  post_text:

guess what I found? Right there, plain as day on page iii (in the only true paragraph on the page, end of I think the second line), an invalid cmene, invoking *la stivn. laitl. as the recipient of the authors' thanks. Note that this book was proofread by Nick, John, and Robin, all of whom are cautious and experienced Lojbanists. And yet it still made it into this seminal text. Can we say that this rule is just Not Working?

Good Lord. I couldn't have planned it any better.

FWIW, my position is no demi-veto, but I agree with Mark that pause is the lesser of two evils.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: phma
  post_time: 2003-08-04 11:17:12
  post_text:

I am unwilling to allow a rule change that requires that the lexer for standard Lojban and the lexer for rule-changed Lojban be different.

Consider these phoneme strings: /lablAbruk/, /lalaus/, /la.bElarus/. According to standard Lojban AFAIU, these lex as {la blabruk}, error, and {la be la rus}. Before the rule change allowing consonants before la/la'i/lai/doi, /lablAbruk/ was an error. According to the ala'um rule, which I favor, /lalaus/ lexes as {la laus}. The lexer does the same for all three variations: {la blabruk}, {la laus}, {la be la rus}. Only the validation is different, calling {laus} invalid if the ala'um rule is not specified.

The new proposed rule would lex them as {lablabruk}, {lalaus}, and {la bElarus}. No lexer can be written that does this and differs only in validation from one that lexes standard Lojban. The new proposed rule therefore cannot be allowed.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: clsn
  post_time: 2003-08-04 11:42:48
  post_text:

I am unwilling to allow a rule change that requires that the lexer for standard Lojban and the lexer for rule-changed Lojban be different.

Consider these phoneme strings: /lablAbruk/, /lalaus/, /la.bElarus/. According to standard Lojban AFAIU, these lex as {la blabruk}, error, and {la be la rus}. Before the rule change allowing consonants before la/la'i/lai/doi, /lablAbruk/ was an error. According to the ala'um rule, which I favor, /lalaus/ lexes as {la laus}. The lexer does the same for all three variations: {la blabruk}, {la laus}, {la be la rus}. Only the validation is different, calling {laus} invalid if the ala'um rule is not specified.

The new proposed rule would lex them as {lablabruk}, {lalaus}, and {la bElarus}. No lexer can be written that does this and differs only in validation from one that lexes standard Lojban. The new proposed rule therefore cannot be allowed.

First, you should probably define the "ala'um rule"; I had to look that up.

Now, as I see it, the "ala'um" convention permits la/lai/doi in cmene provided that they are followed by a non-consonant (i.e. vowel or '). Seems much like the rule-change that permitted la/lai/doi provided they were preceded by a consonant (instituted after lat. was used in addressing a cat in the JL comic strip, found to be illegal, and substituted with mlat. plus the new rule). That's well and good, but it doesn't solve the right problem.

It doesn't fix the fact that after years of using Lojban, people still use la *laivdjurnal (or something close) and la *stivn.laitl makes it into a heavily-proofread book.

I would claim that permitting la/lai/doi is actually more in line with "standard Lojban," because that's how Lojban has been used pretty much since time immemorial, attempts to enforce the standard rule notwithstanding. The permissive rule is less likely to complain on legacy text, and since pretty much all of the legacy stuff we have is written, it would not even conflict with the old parser, since we always seem to write the space after la, and writing the pause is of course optional. If the rule is enacted, then it will be the standard, and so there would be no disagreement with "Standard Lojban" in future recordings/voice-recognition either.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: And
  post_time: 2003-08-04 11:46:19
  post_text:

I am unwilling to allow a rule change that requires that the lexer for standard Lojban and the lexer for rule-changed Lojban be different. The new proposed rule would lex them as {lablabruk}, {lalaus}, and {la bElarus}. No lexer can be written that does this and differs only in validation from one that lexes standard Lojban. The new proposed rule therefore cannot be allowed.

Since we are engaged in the task of defining what standard Lojban is, I don't understand your argument. (Is it that the effort put into devising the lexer for the old rule would go to waste? But you don't say that.)


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: phma
  post_time: 2003-08-04 19:38:12
  post_text:

Since we are engaged in the task of defining what standard Lojban is, I don't understand your argument. (Is it that the effort put into devising the lexer for the old rule would go to waste? But you don't say that.)

We have the power to resolve contradictions and inconsistencies. The ala'um rule arose because I interpreted the Book's morphology rule differently than someone else, thus the Book has an inconsistency, or at least an inclarity.

We also have power to change the baseline to match a pattern of usage. But we are told to prefer a position close to the status quo. A change to the morphology that requires a different word-break algorithm is too far from the status quo for me to prefer it.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: clsn
  post_time: 2003-08-08 15:35:54
  post_text:

We have the power to resolve contradictions and inconsistencies. The ala'um rule arose because I interpreted the Book's morphology rule differently than someone else, thus the Book has an inconsistency, or at least an inclarity.

The ala'um rule is precisely analogous to the rule permitting la/lai/doi when immediately preceded by a consonant: "It can't mean anything else, so we might as well permit it." And that's reasonable enough. I recall once Nick trying to weasel out of naming *la pontius. pilatus by using ?pila'atus instead. And technically that wasn't, at the time, legal, though of course it would be with the ala'um rule, since no other parse makes sense. But it makes the rule, which we've already seen is too complicated (or too much of a pain) to remember, even more complicated. For all that it permits more prospective cmene, a rule saying "A cmene may not have the syllables la/lai/doi, unless that syllable is preceded by a consonant or followed by a non-consonant" is a lot more complex than one saying "A cmene may not have the syllables la/lai/doi." At best, using the more complex rule will save some situations where the writer forgot to check his cmene and luckily used one that had an "allowable" la/lai/doi; it won't save anything a priori as we have seen with the simpler rule.

We also have power to change the baseline to match a pattern of usage. But we are told to prefer a position close to the status quo. A change to the morphology that requires a different word-break algorithm is too far from the status quo for me to prefer it.

Then you're going to need an even harder-to-define word-break algorithm, since you're going to have to deal with cmene that are bounded neither by la/lai/doi nor by pause. Because the status quo is that people transliterate names, usually remembering to fix impermissible medials, and nearly always remembering to end on a consonant, and leave the nature of the syllables alone. That is the status quo. The status quo is that cmene are not delimited at the beginning at all. Yes, the status quo states that they should be, but they aren't. I don't think there is another feature of Lojban which is as commonly-invoked as cmenifying and yet still subject to as much error after years of skilled work in the language (Nick's been doing Lojban for what, 10-15 years now?). If we want to preserve the status quo and fix it so that it is workable, we need to make the status quo work, and slapping on an even more complex rule won't do it.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: JohnCowan
  post_time: 2003-08-11 00:01:13
  post_text:

I propose the following compromise: divide cmevla into (1) those which contain the syllables la, loi, or doi not preceded by a consonant, and (2) those which don't. Those which do must always be preceded by pause; those which don't, retain the status quo -- preceded by pause except after the cmavo la, loi, and doi.

In this way, both kinds of users are satisfied: those who fear they will have to pronounce too many pauses can avoid type 1 names altogether; those who fear that they will issue illegal names can pause before every name.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: clsn
  post_time: 2003-08-11 00:18:15
  post_text:

I propose the following compromise: divide cmevla into (1) those which contain the syllables la, loi, or doi not preceded by a consonant, and (2) those which don't. Those which do must always be preceded by pause; those which don't, retain the status quo -- preceded by pause except after the cmavo la, loi, and doi. In this way, both kinds of users are satisfied: those who fear they will have to pronounce too many pauses can avoid type 1 names altogether; those who fear that they will issue illegal names can pause before every name.

How many times do we have to go through this? Consider the string .lanoras.. Is that the name noras. with a la and no pause before it, or the name lanoras with its obligatory pause? There's no knowing, and context can't always save you. This was suggested before:

I wonder whether the following compromise would work:

la/lai/doi are permitted in cmevla iff the cmevla is preceded by /./.

The /./ is omissible iff the cmevla does not contain la/lai/doi.

This under the heading "A compromise solution?"

We've also seen it suggested when this notion was brought up first, in http://www.lojban.org/wiki/index.php/if%20the%20%22no%20la/lai/doi%20in%20cmene%22%20rule%20didn%27t%20exist

The only way out I can see is if you forbid pauses after la (et al) before cmene that lack it, and require a pause before the gadri. And that's even worse than what we have now.

That compromise won't work, unless you have a clever idea I don't yet see. Just leave it.
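
To make the .lanoras. objection concrete, here is a small sketch that lists the readings the compromise would allow for the material between two pauses. The function name `candidate_parses` is invented for illustration, and the sketch deliberately ignores everything else about word breaking (stress, other cmavo, what counts as a well-formed name).

```python
MARKERS = ("la", "lai", "doi")


def candidate_parses(chunk: str) -> list[list[str]]:
    """List readings of the text found between two pauses under the compromise.

    Reading 1: the whole chunk is a name that carries its obligatory pause.
    Further readings: a leading la/lai/doi followed, with no pause, by a name.
    """
    parses = [[chunk]]
    for marker in MARKERS:
        if chunk.startswith(marker) and len(chunk) > len(marker):
            parses.append([marker, chunk[len(marker):]])
    return parses


print(candidate_parses("lanoras"))  # [['lanoras'], ['la', 'noras']]
print(candidate_parses("djan"))     # [['djan']]
```

Any chunk that begins with one of the three words comes back with more than one reading, which is the ambiguity the compromise cannot remove.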


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: xod
  post_time: 2003-08-11 00:31:00
  post_text:

Any so-called compromise will weaken the simplicity of the suggested revision beyond the level of suffering with the status-quo. This suggestion is so simple and elegant, the case so clear, the contrary arguments so baseless, the destructive impact so minimal, and the issue so descriptive of usage that it should be held up as an open-and-shut model case for the BF.

Can we take this to vote now? I want to know if this BF is going to operate, or if it's inert like jboske writ large. Or do we have to wait around several years until the BF "finishes" with all the cmavo, which, at this rate I put around 2007.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: clsn
  post_time: 2003-08-11 00:45:53
  post_text:

In this way, both kinds of users are satisfied: those who fear they will have to pronounce too many pauses can avoid type 1 names altogether; those who fear that they will issue illegal names can pause before every name.

Anyway, if "those who fear they will have to pronounce too many pauses" really could "avoid type 1 names altogether", we wouldn't be in this situation.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject: .ala'um. rule: illegal
   username: clsn
  post_time: 2003-08-12 10:18:42
  post_text:

Something occurred to me this morning. The ".ala'um. rule", as I understand it, permits la/lai/doi in cmene provided they are followed by a non-consonant (generally '). That's a problem.

There is the little-known member of selma'o LA: la'i. If the .ala'um. rule permits a cmene like bala'iman., then we can't tell where cmene begin, since that could be ba la'i man. (OK, maybe stress it baLA'iman if you need to). And if you say "well, we'll say the .ala'um. rule permits ' after la only if it isn't followed by i," then hearty guffaws of ridicule are too good for you. The rule's too complex as it is, extra complexity is a bad idea, and such incredibly nitpicky exceptions are even worse.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject: Re: .ala'um. rule: illegal
   username: And
  post_time: 2003-08-12 20:13:45
  post_text:

Something occurred to me this morning. The ".ala'um. rule", as I understand it, permits la/lai/doi in cmene provided they are followed by a non-consonant (generally '). That's a problem. There is the little-known member of selma'o LA: la'i. If the .ala'um. rule permits a cmene like bala'iman., then we can't tell where cmene begin, since that could be ba la'i man. (OK, maybe stress it baLA'iman if you need to). And if you say "well, we'll say the .ala'um. rule permits ' after la only if it isn't followed by i," then hearty guffaws of ridicule are too good for you. The rule's too complex as it is, extra complexity is a bad idea, and such incredibly nitpicky exceptions are even worse.

As I understand it (& we know I don't understand it very well), bala'iman. would parse as a single cmevla under the ala'um rule, while bala'i.man would parse as ba la'i man. This is presuming that la'i is not one of those words after which the pre-cmevla glottal stop can be omitted.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject: Re: .ala'um. rule: illegal
   username: phma
  post_time: 2003-08-12 23:12:09
  post_text:

As I understand it (& we know I don't understand it very well), bala'iman. would parse as a single cmevla under the ala'um rule, while bala'i.man would parse as ba la'i man. This is presuming that la'i is not one of those words after which the pre-cmevla glottal stop can be omitted.

"bala'iman" breaks up. "bala'iaman" and "bala'i'uman" and "bala'uman" do not.

"la'i" is not allowed in cmene any more than "la" or "lai" or "doi". As "la" is a substring of both "lai" and "la'i", the only way I could explain why "lai" was specified as well as "la" is that "la" is allowed when followed by something other than a consonant. That makes "laus" a valid cmene. "lais" does not contain "la" in this way, but does contain "lai", so it's not a valid cmene. Similarly "la'is" is invalid because it contains "la'i".

The relevant sentence from the Book:

Names are not permitted to have the sequences ``la, ``lai, or ``doi embedded in them, unless the sequence is immediately preceded by a consonant. These minor restrictions are due to the fact that all Lojban cmene embedded in a speech stream will be preceded by one of these words or by a pause.

Since the sequence "lai" contains the sequence "la", either it is redundant, or the criterion is misstated. A cmene can be preceded by "la'i" without a pause, so the second sentence is in error.
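
The redundancy can be checked mechanically if the Book's sentence is read literally, as a test on letter sequences. The sketch below does that and nothing more: `forbidden_hits` and `is_valid` are hypothetical names, and the literal-sequence reading is itself an assumption (a syllable-based reading would give different answers for names such as "laus").

```python
CONSONANTS = set("bcdfgjklmnprstvxz")


def forbidden_hits(name: str, sequences=("la", "lai", "doi")) -> list[tuple[int, str]]:
    """Find each listed sequence in name that is NOT immediately preceded by a consonant."""
    hits = []
    for seq in sequences:
        pos = name.find(seq)
        while pos != -1:
            if pos == 0 or name[pos - 1] not in CONSONANTS:
                hits.append((pos, seq))
            pos = name.find(seq, pos + 1)
    return sorted(hits)


def is_valid(name: str, sequences=("la", "lai", "doi")) -> bool:
    """A name passes the literal-sequence test when it has no forbidden hits."""
    return not forbidden_hits(name, sequences)


print(is_valid("blabruk"))               # True: "la" follows the consonant b
print(is_valid("laitl"))                 # False
print(is_valid("lais", ("la", "doi")))   # False even with "lai" dropped from the list
```

Under this reading, dropping "lai" from the list changes nothing, because every "lai" is already caught as "la". That is the sense in which the sentence is either redundant or misstated.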


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject: Re: .ala'um. rule: illegal
   username: clsn
  post_time: 2003-08-14 14:46:02
  post_text:

As I understand it (& we know I don't understand it very well), bala'iman. would parse as a single cmevla under the ala'um rule, while bala'i.man would parse as ba la'i man. This is presuming that la'i is not one of those words after which the pre-cmevla glottal stop can be omitted.

"bala'iman" breaks up. "bala'iaman" and "bala'i'uman" and "bala'uman" do not.

I was going to write that I was wrong about what I said, but now that I read your response more carefully I realize that you're even more wrong than I was.

So you mean to tell me that you think we should make the cmene rule "a cmene must not have any of the syllables la/lai/doi unless preceded by a consonant or followed by vowel or ', except that la' cannot be followed by i"? You seriously think this is a reasonable rule for people to remember? They can't even remember to avoid "la" at all, you think they can remember that "la'u" is okay but not "la'i"?? This would be a disaster; I'd probably prefer leaving the rule unchanged to making it so complicated.

"la'i" is not allowed in cmene any more than "la" or "lai" or "doi". As "la" is a substring of both "lai" and "la'i", the only way I could explain why "lai" was specified as well as "la" is that "la" is allowed when followed by something other than a consonant. That makes "laus" a valid cmene. "lais" does not contain "la" in this way, but does contain "lai", so it's not a valid cmene. Similarly "la'is" is invalid because it contains "la'i".

la'i is invalid because it contains the syllable la. lai must be listed separately because it does not contain the syllable la. Yes, it contains the string, but Lojban morphology is concerned with syllables, not strings. The syllable lai is distinct from but not a superclass of the syllable la. lau would be allowed in a cmene, as it doesn't contain the taboo syllables.

The relevant sentence from the Book:

Names are not permitted to have the sequences ``la, ``lai, or ``doi embedded in them, unless the sequence is immediately preceded by a consonant. These minor restrictions are due to the fact that all Lojban cmene embedded in a speech stream will be preceded by one of these words or by a pause.

Since the sequence "lai" contains the sequence "la", either it is redundant, or the criterion is misstated. A cmene can be preceded by "la'i" without a pause, so the second sentence is in error.

Here is where I made my mistake... and you made me correct by making another one. The ability to precede a cmene without pause, near as I can tell, is a function of the word, not the selma'o. So la and lai can, but la'i can't. So I was wrong when I said that the .ala'um rule would incorrectly permit bala'iman, when in fact it would (by my understanding of it at the time) permit it with no problems: since there couldn't possibly be a word-break at the h in lahi (dammit, if I have to talk about the letter in isolation, let me use a glyph that's visible!), we know that the cmene can't begin there and we keep searching back to the true beginning. This is exactly the logic I expected of allowing lahu. And it's also the mirror of the exception that permits la when preceded by a consonant: there couldn't possibly be a break after the consonant without a pause, so the word must keep going.

By saying that la'i can precede a cmene without a pause and thus would be forbidden by the .ala'um rule, you have taken a complication that makes an unusably detailed rule positively picayune, and gone one more level to onerous.


 forum_name: Morphology
topic_title: cmene with la/lai/doi
    subject:
   username: And
  post_time: 2003-08-15 12:08:29
  post_text:

Another argument against the status quo is that we're evidently not sure where /./ can and can't be elided. So when in doubt, the safest course is to not elide it, & lo & behold you have cmevla unambiguously delimited by /./ ... /C./.


 forum_name: Morphology
topic_title: Representation of the Morphology
    subject: Representation of the Morphology
   username: jkominek
  post_time: 2003-09-13 02:46:56
  post_text:

.oicai .a'o .o'inai

Having been considering the morphology, again recently, I've become frustrated. As it stands, the word break 'algorithm' is described in English, and, I'm sorry, isn't the most readable document I've ever seen.

Pierre's valfendi is continuing the trend of describing the algorithm in English.

FFS, can we please concoct a formal description of the morphology, and then just dub it correct? (Yes, I'm glossing over the difficulty of that task.) The idea of trying to prove that anything is equivalent to an English description of the word formation rules is nightmarish, yet I get the impression that some people would like to try and accomplish that.

jbofi'e's definition of the morphology belongs to the set of regular languages (Richard uses an NFA to describe it), so I'm pretty confident that a regular language should be sufficient for the task.

If it isn't, then, personally, I'm willing to throw away whatever bizarre little bit of fu'ivla space gets lost in the process. (Certainly if whatever we come up with accepts all the fu'ivla which are currently agreed upon as 'good', that's more than good enough for me.)

ri'e


 forum_name: Morphology
topic_title: Representation of the Morphology
    subject:
   username: xorxes
  post_time: 2003-09-13 10:31:10
  post_text:

Could you post the jbofi'e morphology in some human readable format, please?


 forum_name: Morphology
topic_title: Representation of the Morphology
    subject: Re: Representation of the Morphology
   username: rlpowell
  post_time: 2003-09-13 18:27:28
  post_text:

.oicai .a'o .o'inai Having been considering the morphology, again recently, I've become frustrated. As it stands, the word break 'algorithm' is described in English, and, I'm sorry, isn't the most readable document I've ever seen.

Ummm, hasn't Pierre been working on that extensively, along with Nora?

(See what happens when you don't read the list? 8)

Have you looked at his formal description or the program that implements it?

-Robin


 forum_name: Morphology
topic_title: Representation of the Morphology
    subject: Re: Representation of the Morphology
   username: jkominek
  post_time: 2003-09-13 18:57:27
  post_text:

Ummm, hasn't Pierre been working on that extensively, along with Nora?

Beats me. It hasn't been happening in the BPFK forum, which is what seems to be the place to do it.

(See what happens when you don't read the list? 8)

That is less the case than it used to be. I regularly look at the archive page, and read anything which appears to be interesting, based on author and subject. But when the topic drifts, and the subject changes, I don't know. Though Nora and Pierre are sufficiently interesting that I'd probably read anything they post unless it appears to be part of a huge thread that should just die. (See the previous little bit about subjects not changing.)

Have you looked at his formal description or the program that implements it?

I looked at valfendi, and while I have nothing against it, or Pierre for developing it, I don't think it solves the need. An English description of the morphology, even if it is written better, won't solve the problem.

There needs to be a description of the morphology just as formal as the grammar that can be fed into YACC. (As in, there needs to be a formal description, and a tool which can accept any description in the same class of languages, which, together, recognize Lojban morphology.)

My motivation for this is that:

1) If you've got a pile of custom code which does the recognition, you don't know what class of languages your recognition code falls into. If you don't know that, maximally efficient implementation is hard, if not impossible.

2) A formal description of the morphology allows us to say, with total, provable certainty, what class of languages the morphology falls into. Which I find important in and of itself, and which helps with #1.

3) The techniques for probabilistic recognition of formal languages are well developed. Probabilistically recognizing something according to the English morphology rules that exist now is, uh, not likely to happen.