ISO-generated fu'ivla for scripts

From Lojban
Revision as of 02:13, 14 March 2018 by Krtisfranks (talk | contribs) (→‎Source Standard and Paradigm)

krtisfránks proposes the adoption of a convention for easily naming scripts in Lojban. Moreover, it is suggested that the convention follow the paradigm established for other ISO-generated fu'ivla (to which there shall be repeated reference in this article).

Source Standard and Paradigm

The proposed source material would be ISO 15924, probably the Alpha-4 codes. The codes must (and should) be spelled with only basic Latin characters.

The paradigm will be that which is referenced here.

Complications

There are minor complications.

#1: Length

First, the Alpha-4 codes are long, especially compared to the codes for countries, languages, and currencies, each of which is (at most, for our purposes) three letters long; however, this is the standard and no reasonable alternative exists (at least in ISO). Thus, we pretty much just need to accept them. We could go with the numerical codes instead, but those would be more difficult to remember and to produce on the fly, and they would require the creation of a new (non-conflicting) system. (By the way, we should probably develop such a system in any case, but I will hereinafter assume an Alpha-n code.)

#2: Capitalization

Second, exactly the first letter of each script code is capitalized, unlike the other codes, which are uniform in casing or do not seem to care. krtisfránks is not yet sure if this is a requirement of the standard. If it is, then we can ignore this issue because any output from the borrowing algorithm will just be assumed to follow this rule (wherein the first letter is capital and all others are lowercase). If not, then we must address this convention. In any case, if really desired, we could create an additional syllable that indicates the capitalization of exactly the next letter (the others defaulting to lowercase). This could be useful for sticklers who want to be careful with all codes or for use with codes wherein capitalization is important.

If anyone can provide more information, please do so here.

Update

UPDATE: According to Michael Everson, expert and registrar of the ISO 15924 Registration Authority, the casing does NOT matter. For more information check the content of this link. Any mention of casing throughout the rest of this article should be ignored; the algorithm which converts ISO 15924 (Alpha) codes to Lojban fu'ivla or back will not take casing into account; all letters in the ISO 15924 Alpha code will bijectively map to the same (phonotactic-context-dependent) string of lerfu (either 'Cu', 'Ce', 'V', or 'hV') in the Lojban fu'ivla, regardless of the casing with which they are conventionally given/displayed in indices or registration files. So, for example, "Zmth", "zmth", "zMth", "ZMTH", etc. will all map to "zumutuxe", regardless of which letters in the ISO code name/string are capital and which are lowercase; meanwhile, each of those ISO code names/strings shall be considered equivalent and "zumutuxe" shall map back to them all. - Krtisfranks (talk) 01:25, 14 March 2018 (UTC)
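
The reverse direction mentioned in the update can be sketched as follows. This is only an illustration, assuming the letter mapping given in the Mapping section of this article and assuming that the introducer has already been stripped from the word; the name `decode_body` is hypothetical and not part of the proposal. The result is given in lowercase, and any casing of the same letters is treated as the same code.

```python
# Hypothetical sketch: recover the ISO 15924 alpha code from the body of a
# generated fu'ivla (introducer already removed). Not part of the proposal.
VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")
# Inverse of the exceptional syllables je, xe, ke, ve from the Mapping section.
E_SYLLABLES = {"j": "y", "x": "h", "k": "q", "v": "w"}


def decode_body(body: str) -> str:
    """Return the lowercase alpha code encoded by `body`."""
    letters = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1 : i + 2]  # empty string at the end of the word
        if ch in ("'", "h"):
            # 'V (also written hV): a vowel that is not bare
            if nxt not in VOWELS:
                raise ValueError(f"expected a vowel after {ch!r} at position {i}")
            letters.append(nxt)
            i += 2
        elif ch in VOWELS:
            # A bare vowel is only produced for the first letter of the code
            if i != 0:
                raise ValueError(f"bare vowel at position {i}")
            letters.append(ch)
            i += 1
        elif ch in CONSONANTS and nxt == "u":
            letters.append(ch)  # Cu -> C
            i += 2
        elif ch in E_SYLLABLES and nxt == "e":
            letters.append(E_SYLLABLES[ch])  # je, xe, ke, ve -> y, h, q, w
            i += 2
        else:
            raise ValueError(f"unexpected lerfu {ch!r} at position {i}")
    return "".join(letters)


decoded = decode_body("zumutuxe")
# Every casing of the code is considered equivalent to the decoded form.
print(decoded, all(s.lower() == decoded for s in ("Zmth", "zmth", "zMth", "ZMTH")))
```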

#3: Introducer Selection

Third, and this is the important one, we have to choose an appropriate gismu as the introducer. Three options immediately come to mind: "ciska", "lerfu", and "cilfu".

  • "ciska" is not entirely appropriate since it cares about the writer, the medium, and the ink in addition to what is actually written. Even then, it is really about the writing itself rather than the system of symbols that is being employed. It does have a cmarafsi which might be nice but it is not overly useful.
  • "lerfu" is pretty nice. It would be better if we could use "selyle'u", but at least the relationship is there. One downside is that it ends with "-u", which is potentially confusing or garden-pathing since the letters in the code will usually be turned into syllables of the form '-Cu-'. For example, "lerfuzumutuxe" (Zmth) might be confused for "Fzmt... oh, there is an 'h'...". Also, if we ever want to name individual symbols using an introducer (such as borrowing from Unicode or random names), the two conventions in Lojban would at least be confusable with one another, and they could easily conflict or lead to garden-pathing. There might already be zi'evla which cause such problems. The upside is that there are cmarafsi, so the word length can be reduced and we can more closely follow the model of "bangu"-introduced borrowings, rather than "gugde"-introduced ones. (But we would have to adapt since "lerfu" is not as versatile as "bangu". This is a small technical matter. The cmarafsi of "lerfu" are much nicer than those of "ciska".)
  • "cilfu" does not have the problem with semantics that "ciska" and "lerfu" have; it definitely means "script" and can only mean that (no need to worry about borrowing names for individual symbols). It suffers from the terminal "-u" issue though. It also has no cmarafsi. And it is redundant: why create a fu'ivla when a link-sumti construct would do? On the other hand, this redundancy could be viewed as making this word the prime candidate for the job. (I (krtisfránks) do think that the codes should be borrowed; so, if the job must be done, maybe this is the best way to do it.)

krtisfránks personally likes "lerfu" or "cilfu" for this role (in increasing order of preference). Perhaps if their final vowels were edited ("lerfa-"? "cilfe-"?), then they would be better. Then it is a matter of choosing between redundancy and word length on the one hand and potential conflicts with future borrowings and improper semantics on the other.

An alternative would be to let the introducer be "slerfa-" (or something similar), standing for "selyle'u" and correcting the terminal "-u"; this might even lend itself to shorter words in some situations (by using "sler(f)-"). The bad news about this option is that a word (and not just a rafsi-like fragment) called "slerfa" (or whatever) could come to be created (yes, I know that "slerfa" is not itself a Lojban word even in possibility, but I am using "slerfa" as a placeholder here), which would then conflict with the motivations of this proposal. We could book the word in order to pre-empt this possibility, but it would be a bit of a waste of brivla space.

In any case, after the introducer is selected, assuming that the first two issues raised are resolved nicely, the translation algorithm is the same as the rest and is ready to go.

Additionally, the introducer cannot end with "y".

Extra Rule

Once we select an introducer, we should forbid the creation of any brivla which consists of or contains that introducer followed by (concatenated with) a string which does not contain a consonant cluster within its first four lerfu, unless the brivla is a zi'evla or lujvo which is explicitly and (in a reasonable way) semantically derived from a fu'ivla produced by this algorithm (and, in the case of lujvo, the valsi produced by this algorithm should be a veljvo and it or a mutation of it should be acting as a rafsi). Words generated by this algorithm never contain a consonant cluster in that position, so only such cluster-free continuations can be mistaken for them. rafsi are allowed if used appropriately.

For example: Suppose that we choose "cilfa-" to be the introducer string. Then we outlaw the creation of any brivla which begins with or contains the substring "cilfaX1X2X3X4", where Xi is any Lojban lerfu (not counting .y'y, which is optionally included with the selection of any vowel), unless Xj and X(j+1) are both consonants (where j is an integer such that 0<j<4) or the resulting word in total is a derivative of a word generated by this algorithm.
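
As a rough illustration of the example above, the check could be sketched like this. It is only a sketch under stated assumptions: the function name `collides_with_generated_form` is hypothetical, "cilfa" is the placeholder introducer from the example, .y'y is dropped before the lerfu are examined, a continuation shorter than four lerfu is still examined, and the exemption for words derived from a generated fu'ivla is not modelled (it needs human judgment about semantics).

```python
# Hypothetical sketch of the "Extra Rule" check; not part of the proposal.
CONSONANTS = set("bcdfgjklmnprstvxz")


def collides_with_generated_form(word: str, introducer: str = "cilfa") -> bool:
    """True if `word` contains the introducer followed by lerfu that show no
    consonant cluster among the first four (ignoring .y'y)."""
    start = word.find(introducer)
    while start != -1:
        tail = word[start + len(introducer):].replace("'", "")
        window = tail[:4]  # X1..X4, or fewer if the word ends early
        has_cluster = any(
            a in CONSONANTS and b in CONSONANTS
            for a, b in zip(window, window[1:])
        )
        if window and not has_cluster:
            return True
        # Look for a later occurrence of the introducer in the same word
        start = word.find(introducer, start + 1)
    return False


for candidate in ("cilfazumutuxe", "cilfau", "cilfabrode", "bangu"):
    print(candidate, collides_with_generated_form(candidate))
```

Under these assumptions, "cilfazumutuxe" and "cilfau" are flagged, while a made-up string such as "cilfabrode" is not, because "br" is a consonant cluster within the first four lerfu after the introducer.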

This might interfere with the creation of unrelated lujvo (such as "cilfau"). I suppose that we would have to exempt lujvo whose veljvo are not zi'evla.

Similar rules should be implemented for the other ISO-generated fu'ivla.

Mapping

(This is the same as in the results described here).

Each vowel V (except "Y"/"y") will be mapped to V if it is the first letter in the code and does not immediately follow a vowel (such as the final vowel of an introducer), and to 'V (that is, hV) if it is any subsequent letter in the code or immediately follows a vowel.

Each consonant C (except for "H"/"h", "Q"/"q", and "W"/"w") will map to Cu. This is regardless of their pronunciation in English, French, Latin, or the language that motivated the ISO code designation.

The aforementioned exceptions are handled thusly:

  • "Y"/"y" will map to je.
  • "H"/"h" will map to xe.
  • "Q"/"q" will map to ke.
  • "W"/"w" will map to ve.
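
The letter mapping above can be sketched in a few lines. This is a minimal sketch, not a definitive implementation: the name `encode_body` and its parameter `follows_vowel` (whether the chosen introducer ends in a vowel) are hypothetical, and the introducer itself is left out because its selection is still open.

```python
# Hypothetical sketch of the letter mapping; names are illustrative only.
VOWELS = set("aeiou")
# The listed exceptions: Y, H, Q and W get a substitute consonant plus "e".
EXCEPTIONS = {"y": "je", "h": "xe", "q": "ke", "w": "ve"}


def encode_body(code: str, follows_vowel: bool = False) -> str:
    """Map an ISO 15924 alpha code to the part of the fu'ivla after the introducer.

    `follows_vowel` states whether the introducer ends in a vowel, which
    decides whether an initial vowel is written bare (V) or with .y'y ('V).
    """
    if not (code.isascii() and code.isalpha()):
        raise ValueError("the code must consist of basic Latin letters only")
    syllables = []
    for i, ch in enumerate(code.lower()):  # casing is ignored
        if ch in VOWELS:
            bare = i == 0 and not follows_vowel
            syllables.append(ch if bare else "'" + ch)
        elif ch in EXCEPTIONS:
            syllables.append(EXCEPTIONS[ch])
        else:
            syllables.append(ch + "u")  # every other consonant C becomes Cu
    return "".join(syllables)


print(encode_body("Zmth"))                      # zumutuxe
print(encode_body("Abaa"))                      # abu'a'a
print(encode_body("Abaa", follows_vowel=True))  # 'abu'a'a
```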

Examples

The aforementioned issues can be resolved by various assumptions; what follows is a series of examples representing different such resolutions. It is not meant to be complete, only demonstrative.

NOTE: THIS IS NOT AN ENDORSEMENT OF ANY OF THESE ASSUMPTIONS!

Type A

Assume that: 1) Alpha-4 codes are used; 2) Casing does not matter; 3) "lerfu" is used as the introducer (this is the biggest caveat to this example).

Then:

  • The introducer is "ler-" unless the code begins with a vowel (other than "Y"), an "F", or an "R".
  • In the case of an initial vowel (other than "Y"), the introducer is "lerf-".
  • If the code begins with an "F", then the introducer is "le'ur-".
  • If the code begins with an "R", then the introducer is "le'un-".


The name for the script described/named as Mathematical Notation, which is assigned code 'Zmth', would be "lerzumutuxe". More illustratively:

Normal examples:

  • Aaaa -> lerfa'a'a'a.
  • Abaa -> lerfabu'a'a.
  • Baaa -> lerbu'a'a'a.
  • Bbbb -> lerbubububu.

"F"-initial examples:

  • Faaa -> le'urfu'a'a'a.
  • Fbaa -> le'urfubu'a'a.
  • Fbbb -> le'urfubububu.

"R"-initial examples:

  • Raaa -> le'unru'a'a'a.
  • Rbaa -> le'unrubu'a'a.
  • Rbbb -> le'unrubububu.
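
The Type A rules and examples above can be reproduced with a short sketch. The function name `type_a_name` is hypothetical, and the code assumes exactly the three Type A assumptions (Alpha-4 codes, casing ignored, "lerfu" as the source of the introducer); it is an illustration rather than an endorsed implementation.

```python
# Hypothetical sketch of the Type A example; not an endorsement of Type A.
VOWELS = set("aeiou")
EXCEPTIONS = {"y": "je", "h": "xe", "q": "ke", "w": "ve"}


def type_a_name(code: str) -> str:
    """Build the Type A fu'ivla for a four-letter ISO 15924 alpha code."""
    if len(code) != 4 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected four basic Latin letters")
    code = code.lower()  # casing does not matter
    first = code[0]
    if first in VOWELS:       # initial vowel other than Y
        introducer = "lerf"
    elif first == "f":
        introducer = "le'ur"
    elif first == "r":
        introducer = "le'un"
    else:
        introducer = "ler"
    syllables = []
    for i, ch in enumerate(code):
        if ch in VOWELS:
            # Every Type A introducer ends in a consonant, so an initial
            # vowel stays bare; later vowels take .y'y.
            syllables.append(ch if i == 0 else "'" + ch)
        elif ch in EXCEPTIONS:
            syllables.append(EXCEPTIONS[ch])
        else:
            syllables.append(ch + "u")
    return introducer + "".join(syllables)


for code in ("Zmth", "Aaaa", "Abaa", "Baaa", "Faaa", "Raaa"):
    print(code, "->", type_a_name(code))
```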


Note: I think that the case of initial "U" should be handled just fine under this system, but the audience will need to take care in interpreting such words.


NOTE: THIS IS NOT AN ENDORSEMENT OF ANY OF THESE ASSUMPTIONS!

  • In particular, I am not advocating for the adoption of "lerfu" as the introducer.

Type B

Assume that: 1) Alpha-4 codes are used; 2) Casing does not matter; 3) "cilfe-" (derived from "cilfu") is used as the introducer (this is the biggest caveat to this example).

Then:

  • The output follows the pattern of the ISO-generated country names.
  • The introducer is "cilfe-" unless the code begins with a non-"Y" vowel.
  • In the case of an initial vowel (excepting "Y"), the introducer is "cilfe'-"; note the presence of the .y'y.


The name for the script described/named as Mathematical Notation, which is assigned code 'Zmth', would be "cilfezumutuxe". More illustratively:

Normal examples:

  • Aaaa -> cilfe'a'a'a'a.
  • Abaa -> cilfe'abu'a'a.
  • Baaa -> cilfebu'a'a'a.
  • Bbbb -> cilfebubububu.
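
The Type B examples above follow from an even shorter sketch, since the introducer always ends in a vowel and so every non-"Y" vowel of the code takes .y'y. The function name `type_b_name` is hypothetical, and the code assumes the three Type B assumptions, including the placeholder introducer "cilfe-".

```python
# Hypothetical sketch of the Type B example; not an endorsement of Type B.
VOWELS = set("aeiou")
EXCEPTIONS = {"y": "je", "h": "xe", "q": "ke", "w": "ve"}


def type_b_name(code: str) -> str:
    """Build the Type B fu'ivla for a four-letter ISO 15924 alpha code."""
    if len(code) != 4 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected four basic Latin letters")
    syllables = []
    for ch in code.lower():  # casing does not matter
        if ch in VOWELS:
            # "cilfe" ends in a vowel, so even an initial vowel needs .y'y
            syllables.append("'" + ch)
        elif ch in EXCEPTIONS:
            syllables.append(EXCEPTIONS[ch])
        else:
            syllables.append(ch + "u")
    return "cilfe" + "".join(syllables)


for code in ("Zmth", "Aaaa", "Abaa", "Baaa", "Bbbb"):
    print(code, "->", type_b_name(code))
```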


Note: A minor disappointment with this method is that "cilfe" is not a Lojban word. The good news is that, by gismu similarity conflict with "cilfu", it presently cannot be - thus (for the time being), we do not need to worry about "cilfe" being defined in some way contrary to this proposal's motivations and semantics.

This method has the following advantage over the Type A proposal: very little case-by-case handling is necessary, and words may be generated easily and with little thought. The only issue of concern is that initial vowels must be handled appropriately. But, since forgetting the .y'y will phonotactically break the word in all but at most two cases anyway (these being initial "I" and possibly initial "U" in future versions of Lojban), spotting these errors/troubles should not be too difficult. The method is not only regular, but simple as well.

The cost relative to the Type A proposal is that every word output by this method is at least as long as the corresponding Type A word. Some may also feel that some elegance is lost.



NOTE: THIS IS NOT AN ENDORSEMENT OF ANY OF THESE ASSUMPTIONS!

  • In particular, I am not advocating for the adoption of "cilfe" as the introducer.