ISO-generated fu'ivla for scripts

From Lojban
Revision as of 20:30, 3 March 2018 by Krtisfranks (#3: Introducer Selection)
krtisfránks proposes the adoption of a convention for easily naming scripts in Lojban. Moreover, it is suggested that the convention follow the paradigm established for other ISO-generated fu'ivla.

Source Standard

The proposed source material would be ISO 15924, probably the Alpha-4 codes.

Complications

There are minor complications.

#1: Length

First, the Alpha-4 codes are long, especially compared to the codes for countries, languages, and currencies which each produce codes of (at most, for our purposes) three letters in length; however, this is the standard and no reasonable alternative exists (at least in ISO). Thus, we pretty much just need to accept them. We could go with the numerical codes instead, but that would be more difficult to remember and to produce on the fly and would require the creation of a new (non-conflicting) system. (By the way, we should probably develop such a system in any case, but I will hereinafter assume an Alpha-n code.)

#2: Capitalization

Second, exactly the first letter in the codes for scripts is capitalized, unlike the other codes, which are uniform in casing or do not seem to care. krtisfránks is not yet sure if this is a requirement of the standard. If it is, then we can ignore this issue because any output from the borrowing algorithm will just be assumed to follow this rule (wherein the first letter is capital and all others are lowercase). If not, then we must address this convention. In any case, if really desired, we could create an additional syllable that indicates the capitalization of exactly the next letter (the others defaulting to lowercase). This could be useful for sticklers who want to be careful with all codes or for use with codes wherein capitalization is important.

#3: Introducer Selection

Third, and this is the important one, we have to choose an appropriate gismu as the introducer. Three options immediately come to mind: "ciska", "lerfu", and "cilfu".

  • "ciska" is not entirely appropriate since it cares about the writer, the medium, and the ink in addition to what is actually written. Even then, it is really about the writing itself rather than the system of symbols that is being employed. It does have a cmarafsi which might be nice but it is not overly useful.
  • "lerfu" is pretty nice. It would be better if we could use "selyle'u", but at least the relationship is there. One downside is that it ends with "-u", which makes it potentially confusing or garden-pathing, since the letters in the code will usually be turned into syllables of the form '-Cu-'. For example, "lerfuzumutuxe" (Zmth) might be misread as "Fzmt... oh, there is an 'h'...". Also, if we ever want to name individual symbols using an introducer (such as by borrowing from Unicode or from random names), the two conventions would at least be easy to confuse with one another, and they could easily conflict or lead to garden-pathing. There might already be zi'evla which cause such problems. The upside is that there are cmarafsi, so the word length can be reduced and we can more closely follow the model of "bangu"-introduced borrowings, rather than "gugde"-introduced ones. (But we would have to adapt since "lerfu" is not as versatile as "bangu". This is a small technical matter. The cmarafsi of "lerfu" are much nicer than those of "ciska".)
  • "cilfu" does not have the semantic problems that "ciska" and "lerfu" have; it definitely means "script" and can only mean that (no need to worry about borrowing names for individual symbols). It suffers from the terminal "-u" issue, though. It also has no cmarafsi. And it is redundant: why create a fu'ivla when a link-sumti construct would do? On the other hand, this redundancy could be viewed as making this word the prime candidate for the job. (I (krtisfránks) do think that the codes should be borrowed; so, if the job must be done, maybe this is the best way to do it.)

krtisfránks personally likes "lerfu" or "cilfu" for this role (in increasing order of preference). Perhaps if their final vowels were edited ("lerfa-"? "cilfe-"?), then they would be better. Then it is a matter of choosing between redundancy and word length on the one hand, and potential conflicts with future borrowings and improper semantics on the other.

An alternative would be to let the introducer be "slerfa-" (or something similar), which would stand in for "selyle'u" and correct the terminal "-u"; this might even lend itself to shorter words in some situations (by using "sler(f)-"). The bad news about this option is that a word (and not just a rafsi-like fragment) called "slerfa" (or whatever) could come to be created, which would then conflict with the motivations of this proposal. (Yes, I know that "slerfa" is not itself a Lojban word even in possibility, but I am using "slerfa" as a placeholder here.) We could book the word in order to pre-empt this possibility, but it would be a bit of a waste of brivla space.


In any case, after the introducer is selected, and assuming that the first two issues raised are resolved nicely, the translation algorithm is the same as for the other ISO-generated fu'ivla and is ready to go.

Mapping

Each vowel V (except "Y"/"y") will be mapped to V if it is the first letter in the code and does not immediately follow a vowel (such as the final vowel of the introducer), and to 'V if it is any subsequent letter in the code or immediately follows a vowel.

Each consonant C (except for "H"/"h", "Q"/"q", and "W"/"w") will map to Cu. This is regardless of their pronunciation in English, French, Latin, or the language that motivated the ISO code designation.

The aforementioned exceptions are handled thusly:

  • "Y"/"y" will map to je.
  • "H"/"h" will map to xe.
  • "Q"/"q" will map to ke.
  • "W"/"w" will map to ve.
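
As a rough illustration only, the mapping above could be sketched in Python as follows. The names `map_code` and `follows_vowel` are inventions for this sketch and are not part of the proposal. The sketch assumes that casing is ignored and that the apostrophe is the marker placed before non-initial vowels, as in the examples further down this page.

```python
VOWELS = set("aeiou")
# The four exceptional letters named in the proposal.
EXCEPTIONS = {"y": "je", "h": "xe", "q": "ke", "w": "ve"}


def map_code(code: str, follows_vowel: bool = False) -> str:
    """Map the letters of a code to Lojban syllables, ignoring case.

    follows_vowel (hypothetical parameter) says whether whatever is placed
    before the code, i.e. the introducer, ends in a vowel. It only affects
    an initial vowel of the code.
    """
    pieces = []
    for index, letter in enumerate(code.lower()):
        if not "a" <= letter <= "z":
            raise ValueError(f"not an ASCII letter: {letter!r}")
        if letter in EXCEPTIONS:
            pieces.append(EXCEPTIONS[letter])
        elif letter in VOWELS:
            # Bare V only for an initial vowel that does not follow a vowel.
            bare = index == 0 and not follows_vowel
            pieces.append(letter if bare else "'" + letter)
        else:
            # Every other consonant C becomes "Cu".
            pieces.append(letter + "u")
    return "".join(pieces)


print(map_code("Zmth"))  # zumutuxe
```

The introducer is deliberately left out here, since its form depends on choices that the proposal leaves open.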

Examples

The aforementioned issues can be resolved by various assumptions; what follows is a series of examples representing different such resolutions. It is not meant to be complete, only demonstrative.

NOTE: THIS IS NOT AN ENDORSEMENT OF ANY OF THESE ASSUMPTIONS!

Type A

Assume that: 1) Alpha-4 codes are used; 2) Casing does not matter; 3) "lerfu" is used as the introducer (this is the biggest caveat to this example).

Then:

  • The introducer is "ler-" unless the code begins with a vowel (other than "Y"), an "F", or an "R".
  • In the case of an initial vowel (other than "Y"), the introducer is "lerf-".
  • If the code begins with an "F", then the introducer is "le'ur-".
  • If the code begins with an "R", then the introducer is "le'un-".
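
The Type A introducer rules above, combined with the letter mapping, could be sketched like this. It is a minimal sketch under the Type A assumptions only; the function names `map_letters`, `type_a_introducer` and `type_a_name` are hypothetical and chosen just for illustration.

```python
VOWELS = set("aeiou")
EXCEPTIONS = {"y": "je", "h": "xe", "q": "ke", "w": "ve"}


def map_letters(code: str) -> str:
    """Letter mapping for an introducer that ends in a consonant."""
    pieces = []
    for index, letter in enumerate(code.lower()):
        if letter in EXCEPTIONS:
            pieces.append(EXCEPTIONS[letter])
        elif letter in VOWELS:
            # Only an initial vowel stays bare; later vowels get an apostrophe.
            pieces.append(letter if index == 0 else "'" + letter)
        else:
            pieces.append(letter + "u")
    return "".join(pieces)


def type_a_introducer(code: str) -> str:
    """Pick the "lerfu"-derived introducer from the first letter of the code."""
    first = code[0].lower()
    if first in VOWELS:  # "y" is not in VOWELS, so it gets plain "ler"
        return "lerf"
    if first == "f":
        return "le'ur"
    if first == "r":
        return "le'un"
    return "ler"


def type_a_name(code: str) -> str:
    """Build the full Type A word for a four-letter code."""
    return type_a_introducer(code) + map_letters(code)


print(type_a_name("Zmth"))  # lerzumutuxe
```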


The name for the script described/named as Mathematical Notation, which is assigned code 'Zmth', would be "lerzumutuxe". More illustratively:

Normal examples:

  • Aaaa -> lerfa'a'a'a.
  • Abaa -> lerfabu'a'a.
  • Baaa -> lerbu'a'a'a.
  • Bbbb -> lerbubububu.

"F"-initial examples:

  • Faaa -> le'urfu'a'a'a.
  • Fbaa -> le'urfubu'a'a.
  • Fbbb -> le'urfubububu.

"R"-initial examples:

  • Raaa -> le'unru'a'a'a.
  • Rbaa -> le'unrubu'a'a.
  • Rbbb -> le'unrubububu.


Note: I think that the case of initial "U" should be handled just fine under this system, but the audience will need to take care in interpreting such words.


NOTE: THIS IS NOT AN ENDORSEMENT OF ANY OF THESE ASSUMPTIONS!

  • In particular, I am not advocating for the adoption of "lerfu" as the introducer.

Type B

Assume that: 1) Alpha-4 codes are used; 2) Casing does not matter; 3) "cilfe-" (derived from "cilfu") is used as the introducer (this is the biggest caveat to this example).

Then:

  • The output follows the pattern of the ISO-generated country names.
  • The introducer is "cilfe-" unless the code begins with a non-"Y" vowel.
  • In the case of an initial vowel (excepting "Y"), the introducer is "cilfe'-"; note the presence of the .y'y.
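
Under the Type B assumptions, the rules above reduce to very little code, which a short sketch may make clearer. The function name `type_b_name` is hypothetical. Because "cilfe-" ends in a vowel, the sketch simply puts the .y'y (written as an apostrophe) before every vowel of the code, including an initial one.

```python
VOWELS = set("aeiou")
EXCEPTIONS = {"y": "je", "h": "xe", "q": "ke", "w": "ve"}


def type_b_name(code: str) -> str:
    """Build the Type B word: "cilfe" followed by the mapped letters."""
    pieces = ["cilfe"]
    for letter in code.lower():
        if letter in EXCEPTIONS:
            pieces.append(EXCEPTIONS[letter])
        elif letter in VOWELS:
            # Every vowel follows a vowel here (the introducer ends in "e",
            # and every mapped syllable ends in a vowel), so it always
            # takes the apostrophe.
            pieces.append("'" + letter)
        else:
            pieces.append(letter + "u")
    return "".join(pieces)


print(type_b_name("Zmth"))  # cilfezumutuxe
```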


The name for the script described/named as Mathematical Notation, which is assigned code 'Zmth', would be "cilfezumutuxe". More illustratively:

Normal examples:

  • Aaaa -> cilfe'a'a'a'a.
  • Abaa -> cilfe'abu'a'a.
  • Baaa -> cilfebu'a'a'a.
  • Bbbb -> cilfebubububu.


Note: A minor disappointment with this method is that "cilfe" is not a Lojban word. The good news is that, by gismu similarity conflict with "cilfu", it presently cannot be - thus (for the time being), we do not need to worry about "cilfe" being defined in some way contrary to this proposal's motivations and semantics.

This method has the following advantage over the Type A proposal: very little case-worrying is necessary, and words may be generated easily and with little thought. The only concern is that initial vowels be handled appropriately. But, since forgetting the .y'y will phonotactically break the word in all but at most two cases anyway (these being initial "I" and possibly initial "U" in future versions of Lojban), spotting these errors should not be too difficult. The method is not only regular, but simple as well.

The cost relative to the Type A proposal is that every output word from this method is at least as long as the corresponding word from that method. Some elegance may also be felt to be lost.
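
The length comparison can be spot-checked with a small sketch. All function names below are hypothetical, and length is measured here in written characters (apostrophes included), which is an assumption of this sketch rather than something the proposal specifies. Only a sample of codes is checked, so this is an illustration, not a proof.

```python
from itertools import product
from string import ascii_lowercase

VOWELS = set("aeiou")
EXCEPTIONS = {"y": "je", "h": "xe", "q": "ke", "w": "ve"}


def map_letters(code: str, follows_vowel: bool) -> str:
    """Shared letter mapping; follows_vowel only matters for an initial vowel."""
    pieces = []
    for index, letter in enumerate(code.lower()):
        if letter in EXCEPTIONS:
            pieces.append(EXCEPTIONS[letter])
        elif letter in VOWELS:
            bare = index == 0 and not follows_vowel
            pieces.append(letter if bare else "'" + letter)
        else:
            pieces.append(letter + "u")
    return "".join(pieces)


def type_a_name(code: str) -> str:
    first = code[0].lower()
    if first in VOWELS:
        intro = "lerf"
    elif first == "f":
        intro = "le'ur"
    elif first == "r":
        intro = "le'un"
    else:
        intro = "ler"
    return intro + map_letters(code, follows_vowel=False)


def type_b_name(code: str) -> str:
    return "cilfe" + map_letters(code, follows_vowel=True)


def shorter_under_type_b(codes):
    """Return the codes whose Type B word is shorter than their Type A word."""
    return [c for c in codes if len(type_b_name(c)) < len(type_a_name(c))]


# Sample: every possible pair of leading letters, followed by "aa".
sample = ["".join(pair) + "aa" for pair in product(ascii_lowercase, repeat=2)]
print(shorter_under_type_b(sample))  # prints []
```

In this sketch the two methods tie exactly when the code begins with "F" or "R", since "le'ur-" and "le'un-" are as long as "cilfe-"; otherwise the Type B word is two characters longer.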



NOTE: THIS IS NOT AN ENDORSEMENT OF ANY OF THESE ASSUMPTIONS!

  • In particular, I am not advocating for the adoption of "cilfe" as the introducer.