Eric S Raymond's Tengwar mapping
la tenguar: A romantic orthography for Lojban
Given the number of Lojbanists that are SF and fantasy fans, and the zeal for logical systematization that is such an important part of the Lojban esthetic, many of you have probably played with J.R.R. Tolkien's `Tengwar' alphabet from The Lord of the Rings.
The Tengwar must have made a fascinating thought-exercise for Tolkien the linguist - a pure phonetic alphabet with orthographic symmetry to match its expressive power. He based the tengwa shapes on the lovely letter forms of the Celtic half-uncial hand of the Book of Kells, arguably the apogee of medieval manuscript illumination.
While studying the Lojban phonetic rules in the Synopsis, it occurred to me that Tolkien's objectives in composing the Tengwar paralleled the aims of the Lojban phonology design. Both have precision, compactness and cross-cultural intelligibility as primary goals (in Tolkien's world the Tengwar were used for many different languages, and he actually gives `modes' for Quenya, Sindarin, Dwarvish, and English). Upon realizing this, I was seized with an irresistible desire to compose a Tengwar mode for Lojban.
You may well ask ``why bother?'' I reply ``mainly for the creative fun of it.'' But the exercise also helped me solidify my knowledge of Lojban phonological and word-formation rules. I give the results to the Lojban community in hopes that others will find them similarly amusing and instructive.
And...who knows? You'll see below that some Tengwar inventions offer a more natural representation of certain Lojbanic structural features than the English version of the Latin alphabet can manage. Perhaps this is worth having a private orthography for. And, as anyone who has ever tried them will attest, even poorly-executed tengwar are both prettier than the Roman alphabet and better adapted to cursive writing.
The original Tengwar chart is in the appendices to The Return of the King, volume III of LOTR (the chart is in Appendix E, page 494 of the Ballantine pb edition). When I wrote the original version of this paper in late 1994, I referred to the tengwar only by Tolkien's numbers. For this Web version, I have added references to a table copied from Michael Everson's Unicode/ISO 10646-2 proposal for the Tengwar. References are thus given as n(x/y), where n is Tolkien's number, x is the table column number and y the row number. Where Tolkien describes a tehta that appears in the Everson table but is unnumbered, I reference it simply as (x/y).
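The numbered references in this paper appear to follow a regular pattern: the table is read in columns of sixteen, and the row is written as a single hexadecimal digit. The sketch below assumes that pattern. It is inferred from the references used here, not taken from Everson's proposal, and the function name is purely illustrative.

```python
def everson_ref(n):
    """Format Tolkien's tengwa number n in the n(x/y) style used here.

    Assumption: the table is filled column by column, sixteen rows per
    column, with rows labelled 0-9 and then A-F.
    """
    if n < 1:
        raise ValueError("Tolkien's numbering starts at 1")
    column, row = divmod(n - 1, 16)
    return f"{n}({column}/{row:X})"


for n in (1, 12, 17, 27, 33):
    print(everson_ref(n))
```

Unnumbered tehtar such as (2/5) have no Tolkien number, so they fall outside this helper.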
For what follows it is particularly important to remember that in any given mode the Tengwar consonants form a phonetic grid in which alterations in form correspond to alterations in sound in a predictable way (if you will, this is `audio-visual isomorphism' extended to the letter-form level). Before continuing, you should probably reread (or at least re-skim) Tolkien's discussion of the Feanorian letters, pages 495-500 following the chart.
We will tackle Lojban's phonetics in the following order:
- consonants
- vowels (including /y/)
- comma, dot, smooth breathing
- the sentence divider, cmene stress exceptions
Lojban phonetics is much more easily represented in the Tengwar than that of English; its vowels are pure, rather than palatalized, and there is little allophony in the consonants. In fact Lojban phonemes rather resemble the simple ones of Tolkien's `Westron' or `Common'. This should not be regarded as an accident. Tolkien constructed quite complex phonologies for Sindarin, Quenya and Dwarvish, but the Westron of his universe had the role of a lingua franca, and he was well aware that such languages tend to evolve toward maintaining only very simple and robust phonetic distinctions.
Accordingly, we can tentatively make the following consonant assignments of simple Lojban phonemes identical to Westron ones:
LOJBAN: 1(0/0) = /t/, 2(0/1) = /p/, 4(0/3) = /k/, 5(0/4) = /d/,
LOJBAN: 6(0/5) = /b/, 8(0/7) = /g/, 17(1/0) = /n/, 18(1/1) = /m/
leaving /c/, /f/, /j/, /l/, /r/, /s/, /v/, /x/, and /z/ unbound. This follows the Third Age convention of using Grade 1 for voiceless stops, Grade 2 for voiced stops, and Grade 5 for nasals. Also note that Series 1 is palato-alveolar-dental, Series 2 labial and Series 4 velar.
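The grid can be made concrete with a little arithmetic. The sketch below assumes that Tolkien's chart numbers the 24 primary tengwar row by row, four series to a grade; that assumption is consistent with every grade and series named in this paper. The function and table names are illustrative only.

```python
def grade_and_series(n):
    """Return (grade, series) for primary tengwa n, both counted from 1.

    Assumption: tengwar 1-24 are numbered grade by grade, with four
    series in each grade.
    """
    if not 1 <= n <= 24:
        raise ValueError("only tengwar 1-24 belong to the regular grid")
    grade, series = divmod(n - 1, 4)
    return grade + 1, series + 1


# The tentative consonant assignments made so far.
ASSIGNED = {1: "t", 2: "p", 4: "k", 5: "d", 6: "b", 8: "g", 17: "n", 18: "m"}

for n, phoneme in ASSIGNED.items():
    grade, series = grade_and_series(n)
    print(f"/{phoneme}/ -> tengwa {n}: grade {grade}, series {series}")
```

Read this way, the stops sit in grades 1 and 2 and the nasals in grade 5, and tengwa 20 comes out as grade 5, series 4.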
The astute reader will have already noticed that there is a hole in the two-dimensional pattern made by these at 20(1/3), which (though Tolkien doesn't give a value for it in Westron) ought to represent a velar nasal like /ng/. The hole is there because /ng/ is allophonic with /n/ in Lojban, so it needs no letter of its own. Amusingly, this may also have been true in archaic Quenya; see the discussion of `NG' in the pronunciation guide and `noldo' under letter-names.
Also, Series III (Westron c, j, sh, zh) isn't used at all so far. This is mainly because when I originally designed this mode in 1994 I thought this series was compromised by a glaring inconsistency in the Westron assignments; 3(0/2) was given in my sources as `c', making no distinction between `c' and `k' in what is supposed to be a phonetic alphabet. In early 2000 I learned that this was due to a misprint in most editions of LOTR printed after the mid-1960s; Tolkien actually (and quite logically) intended 3(0/2) to stand for the unvoiced palato-alveolar affricate /ch/ of English `church'.
The other three sounds in the Westron binding of Series III are Lojban's /dj/, /c/ and /j/ respectively; we will need these, but it is not immediately clear how best to assign tengwar for them.
Technically, /dj/ (the voiced palato-alveolar affricate) is a composite in Lojban. This brings up a sticky question; should the mode ever represent consonant clusters by a single tengwa? Given the Lojban rule of audiovisual isomorphism, there are three possible answers:
- No. The rule is `one sound, one letter', and all clusters represent two distinct sounds, albeit they are slurred together in speech. Besides, we don't want to make syllabification difficult.
- Maybe. Permissible initial clusters act like single consonants in one important respect; words may be syllabified that way.
- Yes. As long as spelling remains perfectly phonetic (in that there is only one character sequence corresponding to a given phoneme sequence) who cares that the written whole might not equal the sum of its syllabic parts?
Answer #3 is theoretically defensible, but seems to me to introduce complexity where there ought not to be any (most English-speakers are too used to gnarly spelling to appreciate how frustrating those raised on other tongues find it).
Answer #1 is hard to defend for /dj/ in particular; consider the English "longjump" (Lojban "*londjymp") which would certainly break to /lon,djymp/ and not /lond,jymp/. Or Lojban "dadjo" (Tao) which naturally breaks as /da,djo/, not /dad,jo/ (pronounce it and see).
Answer #2, then, looks attractive -- until one looks at the list of permissible initials and realizes how few of the clusters on it are as tight as /dj/. Perhaps /ts/, /dz/ and /mr/ might qualify, though none of them is perceived as a single phoneme in English. None of these has a single-tengwa representation in any of the Feanorian modes.
Therefore, we shall construct a simple one-phoneme-to-one-grapheme mapping. Let's look at the remaining consonants in this light.
The liquids are the easiest to dispose of. Tolkien says that 27(1/A) was "universally used" for /l/. There are separate signs 21(1/4) and 25(1/8) for untrilled and trilled /r/, which are allophonic in Lojban; we choose the untrilled 21 because it resembles the Roman `r' rather than a `y'.
LOJBAN: 21(1/4) = /r/, 27(1/A) = /l/
The remaining consonants /c/, /f/, /j/, /s/, /v/, /x/, and /z/ are all fricatives or affricates. Note that all but /x/ break naturally into voiced and unvoiced pairs; /j/, /v/, /z/ versus /c/, /f/, /s/. The voiced counterpart of the unvoiced velar fricative /x/ (often rendered as /gh/) occurs neither in Lojban nor English.
The Tengwar treats these rather inconveniently for Lojban's purposes. Tolkien describes grades 3 and 4 as unvoiced and voiced `spirants', an older term for fricatives and affricates (note the doubling of the bow, indicating voicing, in grade 4).
A simple Westron-like choice of tengwar would imply
WESTRON: 10(0/9) = /f/, 11(0/A) = /c/, 14(0/D) = /v/, 15(0/E) = /j/
Logically, this system would assign 3(0/2) to an unvoiced palato-alveolar stop, something like the `flapped' alveolar t in English "bottle" (compare the alveolar-dental /t/ in English "tin"). However, the flapped alveolar stop is allophonic to dental /t/ or /d/ in English, Lojban and almost all other languages except archaic Sanskrit!
Also, the Westron values list no equivalent for Lojban /x/, which evidently does not occur in the language. Logically, an unvoiced velar fricative should be Series IV, Grade 3 in the Westron mode; tengwa 12(0/B). Ironically, this is a better logical fit to tengwa 12 than the ch sound Tolkien assigns it in the Westron mode.
Finally, note that Westron and all of Tolkien's other languages assign the palato-dental fricatives to auxiliary letters 29(1/C) and 31(1/E).
WESTRON: 29(1/C) = /s/, 31(1/E) = /z/.
These Westron-like assignments would leave ugly holes in the tengwar table for Lojban and exile s and z far from their phonetic brethren. Unfortunately the Quenya use of Series III and IV as k and kw series would leave us worse off; that mode has no representations for /c/ or /j/, and the /s/-/z/ pair is also mapped to 29(1/C) and 31(1/E) there.
The best alternatives are, therefore, to set 12(0/B) = /x/ and either
- use the Westron choices for /c/, /f/, /j/, /s/, /v/, and /z/, or
- create a whole new set of assignments for grades 3 and 4 that dedicates these grades to unvoiced and voiced fricatives/affricates including /s/ and /z/. This set of assignments might move the sounds at Series 3 (/c/, /j/) but should preserve the Series 2 ones so that they are consistently labials.
Now the /s/ and /z/ sounds certainly aren't velar, which would knock 12 and 16 out of the running for the /s/-/z/ pair even if 12(0/B) weren't the logical place to put /x/.
Thus we have basically two choices left:
CHOICE1: 9(0/8) = /s/, 13(0/C) = /z/, 11(0/A) = /c/, 15(0/E) = /j/
CHOICE2: 9(0/8) = /c/, 13(0/C) = /j/, 11(0/A) = /s/, 15(0/E) = /z/
Either of these is defensible, but choice 1 is a better fit to Series 1 (/t/, /d/, /s/ and /z/ are all palato-dentals). But notice that choice 1 is just the Westron set with 9(0/8) and 13(0/C) substituted for 29(1/C) and 31(1/E).
It looks as though we gain little from trying to avoid the special letters. All this leads me to propose a compromiser's way out -- let 9(0/8) and 29(1/C) both be legal /s/ characters, with 9 preferred for print and book-hand forms and the simpler 29 available for cursive writing and backward-compatibility with Elvish (what a concept!).
This completes our set of Lojban consonants. Here are the final tengwa mappings for reference:
LOJBAN: 1(0/0) = /t/, 2(0/1) = /p/, 4(0/3) = /k/, 5(0/4) = /d/,
LOJBAN: 6(0/5) = /b/, 8(0/7) = /g/, 9(0/8) = /s/, 10(0/9) = /f/,
LOJBAN: 11(0/A) = /c/, 12(0/B) = /x/, 14(0/D) = /v/, 15(0/E) = /j/,
LOJBAN: 17(1/0) = /n/, 18(1/1) = /m/, 21(1/4) = /r/, 27(1/A) = /l/,
LOJBAN: 29(1/C) = /s/, 31(1/E) = /z/
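For readers who want to experiment, the consonant assignments can be written down as a lookup table. What follows is only a sketch: the names are hypothetical, 9 is treated as the default /s/ with 29 as the cursive alternative, /v/ is taken from the Westron value 14(0/D) discussed earlier, and vowels are simply skipped.

```python
# Lojban consonant letter -> Tolkien's tengwa number.
# /v/ follows the Westron assignment 14(0/D) adopted earlier in the text.
CONSONANTS = {
    "t": 1, "p": 2, "k": 4, "d": 5, "b": 6, "g": 8,
    "s": 9, "f": 10, "c": 11, "x": 12, "v": 14, "j": 15,
    "n": 17, "m": 18, "r": 21, "l": 27, "z": 31,
}
CURSIVE_S = 29  # the simpler auxiliary /s/ sign


def consonant_skeleton(word, cursive=False):
    """Return the tengwa numbers for the consonants of a Lojban word.

    Vowels, apostrophes and dots are ignored in this sketch.
    """
    numbers = []
    for letter in word.lower():
        if letter in CONSONANTS:
            number = CONSONANTS[letter]
            if cursive and letter == "s":
                number = CURSIVE_S
            numbers.append(number)
    return numbers


print(consonant_skeleton("lojban"))
print(consonant_skeleton("selsku", cursive=True))
```

Because every letter maps to a distinct number, the table can be inverted to read a string of tengwar back into Lojban; only the alternative /s/ sign needs special handling.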
Lojbanic vowels are also simpler than those of English and more like those of Westron or the Elvish languages (in being pure rather than palatalized). It also helps that there is no long/short vowel distinction in Lojban.
The first basic decision to make is whether the Lojban mode is to represent vowels with `tehtar' (diacriticals modifying a preceding or following consonant) or `full writing' in which the vowels have separate letters. Both are worth investigating; a full-writing mode would be easier for character-oriented computer I/O devices, but tehtar might be more convenient for cursive and `speed' writing.
We'll tackle the full-writing option first. We need six signs, for /a/, /e/, /i/, /o/, /u/ and the schwa /y/ used in cmene. All of these occur in the Sindarin inscription on the West-Gate of Moria in Volume I, though not all are shown in Appendix E. In the `Mode of Beleriand', then, the vowel letters are:
- /a/: a curl open to the right (like cursive latin `c'), tengwa (3/2).
- /e/: tengwa 33(2/0).
- /i/: the `short carrier' tehta (2/5), resembling an undotted `i'.
- /o/: tengwa 23(1/6).
- /u/: tengwa 36(2/3).
- /y/: tengwa 30(1/D).
These are suitable for Lojbanic use, except that the resemblance of tengwa 36(2/3) to latin `o' and the zero numeral could cause confusion. Therefore, let's anticipate an idea from the vowel tehtar below and agree to use tehta (3/0), a curl open to the left, for /u/. Coincidentally, this sign is the Tengwar numeral 0.
The usual bindings of the vowel tehtar are also quite appropriate for Lojban. As vowel endings dominate in Lojban, vowel tehtar should be written above the preceding letter (this is like Quenya, but unlike Sindarin, Westron, and English). Since we have full-letters for vowels and vowel pairs occurring in the language, let us also allow a tehta to sit above a preceding vowel.
The usual tehtar signs are:
- /a/: three dots arranged in a triangle, or a circumflex (4/0)
- /e/: an acute accent (4/6)
- /i/: a medial dot (4/4)
- /o/: an acute accent with a terminating downward hook (4/8)
- /u/: a grave accent with a terminating downward hook (4/A)
- /y/: a dot written under the preceding consonant (4/5)
Where there is no immediately preceding letter in Quenya, the `short carrier' tehta like an undotted `i' (2/5) is placed under the vowel. Unfortunately, this is ambiguous with the full-letter use; the grapheme `i' could signify either i-after-lexeme break or /ii/ depending on whether one were using full-writing or tehtar vowels. To resolve this, we shall have Lojban use the `long carrier' instead, a sign like an undotted `j' (2/6).
Diphthongs could of course be written out in full. But a simple convention would help them appear lexically as the phonemic units they actually are. I would encourage Lojbanists to write VV diphthongs as the tehta for the second vowel of the diphthong over the full vowel for the first (thus /ai/ could be written quickly as something resembling a script Roman `c' with a dot over it).
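The tehtar conventions above can be sketched as a small grouping routine. This is one possible reading, not a definitive rule set. It assumes that only the four falling diphthongs ai, au, ei and oi get the shorthand, that a diphthong takes precedence over attaching its first vowel to a preceding consonant, and that any other vowel without a bare consonant before it goes on the long carrier. All names are hypothetical, and apostrophes and dots are deliberately left unhandled.

```python
VOWELS = set("aeiouy")
CONSONANTS = set("bcdfgjklmnprstvxz")
# Assumption: only these four diphthongs are written with the shorthand.
DIPHTHONGS = {"ai", "au", "ei", "oi"}
LONG_CARRIER = "long-carrier"


def tehtar_units(word):
    """Split a Lojban word into (base, tehta) pairs.

    base is a consonant letter, "full-<vowel>" or "long-carrier";
    tehta is the vowel written above it, or None.
    """
    units = []
    i = 0
    while i < len(word):
        letter = word[i]
        if word[i:i + 2] in DIPHTHONGS:
            # Second vowel rides as a tehta on the full first vowel.
            units.append(("full-" + letter, word[i + 1]))
            i += 2
            continue
        if letter in VOWELS:
            if units and units[-1][0] in CONSONANTS and units[-1][1] is None:
                # The tehta goes above the preceding bare consonant.
                units[-1] = (units[-1][0], letter)
            else:
                # Nothing to sit on: use the long carrier.
                units.append((LONG_CARRIER, letter))
        elif letter in CONSONANTS:
            units.append((letter, None))
        else:
            raise ValueError(f"unhandled character {letter!r}")
        i += 1
    return units


print(tehtar_units("lojban"))
print(tehtar_units("coi"))
```

Under these assumptions `lojban` groups as l+o, j, b+a, n, while `coi` becomes a bare c followed by a full o carrying the i tehta.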
Next, we need to deal with the Lojbanic peculiarities of smooth breathing ('), dot, and the /.i/ sentence divider. As it happens, the Tengwar includes signs for the first and third of these.
In fact, Tolkien gives two signs for smooth breathing. In the original Quenya of Elvenhome, breath h was written as a simple raised stroke with no bow, called `halla' (2/4). Later, among the Exiles, tengwa 33(2/0) came into use. As we use 33(2/0) for full-written /e/, Lojban must revert to the more archaic high-elven form.
Orthographically, this might seem to create a problem for use of tehtar; you can't superimpose vowel tehtar on the halla, so the vowels on either side must be full-written. But in Lojban this is actually an advantage -- because vowel pairs separated by smooth breathing are disyllabic! The halla thus marks a potential syllable break, and ensures that the whole tengwar word will equal the concatenation of its syllabic parts.
The /.i/ sentence divider also has an analogue in the Mode of Beleriand -- the colon-like signs or `double pusta' (5/1) that bracket the West-Gate inscription and separate phrases in it. This is not described in the Appendix, but it reflects a common practice in medieval manuscripts with which Tolkien must have been familiar. Dan Smith's Tengwar page claims this punctuation is used in all modes.
Finally, the dot. As it happens, there is an obligatory-pause symbol in the Tengwar (used among other things to translate commas). There is also no reason not to simply use a medial dot! Confusion with the schwa under-dot might be a problem if schwa ever occurred as a final sound, but it doesn't in Lojban. An initial /i/ dot will always sit on an /i/ full-letter or long carrier, eliminating that possible confusion as well. The medial dot is Tolkien's `pusta' (5/0).
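The three sign choices just described can be collected in a small table. The sketch below is illustrative only: the names are hypothetical, and it assumes that the double pusta stands in for the whole word `.i` rather than accompanying it, which the text above does not settle.

```python
SIGNS = {
    "'": "halla (2/4)",  # smooth breathing
    ".": "pusta (5/0)",  # the dot
}
SENTENCE_DIVIDER = "double pusta (5/1)"  # assumed to replace the word .i


def mark_signs(text):
    """Return one list per word, with ' and . replaced by sign names.

    The word ".i" as a whole becomes the double pusta; ordinary
    letters pass through unchanged.
    """
    marked = []
    for word in text.split():
        if word == ".i":
            marked.append([SENTENCE_DIVIDER])
        else:
            marked.append([SIGNS.get(ch, ch) for ch in word])
    return marked


print(mark_signs(".i mi'a klama"))
```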
Handling Lojban cmene
Next we must consider syllabication exceptions in cmene. We need to invent tengwar equivalents of Lojban capitalization and the close-comma. Neither exists in any of Tolkien's modes.
Tolkien describes the use of a bar or underline (4/D) to lengthen consonants. As long and doubled consonants don't occur in Lojban we can take this over to indicate a nonstandard stress accent.
For close-comma we can use comma. This is a perfectly reasonable pseudo-tehta, not confusable with anything (the `long carrier' sign closest to it would carry a diacritical).
One final cursive-writing trick. We are told that a hook added to the bow of the last letter in a word (5/C or 5/D) signifies `following s'. This will be useful in Lojban, especially for cmene which frequently end in /s/.
Lojban cmene for the Tengwar themselves
The Lojban tengwar mode I have described should enable Lojban to be written more compactly and beautifully than the Roman alphabet permits (VV-form cmavo, for example, will look like the morphemic atoms they are). But there is another job to be done; we need to be able to talk about the Lojban tengwar in Lojban! We need names and/or predicates for all the constructs above.
We begin easily enough by observing that `tengwar' Lojbanizes easily to a valid cmene form as `tenguar' or `tenuar' (with /n/ in its allophonic ng form). The first form is, I think, preferable; the original tends more toward /teng+gwahr/ than /ten+gwahr/. This would be the name of the whole letter system.
To avoid constant indirection through lerfu we could coin a klogi'u `tenga' with place structure "is a tenguar letterform representing ..." It is not immediately clear whether the gain from this justifies taking a slot from the CVCCV brivla-space (also the resemblance to Spanish `tenga' is unfortunate).
Because there are more signs than sounds in the cursive sub-mode, we also want individual names for the tenguar letterforms and tehta. Unfortunately, Tolkien's name-list (being based on words in Quenya) would be inappropriate even if we lojbanized all names.
Most of the letter tengwar (that is, the non-tehtar characters) are implicitly named by the lerfu they are equivalent to. So are some tehta (comma, dot, the halla sign, and perhaps the sentence-divider). We need names for the `cursive' /s/ sign, though, and for the vowel tehtar, vowel carrier, stress underbar, and following-s hook.
It would also be useful to have Lojban equivalents for `stem' (Elvish `telco') and `bow' (`luva'), the words used to describe parts of tengwa; and we want a third coinage for the stroke that distinguishes Series II and IV from I and III. Finally, we need collectives for the `letters' and tehta, and some way to refer to the difference between full-written and cursive tehta-vowel modes.
One good attack on the problem would start with a series of tanru describing the tehtar and auxiliary /s/ sign as abbreviated-tengwa-<foo> where <foo> is a lerfu. Unfortunately, at this point your humble author knows he is way out of his depth. Help from any more experienced Lojbanists in building proper tanru, klogi'u and le'avla to describe the Lojban tenguar would be greatly appreciated.
This is $Revision: 1.8 $.
Here is a version history.
1.1 -- 26 Dec 1998: A straight transcription of my 1994 original, adding Everson's Tengwar table.
1.2 -- 17 May 1999: Erskin Meldrew pointed out an error in the 1.1 version; it assigned 12(0/B) to both /x/ and /z/. Fixed. Also added the Related Resources section.
1.3 -- 16 Oct 1999: Further minor corrections suggested by Erskin Meldrew. The 1.2 version had several errors in the glyph references.
1.4 -- 26 Oct 1999: Minor markup fixes.
1.5 -- 15 Apr 2000: Revised to incorporate Geoff Eddy's note about the 3(0/2)=c misprint and corrections to my terminology.
1.6 -- 16 Apr 2000: Revised to reflect what is known about Tengwar punctuation (changes the treatment of dot).
1.7 -- 30 Sep 2000: Corrected a typo.
1.7 -- 29 Oct 2000: Added /f/ to consonant summary.
Related Resources
Erskin Meldrew maintains a Lojban mode (based on the 1.1 version of this paper) for a program called Tengwar Scribe, written by Måns Björkman.
Dan Smith's Fantasy Fonts page is a resource for Tengwar TrueType fonts. His key mappings have become something of a minor standard; his three font styles are very complete and come with excellent documentation in both RTF and Microsoft Windows help format. You can use xfstt to get to TrueType fonts under X.
See also Eric's Home Page. $Date: 2002/08/02 02:19:29 $