Tansky-Lechevalier scoring algorithm

This is the algorithm commonly assumed for selecting the canonical form (dictionary form) of a lujvo. It was created by Bob and Nora Lechevalier in 1989, and is printed in section 4.12 of The Book. The following is a mostly verbatim quotation:

  1. Count the total number of letters, including hyphens and apostrophes; call it "L".
  2. Count the number of apostrophes; call it "A".
  3. Count the number of "y"-, "r"-, and "n"-hyphens; call it "H".
  4. For each rafsi, find the value in the following table. Sum this value over all rafsi; call it "R":

     rafsi form                  example    value
     CVC/CV (final)              -sarji     1
     CVC/C                       -sarj-     2
     CCVCV (final)               -zbasu     3
     CCVC                        -zbas-     4
     CVC                         -nun-      5
     CVV with an apostrophe      -ta'u-     6
     CCV                         -zba-      7
     CVV with no apostrophe      -sai-      8

  5. Count the number of vowels, not including "y"; call it "V".
  6. The score is then: (1000 * L) - (500 * A) + (100 * H) - (10 * R) - V
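
The six steps can be sketched in Python as follows. This is only an illustrative sketch, not part of the quoted text: the function names (`shape`, `lujvo_score`) and the input convention are assumptions made here. The caller is assumed to supply the lujvo already split into its rafsi, with each "y"-, "r"-, or "n"-hyphen given as a separate one-letter part. The sketch does not check that the resulting word is a valid lujvo.

```python
# Values from the rafsi table, keyed by consonant/vowel shape.
RAFSI_VALUES = {
    "CVCCV": 1,  # CVC/CV (final), e.g. -sarji
    "CVCC": 2,   # CVC/C,          e.g. -sarj-
    "CCVCV": 3,  # CCVCV (final),  e.g. -zbasu
    "CCVC": 4,   # CCVC,           e.g. -zbas-
    "CVC": 5,    # CVC,            e.g. -nun-
    "CV'V": 6,   # CVV with an apostrophe, e.g. -ta'u-
    "CCV": 7,    # CCV,            e.g. -zba-
    "CVV": 8,    # CVV with no apostrophe, e.g. -sai-
}
HYPHENS = {"y", "r", "n"}   # a one-letter part is taken to be a hyphen
VOWELS = set("aeiou")       # "y" is deliberately excluded (step 5)


def shape(rafsi):
    """Return the C/V shape of a rafsi, keeping apostrophes as-is."""
    return "".join(
        "V" if ch in VOWELS else "'" if ch == "'" else "C" for ch in rafsi
    )


def lujvo_score(parts):
    """Score a lujvo given as a list of rafsi and hyphen letters."""
    word = "".join(parts)
    L = len(word)                                   # step 1
    A = word.count("'")                             # step 2
    H = sum(1 for p in parts if p in HYPHENS)       # step 3
    R = sum(RAFSI_VALUES[shape(p)]                  # step 4
            for p in parts if p not in HYPHENS)
    V = sum(1 for ch in word if ch in VOWELS)       # step 5
    return 1000 * L - 500 * A + 100 * H - 10 * R - V  # step 6


print(lujvo_score(["zba", "sai"]))                # 5847
print(lujvo_score(["sai", "r", "zba", "ta'u"]))   # 10385
```

For example, "zbasai" has L = 6, A = 0, H = 0, R = 7 + 8 = 15 and V = 3, giving 6000 - 0 + 0 - 150 - 3 = 5847.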

This score is calculated for every possible form of the lujvo, and the form with the lowest score is selected as the canonical form. The algorithm has no provision for ties, but ties are rare, given the large number of coefficients that must be factored in.
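
The selection step might be sketched as below. The helper name `pick_canonical` is hypothetical, and returning every tied form is a choice made for this sketch, since the algorithm itself does not say how to break ties. The two scores in the usage example were worked out by hand from the formula, assuming the candidate forms zba + sai and zbas + y + sai.

```python
def pick_canonical(scores):
    """Return (forms_with_lowest_score, lowest_score).

    `scores` maps each candidate form to its score. More than one
    form is returned only when there is a tie, which the algorithm
    does not resolve.
    """
    best = min(scores.values())
    winners = sorted(form for form, s in scores.items() if s == best)
    return winners, best


# zbasai:   6000 - 0 + 0   - 150 - 3 = 5847
# zbasysai: 8000 - 0 + 100 - 120 - 3 = 7977
candidates = {"zbasai": 5847, "zbasysai": 7977}
print(pick_canonical(candidates))  # (['zbasai'], 5847)
```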

This algorithm was written on the basis of the personal tastes of its authors (which appear to be quite compatible with the tastes of the rest of the Lojban community). It prefers short words over long ones, and vowels over consonant clusters. It also ranks the different rafsi forms according to which of them the authors find more pleasing.