free gismu space: Difference between revisions

{{se inspekte/en}}
==First analysis==
*Total number of possible gismu forms: ''96,475''
*Total number of possible forms excluding the last vowel: ''19,295''
*Total number of official gismu: ''1,342''
*Total number of gismu forms that clash with official gismu: ''11,874'' (according to the definition of which gismu clash with each other in the book, including the forms of the official gismu themselves)
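The first two totals can be reproduced by enumerating every CCVCV and CVCCV shape. What follows is a minimal Python sketch, not the program used for the figures above. It assumes the 48 permissible initial consonant pairs and the restrictions on medial pairs as I understand them from the morphology chapters of the Book, so check both against the Book before relying on it. All names in it are illustrative.

```python
from itertools import product

VOWELS = "aeiou"
CONSONANTS = "bcdfgjklmnprstvxz"
VOICED = set("bdgvjz")
UNVOICED = set("ptkfcsx")
SIBILANTS = set("cjsz")
FORBIDDEN_PAIRS = {"cx", "kx", "xc", "xk", "mz"}

# Assumed list of the 48 permissible initial consonant pairs.
INITIAL_PAIRS = """
bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr
jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st
tc tr ts vl vr xl xr zb zd zg zm zv
""".split()


def is_permissible_pair(a, b):
    """Assumed rules for a consonant pair in the middle of a CVCCV form."""
    if a == b:
        return False
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False
    if a in SIBILANTS and b in SIBILANTS:
        return False
    return a + b not in FORBIDDEN_PAIRS


def all_gismu_forms():
    """Return the set of every CCVCV and CVCCV form allowed by the rules above."""
    medial_pairs = [a + b for a, b in product(CONSONANTS, repeat=2)
                    if is_permissible_pair(a, b)]
    forms = set()
    for cc, v1, c, v2 in product(INITIAL_PAIRS, VOWELS, CONSONANTS, VOWELS):
        forms.add(cc + v1 + c + v2)
    for c, v1, cc, v2 in product(CONSONANTS, VOWELS, medial_pairs, VOWELS):
        forms.add(c + v1 + cc + v2)
    return forms


forms = all_gismu_forms()
stems = {form[:4] for form in forms}  # forms with the last vowel removed
print(len(forms), len(stems))
```

Under these assumptions the two counts agree with the totals listed above (96,475 forms and 19,295 forms excluding the last vowel).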


mi'e xod. The cultural gismu are improper and mistaken, for the following reasons:
Percentage of forms actually taken: ''12.31%'', or about 1 in 8 (counting both the official gismu themselves and the forms that clash with them)


# They '''unfairly favor''' the cultures that were chosen over all the others.
* But this is misleading: once you actually assign a new gismu, a variable amount of gismu space becomes used up.
** Good point. However, a gismu can use up at most 13 forms: 5 for the last vowel plus 2 for each of its 4 consonants. So at least ''6,507'' more gismu can be assigned: (96,475 − 11,874) / 13, rounded down. In practice they seem to use up about 9 forms each, so there is room for about ''9,400''.


# They '''do not sound like''' the actual names of the cultures.
Percentage of forms actually taken (ignoring clashes other than the last vowel): ''6.96%'', or about 1 in 14.
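The arithmetic behind these figures can be checked with a short Python sketch. The input numbers are the ones quoted in this section. The variable names are mine, and the 13-form and 9-form costs per new gismu are the assumptions stated in the reply above.

```python
total_forms = 96_475      # all possible gismu forms
official_gismu = 1_342    # official gismu
clashing_forms = 11_874   # forms clashing with official gismu, themselves included

# Share of the space taken when every kind of clash is counted.
pct_taken = 100 * clashing_forms / total_forms

# Share taken when only the last-vowel rule is counted (5 forms per gismu).
pct_taken_last_vowel = 100 * official_gismu * 5 / total_forms

# Forms not yet blocked by any official gismu.
unblocked = total_forms - clashing_forms

# Worst case: every new gismu uses up 13 forms.
worst_case_new_gismu = unblocked // 13

# Typical case assumed in the discussion: about 9 forms per gismu.
typical_new_gismu = unblocked / 9

print(round(pct_taken, 2), round(pct_taken_last_vowel, 2))
print(unblocked, worst_case_new_gismu, round(typical_new_gismu))
```

The 9-form figure is close to the observed average of 11,874 / 1,342 ≈ 8.8 forms per official gismu.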
==A different analysis==
Results:
# There are 19365 legal gisms (4-letter rafsi) by the rules.
# There are 1338 gisms in actual use.
# The gismu avoidance rules block 3731 more gisms, for an effective total of 5069 in use.
# This leaves 14566 gisms available.
# On average, each gism blocks 4 other gisms.
# In effect, then, we can have about 2900 more gismu, depending on the exact details of assignments.
* That's not correct. You can't count 4-letter rafsi, because 4-letter rafsi '''are''' allowed to be very similar to other 4-letter rafsi. The gismu they come from could differ by one small change in the 4-letter stem and by the final vowel.
**If this is correct, then we can't have a distinct gismu for every culture, because the number of languages spoken around the world is 6000-7000, depending on how you count.
***[[User:And Rosta|And Rosta]]:
****But we can have an indistinct gismu for every culture.
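The numbered results can be checked with a few lines of Python. The figures are the ones quoted in the list and the variable names are mine. Two of the quoted figures do not agree with the others: 19365 − 5069 comes to 14296 rather than the stated 14566, and 3731 / 1338 is about 2.8 blocked gisms per gism rather than 4. The thread does not say which numbers are mistyped, so the sketch only reports both versions.

```python
legal_gisms = 19365          # item 1
gisms_in_use = 1338          # item 2
extra_blocked = 3731         # item 3
stated_available = 14566     # item 4, as quoted
stated_blocked_per_gism = 4  # item 5, as quoted

effective_in_use = gisms_in_use + extra_blocked       # item 3 total
computed_available = legal_gisms - effective_in_use   # compare with item 4
observed_blocked_per_gism = extra_blocked / gisms_in_use


def new_gismu_estimate(available, blocked_per_gism):
    """Each new gismu uses its own gism plus the ones it blocks."""
    return available // (1 + blocked_per_gism)


print(effective_in_use, computed_available, round(observed_blocked_per_gism, 1))
print(new_gismu_estimate(stated_available, stated_blocked_per_gism))
print(new_gismu_estimate(computed_available, stated_blocked_per_gism))
```

With 4 blocked gisms per new gismu, both versions of the available count give an estimate in the neighbourhood of 2900.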
==Third analysis==
*skat:
**I have written a program that by brute force actually <u>counts</u> the number of free gismu. I started by generating a list of all 96475 possible gismu forms.  We can call 'em "candidate gismu" or "proto-gismu".  I then went through the list of 1342 official gismu, one at a time.  For each one, I deleted it from the list of proto-gismu. Then I deleted all the proto-gismu that differed only by the final vowel except for the "brodV" series (are there any other exceptions like this?). Then I deleted all the proto-gismu that were blocked based on consonant similarity as in the table given just above.  When I had done that for all the official gismu I counted the remaining proto-gismu.
**The answer was rather surprising.  I find that there are still 85,536 available gismu forms.  Now, I'm sure you'll say, "That's simply not possible!" But think about it.  A lot of the forms that would be blocked by the consonant similarity rules aren't even valid proto-gismu.  (Don't forget, to be a valid proto-gismu a form still has to conform to certain rules.)  So fewer forms are blocked than you might think.  Additionally, a number of forms (no, I haven't counted them; maybe later) are blocked multiple ways.  For instance, *'''bajru''' is blocked by both '''bacru''' and '''bajra'''.  Because of this overlap fewer forms are blocked than you might think.
**I'm pretty confident that my program is correct, but it would be a Good Thing if someone were to attempt to verify my results.
***[[User:xorxes|xorxes]]
****You have to consider that those 85,536 are not independently available. Choosing one will block many of the others. Close to 80% of them will be blocked by the last vowel rule.
*****[[User:And Rosta|And Rosta]]:
******But experimental gismu won't block each other (or at least, we can't say which blocks which). Furthermore, the blocking rules were part of the algorithm for assigning the original gismu forms and are not part of the living morphological rules that constrain, say, fu'ivla and cmevla, since the gismu are supposed to form a closed class. Since experimental gismu are inherently unofficial, there is no need to suppose that the blocking rules would apply to them. I accept that one would expect official gismu to conform to the blocking rules.
******[[User:xorxes|xorxes]]
*******Right, but in that case the relevant number is 96475 - 1342 = 95133 available forms.
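The deletion step skat describes can be sketched in Python as follows. This is not skat's program. The similarity table is my transcription of the one in Chapter 4, Section 14 of the Book and should be checked against it. The function names are hypothetical. The sketch does not treat the "brodV" series as an exception and does not check that a blocked form is itself a valid gismu shape.

```python
VOWELS = "aeiou"

# Assumed consonant-similarity table (Book, Chapter 4, Section 14).
SIMILAR = {
    "b": "pv", "c": "js", "d": "t", "f": "pv", "g": "kx", "j": "cz",
    "k": "gx", "l": "r", "m": "n", "n": "m", "p": "bf", "r": "l",
    "s": "cz", "t": "d", "v": "bf", "x": "gk", "z": "js",
}


def blocked_by(gismu):
    """Forms that one gismu uses up, itself included."""
    # Last-vowel rule: every form sharing the first four letters.
    blocked = {gismu[:4] + vowel for vowel in VOWELS}
    # Consonant-similarity rule: swap one consonant for a similar one.
    for i, letter in enumerate(gismu):
        for alt in SIMILAR.get(letter, ""):
            blocked.add(gismu[:i] + alt + gismu[i + 1:])
    return blocked


def count_free(candidates, official):
    """Count candidate forms that no official gismu blocks."""
    free = set(candidates)
    for gismu in official:
        free -= blocked_by(gismu)
    return len(free)


# *bajru is blocked twice: by bacru (j/c) and by bajra (last vowel).
print("bajru" in blocked_by("bacru"), "bajru" in blocked_by("bajra"))

# Toy run with a hand-picked candidate list, not the real gismu list.
toy_candidates = {"bacru", "bajru", "bajra", "basru", "klama"}
print(count_free(toy_candidates, ["bacru", "bajra"]))
```

Because `blocked_by` returns sets, a form blocked in several ways is removed only once, which is the overlap effect skat mentions. A gismu blocks at most 5 + 4 × 2 = 13 forms under this table.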
==Analysis by [[la gleki]]==
* [[la gleki]]:
** I [https://la-lojban.github.io/free-gismu-space/ can count] around 17895 possible experimental gismu in addition to 1392 official gismu


Therefore, I strongly suggest using stage-3 fu'ivla.
==Questions==
 
*[[User:tsali|tsali]]:
----
**Can you tell me exactly the rules by which gismu block each other, and gisms block each other?
 
** See [https://lojban.github.io/cll/4/14/ Chapter 4, Section 14 of the Book]
Hesitantly, I agree that they should indeed be replaced with fu'ivla. Going to produce the list of fu'ivla for us, xod? :) --[[jbocre: Jay Kominek|Jay]]
 
''mi'e xod. (Joking, and relax:) Nick, who is overloaded with work, is the Lojban community's authority on that.''
 
----
 
I too agree with Xod, regarding both his reasons and his conclusions. -- mi'e [[User:And Rosta|And Rosta]].
 
----
 
=== The blacklist: 54 cultural gismu ===
 
''source: [[jbocre: gismu etymology|gismu etymology]], with 27 scientific constants and powers of ten removed''
 
{| class="wikitable"
! gismu !! Gloss
|-
| baxso || Malay-Indonesian
|-
| bengo || Bengali
|-
| bemro || North American
|-
| bindo || Indonesian
|-
| brazo || Brazilian
|-
| brito || British
|-
| budjo || Buddha
|-
| dadjo || Tao
|-
| dotco || German
|-
| dzipo || Antarctican
|-
| filso || Palestinian
|-
| fraso || French
|-
| friko || African
|-
| gento || Argentinian
|-
| glico || English
|-
| jegvo || Jehovah
|-
| jerxo || Algerian
|-
| jordo || Jordanian
|-
| jungo || Chinese
|-
| kadno || Canadian
|-
| ketco || South American
|-
| kisto || Pakistani
|-
| latmo || Latin/Latium
|-
| libjo || Libyan
|-
| lojbo || Loglandic
|-
| lubno || Lebanese
|-
| meljo || Malaysian
|-
| merko || American
|-
| mexco || Mexican
|-
| misro || Egyptian
|-
| morko || Moroccan
|-
| muslo || Islam/Moslem
|-
| polno || Polynesian
|-
| ponjo || Japanese
|-
| porto || Portuguese
|-
| rakso || Iraqi
|-
| ropno || European
|-
| rusko || Russian
|-
| sadjo || Saudi
|-
| semto || Semitic
|-
| sirxo || Syrian
|-
| skoto || Scottish
|-
| softo || Soviet/USSR
|-
| spano || Spanish
|-
| sralo || Australian
|-
| srito || Sanskrit
|-
| xazdo || Asiatic
|-
| xebro || Hebrew (Israeli)
|-
| xelso || Greek
|-
| xindo || Hindi
|-
| xispo || Hispanic
|-
| xrabo || Arabic
|-
| xriso || Christ
|-
| xurdo || Urdu
|}

Latest revision as of 05:53, 9 October 2017
