Zipf's Law

Zipf's Law (see for example http://www.kornai.com/MatLing/statling.html) was formulated in the 1940s by Harvard linguistics professor George Kingsley Zipf (1902-1950) as an empirical generalisation, and states that the n-th most frequent word in a language shows up with frequency 1/n (more precisely, with a frequency roughly proportional to 1/n).

So the most frequent two words account for 150% of the language?

  • ... ignoring boundary cases, obviously.
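One way to make the reply above concrete: if the 1/n figures are read as relative weights rather than literal shares, dividing each by the sum of all the weights gives frequencies that add up to 1. The sketch below assumes a finite vocabulary; the function name and the vocabulary size of 10,000 are illustrative choices, not anything from Zipf.

```python
def zipf_frequencies(vocab_size):
    """Frequencies proportional to 1/n, normalised so that they sum to 1."""
    # Normalising constant: the harmonic number 1 + 1/2 + ... + 1/vocab_size.
    harmonic = sum(1.0 / n for n in range(1, vocab_size + 1))
    return [1.0 / (n * harmonic) for n in range(1, vocab_size + 1)]


# 10,000 is an arbitrary, assumed vocabulary size for illustration only.
freqs = zipf_frequencies(10_000)
top_two_share = freqs[0] + freqs[1]
```

Under that assumption the two most frequent words together account for roughly 15% of running text, not 150%, and the second word is still half as frequent as the first.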

Zipf made the further assumption that the shorter a word is, the more common it is; this ties in to the more general empirical observation that 'smaller' events are commoner than 'larger' events. (See http://www.parc.xerox.com/istl/groups/iea/papers/ranking/ranking.html for other laws expressing this.) This observation is also referred to loosely as 'Zipf's Law', but is not what people outside linguistics understand by it.


However, this is only a generalization; and every language has common polysyllabic terms, because they are useful. It doesn't mean a long term is somehow "doomed". (And as Talen says, 'If you want orlin, you know where to find it.')

  • How common do you mean? Just "everyday", or "extremely high frequency, top 250 words" sort of thing? If the former, then yes, but that is not really the key issue. If the latter, I will bother to check for English. --And Rosta
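A check of the kind described above could be sketched as follows. This is only a rough illustration: the function name is made up, letter count stands in for syllable count (an assumption, since counting syllables reliably is harder), and the threshold of 7 letters is arbitrary. The default of 250 comes from the "top 250 words" figure in the comment.

```python
import re
from collections import Counter


def long_words_among_top(text, top_n=250, min_letters=7):
    """List (rank, word, count) for long words among the top_n most frequent."""
    # Crude tokeniser: lowercase alphabetic runs, allowing one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # most_common returns (word, count) pairs, most frequent first.
    ranked = Counter(words).most_common(top_n)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(ranked, start=1)
        # Assumption: letter count is used as a proxy for "polysyllabic".
        if len(word) >= min_letters
    ]
```

Run over a large sample of English text, an empty or very short result would support the "top 250" reading of the generalization; a long one would count against it.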