speech recognition

The first step of a general speech recognition system (the kind one would want for Lojban) consists of phoneme recognition. There are, roughly, two ways of doing it:

  1. Some kludgy-looking algorithmic process for identifying phonemes based on voice features. (Bebe (sorry, URL tomorrow) does this.)
    • This requires that you have either a very good description of the distinguishing features of each relevant phoneme, or a bunch of phoneme recordings to work from.
  2. Some kludgy method for comparing extracted voice features to the voice features of a bunch of phoneme recordings on record.
    • This requires a bunch of phoneme recordings to work from.
  3. Other?
    • This requires that you replace "Other?" with a description. :)
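
To make the first two approaches a bit more concrete, here is a toy sketch in Python. It is not taken from Bebe or from any real recognizer. The features (RMS energy and zero-crossing rate), the synthetic "recordings", and every name in it (`frame_features`, `nearest_label`, `TEMPLATES`, the labels) are assumptions made for illustration. A real system would most likely use spectral features and many recordings per phoneme.

```python
import math
import random

def frame_features(samples):
    """Return (RMS energy, zero-crossing rate) for one frame of samples."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (n - 1))

def distance(f1, f2):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f1, f2)))

def nearest_label(features, templates):
    """Return the label of the stored template closest to `features`."""
    label, _ = min(templates, key=lambda t: distance(features, t[1]))
    return label

# Synthetic stand-ins for recordings (assumption: no real audio is used here).
def tone(freq, amp=1.0, n=400, rate=8000):
    """A pure sine wave: low zero-crossing rate, loosely vowel-like."""
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

def noise(seed, n=400):
    """Uniform noise: high zero-crossing rate, loosely fricative-like."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Approach 2: features of the "recordings on record", one per label.
TEMPLATES = [
    ("vowel-like", frame_features(tone(200))),
    ("fricative-like", frame_features(noise(seed=0))),
]

# Classify an unseen frame by comparing its features to the templates.
unknown = tone(250, amp=0.8)
print(nearest_label(frame_features(unknown), TEMPLATES))
```

Approach 1 would replace `TEMPLATES` with hand-written thresholds on the same features (say, "zero-crossing rate above some cutoff means fricative"), which is where the good description of each phoneme's distinguishing features comes in.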

So, does anyone happen to have any idea where to get a lot of phoneme recordings, with the phoneme boundaries conveniently premarked? Or descriptions of algorithms for identifying phonemes (without recordings to match against)? The Lojban speech synthesis data for festvox/festival would be nice, but when I looked earlier today, the URLs for it were broken, and the whole webserver seemed fairly mucked up.

  • Why doesn't someone organize a research project, by getting a bunch of lojbanists to pronounce every phoneme, and then analyzing the differences and similarities between them, until we can at least begin to nail down what makes these sounds different? -- mi'e bancus
    • because there are already huge corpora of sounds out there, which are already tagged so that you know exactly where every last phoneme in the corpus starts and stops. see the Linguistic Data Consortium. problem is they only distribute on CD, and the CDs are usually $100 or so a pop. and i, at least, have zero interest in trying to tag hundreds of thousands of phonemes. i've found decent corpora floating about freely on the internet in the past, but they didn't have phoneme boundaries tagged. --Jay
    • Just a technicality about terminology (never a Loglan/Lojban strong point): you can't have a recording of phonemes as such, only of phones. A phoneme is a language-specific grouping of phones -- actually produced sounds -- that are taken as functioning identically in a given language. So what we want is either a way of dividing up phones into the phonemes of Lojban or a way of generalizing from various phones taken in Lojban to belong to the same phoneme. >|8}
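
The phone/phoneme distinction above can be sketched as a many-to-one mapping. The table below is a small, hypothetical sample, not a complete or authoritative inventory: the IPA phones listed for Lojban `'`, `r`, and `e` are illustrative assumptions about permitted pronunciations, and the names `PHONE_TO_PHONEME` and `phonemes_from_phones` are made up for this example.

```python
# Hypothetical sample: several IPA phones collapse into one Lojban phoneme.
PHONE_TO_PHONEME = {
    "h": "'", "θ": "'",            # assumed variants of the apostrophe sound
    "r": "r", "ɾ": "r", "ɹ": "r",  # assumed rhotic variants
    "ɛ": "e", "e": "e",            # assumed variants of the vowel e
}

def phonemes_from_phones(phones):
    """Map recognized phones to Lojban phonemes; '?' marks an unknown phone."""
    return [PHONE_TO_PHONEME.get(p, "?") for p in phones]

def phones_by_phoneme(table):
    """Invert the table: phoneme -> sorted list of phones grouped under it."""
    groups = {}
    for phone, phoneme in table.items():
        groups.setdefault(phoneme, []).append(phone)
    return {phoneme: sorted(ps) for phoneme, ps in groups.items()}

print(phonemes_from_phones(["ɾ", "ɛ", "θ"]))
print(phones_by_phoneme(PHONE_TO_PHONEME))
```

A recognizer would then work at the phone level (what was actually said) and only afterwards collapse its output through a table like this one.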