lojban diphone speech synthesizer: Difference between revisions

From Lojban

Revision as of 17:02, 4 November 2013

Contact Xavier if you'd like to contribute to this project.

We're making progress! For now, you can listen to the synthesizer's first words here:

http://staticfree.info/projects/lojban_festvox/mi_nelci_la_lojban.ogg

Other TTS samples:

{file name=lnc-tts.ogg showdesc=1}

{file name=kalifornias.ogg showdesc=1}. Contains at least one stress error: cabycte -> cabYcte

{file name=since_masno.ogg showdesc=1}

Now, the rest of this probably only makes sense to those of you who are familiar with phonetics. I apologize in advance for the technical jargon.

What we need to do now is listen through the corpus and decide where the diphone boundaries go. We also have to find the "middle" of each diphone. I don't know what the TTS system expects, but a preliminary rule of thumb that should at least yield consistent, if not correct, results is to put the middle mark at the boundary between the two phones. If there are two consecutive diphones, the part between their two middle marks should then sound like a single phone.
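The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not part of the project's tooling: the `Diphone` tuple, the function name, and the timings are made up, and cutting each phone at its halfway point is an assumption (the text only fixes where the middle mark goes).

```python
from typing import List, NamedTuple, Tuple


class Diphone(NamedTuple):
    name: str     # e.g. "m-i"
    start: float  # seconds
    mid: float    # the "middle" mark
    end: float    # seconds


def diphones_from_phones(phones: List[Tuple[str, float, float]]) -> List[Diphone]:
    """Build diphones from contiguous (label, start, end) phone segments.

    The middle mark is placed at the boundary between the two phones.
    ASSUMPTION: each diphone starts and ends halfway through its phones.
    """
    result = []
    for (l1, s1, e1), (l2, s2, e2) in zip(phones, phones[1:]):
        result.append(Diphone(f"{l1}-{l2}", (s1 + e1) / 2, e1, (s2 + e2) / 2))
    return result


# Hypothetical timings for "mi" surrounded by silence ("#").
phones = [("#", 0.00, 0.10), ("m", 0.10, 0.18), ("i", 0.18, 0.30), ("#", 0.30, 0.40)]
diphones = diphones_from_phones(phones)

# Consistency check: the stretch between the middle marks of two
# consecutive diphones is exactly one phone.
for left, right, phone in zip(diphones, diphones[1:], phones[1:]):
    assert (left.mid, right.mid) == (phone[1], phone[2])
```

The final loop is the point of the sketch: whatever the synthesizer ends up expecting, marks placed this way stay consistent from one diphone to the next.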

Diphthongs (two vowels together, e.g. ai, oi, au) should be split where the sound changes. So when you see "a" turning into "i", split it right there.

For plosives (diphones like "a-p" and "k-u", where there is a burst of air coming from the mouth), the diphone split should be made before the opening phase of the plosive. For example, for the two diphones "a-p" and "p-a", half of the first "a" and the silent part should end up in "a-p". The explosion and half of the next "a" should end up in "p-a".
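As a rough sketch of the plosive rule, the following Python picks the point at which a phone is divided between the diphone that ends in it and the diphone that starts with it. The function name, the `burst_onset` parameter, and the timings are hypothetical; the halfway cut for non-plosives is an assumption carried over from "half of the a".

```python
PLOSIVES = {"p", "b", "t", "d", "k", "g"}  # the Lojban stop consonants


def cut_point(label, start, end, burst_onset=None):
    """Return the time at which a phone is split between two diphones.

    Plosives are cut just before the burst (the opening phase), so the
    silent closure stays with the preceding diphone.
    ASSUMPTION: every other phone is cut at its halfway point.
    """
    if label in PLOSIVES:
        if burst_onset is None:
            raise ValueError(f"plosive {label!r} needs a hand-marked burst onset")
        if not start <= burst_onset <= end:
            raise ValueError("burst onset lies outside the phone")
        return burst_onset
    return (start + end) / 2


# Hypothetical timings for "a p a": the closure runs 0.20-0.27 s,
# the burst begins at 0.27 s, and the "p" ends at 0.30 s.
cut = cut_point("p", 0.20, 0.30, burst_onset=0.27)
a_p = ("a-p", cut_point("a", 0.08, 0.20), 0.20, cut)   # half of "a" + silence
p_a = ("p-a", cut, 0.30, cut_point("a", 0.30, 0.44))   # burst + half of "a"
```

Each tuple is (name, start, middle, end), with the middle mark at the boundary between the phones as described earlier.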

For more information on diphone tagging conventions, see:

http://www-2.cs.cmu.edu/~awb/papers/festvox/festvox_5.html#SEC26

The format of the index file is described at http://www.cstr.ed.ac.uk/projects/festival/manual/festival_20.html#SEC80.

A practical way of doing this is with Praat (http://www.fon.hum.uva.nl/praat/). Here is a short howto:

  1. In the Praat objects window, choose Read -> Read from file...
  2. Select the file, and push Label and Segment -> To TextGrid...
  3. Tier names: Diphone Middle. Leave Point tiers blank. Click OK.
  4. Select both the sound file and the TextGrid. Push Edit.
  5. Click anywhere in the waveform or spectrogram to move the cursor there. Click Boundary -> Add on selected tier (or on tier 2, etc.). You can always move the boundary later.
  6. Click between two boundaries to select that segment. You can play it, and you will see the location of its start and end points in seconds.
  7. Do the same with the middles, but this time click on the boundaries instead of between them. The exact location will be shown.
  8. To use Xavier's conversion script, simply label the segments with their diphone (a-t, #-d) in the text box at the top of the edit window. Make sure you label the segment between the midpoints with the appropriate phoneme, not a diphone, since it really shouldn't be one. See http://staticfree.info/projects/lojban_festvox/praat_timing.png for an example.
  9. Back in the main window, select all the TextGrids that you've been working on and choose Write -> Write to text file. Note: if you save one TextGrid at a time, make sure to retain the original filename; otherwise the script won't have the sample name at all. For some reason, Praat doesn't put the sample name in single TextGrid files.
  10. Finally, run:

./util/TextGrid2index.pl -l ljb_diphone/ljb_diphone_hand_timed.index \
    praat/praat.Collection > ljb_diphone/ljb_diphone.index

(with the appropriate filenames, of course)
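To make the two-tier labelling concrete, here is a small Python sketch of how the Diphone and Middle tiers might be paired up. It is not Xavier's TextGrid2index.pl and makes no claim about how that script works. Several things are assumptions: it reads only a single TextGrid in Praat's long text format (not a Collection file), it takes the middle mark to be the one Middle-tier boundary that falls inside each labelled diphone interval, and it prints one "name sample start mid end" line per diphone. The sample name `sample_0001` is made up, and the real index format should be checked against the format description linked above.

```python
import re

TIER_NAME_RE = re.compile(r'name = "([^"]*)"')
INTERVAL_RE = re.compile(
    r'xmin = ([\d.eE+-]+)\s+xmax = ([\d.eE+-]+)\s+text = "([^"]*)"'
)


def read_tiers(textgrid_text):
    """Map tier name -> list of (xmin, xmax, label) from one long-format TextGrid."""
    tiers = {}
    for chunk in re.split(r"item \[\d+\]:", textgrid_text)[1:]:
        name = TIER_NAME_RE.search(chunk)
        if name is None:
            continue
        tiers[name.group(1)] = [
            (float(lo), float(hi), label)
            for lo, hi, label in INTERVAL_RE.findall(chunk)
        ]
    return tiers


def index_entries(tiers, sample):
    """Pair each labelled Diphone interval with the Middle boundary inside it."""
    boundaries = sorted({t for lo, hi, _ in tiers["Middle"] for t in (lo, hi)})
    entries = []
    for start, end, label in tiers["Diphone"]:
        if not label:  # unlabelled stretches are skipped
            continue
        inside = [t for t in boundaries if start < t < end]
        if len(inside) != 1:
            raise ValueError(f"{label}: expected one middle mark, found {len(inside)}")
        # ASSUMED line layout: name, sample, start, mid, end (seconds).
        entries.append(f"{label} {sample} {start:.3f} {inside[0]:.3f} {end:.3f}")
    return entries


# A tiny hand-written TextGrid with one diphone, "a-t".
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.5
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "Diphone"
        xmin = 0
        xmax = 0.5
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 0.1
            text = ""
        intervals [2]:
            xmin = 0.1
            xmax = 0.3
            text = "a-t"
        intervals [3]:
            xmin = 0.3
            xmax = 0.5
            text = ""
    item [2]:
        class = "IntervalTier"
        name = "Middle"
        xmin = 0
        xmax = 0.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.2
            text = "a"
        intervals [2]:
            xmin = 0.2
            xmax = 0.5
            text = "t"
'''

for line in index_entries(read_tiers(SAMPLE), "sample_0001"):
    print(line)  # prints: a-t sample_0001 0.100 0.200 0.300
```

The sample name is passed in by hand here, which mirrors the note in step 9: a single TextGrid file does not carry the sample name, so it has to come from the filename.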

See the documentation included in the distribution for more instructions on using Praat to time diphones.

  • {file name=ljbdiph.list showdesc=1}
  • {file name=ljb_schema_pseudocode.doc showdesc=1}
  • {file name=toi_ljb_phones.scm showdesc=1}
  • {file name=akwavs.zip showdesc=1}
  • {file name=akflacs.zip showdesc=1}