How to port the Kaldi recipe to the ESPnet recipe?
ESPnet fully follows the basic Kaldi data structure, so we can easily port a Kaldi ASR recipe to an ESPnet one. Here, I use aishell as an example of the porting.
- Make sure that your Kaldi recipe works correctly.

  ```sh
  $ cd <your_kaldi_directory>/egs/aishell/s5
  $ ./run.sh
  ```

  Please make sure that it works at least through the data preparation stage, i.e., `local/aishell_data_prep.sh`.
- Find an appropriate ESPnet recipe. We may need several considerations, e.g.,
  - whether the recipe includes a data download script or not
  - narrow band (8kHz) vs. wide band (16kHz)
  - whether the target language has word boundaries or not
  - whether we create an RNNLM or not
  - check the non-linguistic symbols (e.g., `<NOISE>`), which depend on the corpus
In the aishell example, we would start to modify the HKUST recipe by considering the above items.
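One way to check which non-linguistic symbols a corpus uses is to scan the transcripts in its Kaldi-style `text` file. The following is a rough sketch; the sample file and the `<...>`/`[...]` symbol conventions are assumptions, so adjust the pattern to your corpus:

```sh
# Sketch: list candidate non-linguistic symbols in a Kaldi-style text file.
# Each line of data/train/text is "<utt-id> <token> <token> ...";
# the sample file below is made up for illustration.
mkdir -p data/train
cat > data/train/text <<'EOF'
utt1 hello <NOISE> world
utt2 <NOISE> good morning [laughter]
EOF
# Drop the utterance IDs, then count tokens wrapped in <> or [].
cut -d' ' -f2- data/train/text | tr ' ' '\n' \
  | grep -E '^(<[^>]+>|\[[^]]+\])$' | sort | uniq -c | sort -rn
```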
- Create a directory and copy the basic files.

  ```sh
  $ mkdir -p <your_espnet_directory>/egs/aishell/asr1
  $ cd <your_espnet_directory>/egs/aishell/asr1
  $ cp -r ../../hkust/asr1/{cmd.sh,conf,path.sh,run.sh,steps,utils} .
  ```
- Copy the data preparation scripts from the Kaldi aishell `local` directory to the ESPnet one.

  ```sh
  $ mkdir local
  $ cp <your_kaldi_directory>/egs/aishell/s5/local/{download_and_untar.sh,aishell_data_prep.sh} local/
  ```
Please copy necessary scripts only. The Kaldi example local directory often includes other scripts (e.g., LM construction related), but we only need the data preparation scripts.
- Modify `run.sh` accordingly.
  - Modify the data directory:

    ```sh
    # data
    hkust1=/export/corpora/LDC/LDC2005S15/
    hkust2=/export/corpora/LDC/LDC2005T32/
    ```
    ->
    ```sh
    # data
    data=/export/a05/xna/data
    data_url=www.openslr.org/resources/33
    ```
  - Modify the training, dev, and evaluation partitions with appropriate naming:

    ```sh
    train_set=train_nodup_sp
    train_dev=train_dev
    recog_set="train_dev dev"
    ```
    ->
    ```sh
    train_set=train_sp
    train_dev=dev
    recog_set="dev test"
    ```

    and remove the lines related to the `train_dev` or `train_nodup` parts.
  - Add the data download script:

    ```sh
    if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
        echo "stage -1: Data Download"
        local/download_and_untar.sh ${data} ${data_url} data_aishell
        local/download_and_untar.sh ${data} ${data_url} resource_aishell
    fi
    ```
  - Modify the data preparation script:

    ```sh
    local/hkust_data_prep.sh ${hkust1} ${hkust2}
    local/hkust_format_data.sh
    ```
    ->
    ```sh
    local/aishell_data_prep.sh ${data}/data_aishell/wav ${data}/data_aishell/transcript
    ```
  - Remove the following narrow-band-related part (sox upsampling):

    ```sh
    # upsample audio from 8k to 16k to make a recipe consistent with others
    for x in train dev; do
        sed -i.bak -e "s/$/ sox -R -t wav - -t wav - rate 16000 dither | /" data/${x}/wav.scp
    done
    ```
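For reference, the sed line above rewrites each `wav.scp` entry (a piped command ending in `|`, as in the HKUST recipe) by appending a sox stage that upsamples to 16 kHz. A minimal demonstration on a made-up entry:

```sh
# Made-up wav.scp entry: a piped command ending in "|" (HKUST-style).
echo 'utt1 sph2pipe -f wav -p /path/to/utt1.sph |' > wav.scp
# The sed command from the recipe appends a sox upsampling stage to each line.
sed -i.bak -e "s/$/ sox -R -t wav - -t wav - rate 16000 dither | /" wav.scp
cat wav.scp
```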
  - Remove the non-linguistic-symbol-related parts, which are not used in the aishell corpus, e.g., remove the following lines:

    ```sh
    nlsyms=data/lang_1char/non_lang_syms.txt
    echo "make a non-linguistic symbol list"
    cut -f 2- data/${train_set}/text | grep -o -P '\[.*?\]' | sort | uniq > ${nlsyms}
    cat ${nlsyms}
    ```

    and remove `-l ${nlsyms}` everywhere.
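To double-check that no `-l ${nlsyms}` option is left behind, a literal search over the edited scripts can help. This is a sketch: `run_sample.sh` below is a made-up stand-in for whatever script you edited, and the `text2token.py` line is just example content.

```sh
# Made-up script fragment containing the option we want to find and remove.
cat > run_sample.sh <<'EOF'
text2token.py -s 1 -n 1 -l ${nlsyms} data/train/text
EOF
# grep -n prints line numbers of any remaining occurrences; -F searches the
# literal string, and "--" stops grep from parsing "-l" as its own option.
grep -nF -- '-l ${nlsyms}' run_sample.sh || echo "no occurrences left"
```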