
How to port the Kaldi recipe to the ESPnet recipe?


ESPnet fully follows the basic Kaldi data structure, so we can easily port a Kaldi ASR recipe to an ESPnet one. Here, we use aishell as an example of the porting.

  1. Make sure that your Kaldi recipe works correctly.

    $ cd <your_kaldi_directory>/egs/aishell/s5
    $ ./run.sh
    

    Please make sure that it works at least through the data preparation stage, i.e., local/aishell_data_prep.sh.
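
    As an additional check (just a sketch, assuming the standard aishell s5 data layout), you can validate the generated data directories with Kaldi's utils/validate_data_dir.sh:

    $ utils/validate_data_dir.sh --no-feats data/train
    $ utils/validate_data_dir.sh --no-feats data/dev
    $ utils/validate_data_dir.sh --no-feats data/test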

  2. Find an appropriate ESPnet recipe to start from. There are several considerations, e.g.,

    • whether the recipe includes a data download script or not
    • narrow band (8 kHz) vs. wide band (16 kHz)
    • whether the target language has explicit word boundaries or not
    • whether we create an RNNLM or not
    • which non-linguistic symbols the corpus uses (e.g., <NOISE>)

    In the aishell example, considering the above items, we start from the HKUST recipe and modify it.
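
    A few quick checks can help with these considerations. The commands below are only a sketch; they assume the Kaldi data directories from step 1, and the wav path is a placeholder:

    $ head -n 3 data/train/wav.scp                 # pick a wav file to inspect
    $ soxi -r <one_of_the_wav_files_listed_above>  # prints 16000 for wide band audio
    $ cut -f 2- data/train/text | grep -o -P '<[A-Z]+>|\[.*?\]' | sort | uniq -c  # candidate non-linguistic symbols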

  3. Create a directory and copy the basic files.

    $ mkdir -p <your_espnet_directory>/egs/aishell/asr1
    $ cd <your_espnet_directory>/egs/aishell/asr1
    $ cp -r ../../hkust/asr1/{cmd.sh,conf,path.sh,run.sh,steps,utils} .
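
    Since path.sh in ESPnet recipes typically locates the ESPnet and Kaldi installations via relative paths, the copied files should work unchanged as long as the new recipe sits at the same depth (egs/aishell/asr1). A minimal sanity check (just a sketch):

    $ ls -l steps utils                    # these may be symlinks into the Kaldi tree; confirm they still resolve
    $ . ./path.sh && echo "path.sh is OK"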
    
  4. Copy the data preparation scripts from the Kaldi aishell local directory to the ESPnet one.

    $ mkdir local
    $ cp <your_kaldi_directory>/egs/aishell/s5/local/{download_and_untar.sh,aishell_data_prep.sh} local/
    

    Please copy only the necessary scripts. The Kaldi local directory often includes other scripts (e.g., for LM construction), but we only need the data preparation ones.
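
    It is also worth making sure that the copied scripts are executable and that they do not call other local/ scripts that were not copied (the grep is only a quick heuristic):

    $ chmod +x local/*.sh
    $ grep -n "local/" local/*.sh  # hits that point to scripts you did not copy indicate missing dependencies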

  5. Modify run.sh accordingly.

    • modify the data directory
      # data
      hkust1=/export/corpora/LDC/LDC2005S15/
      hkust2=/export/corpora/LDC/LDC2005T32/
      
      ->
      # data
      data=/export/a05/xna/data
      data_url=www.openslr.org/resources/33
      
    • modify the training, dev, and evaluation partitions with appropriate naming
      train_set=train_nodup_sp
      train_dev=train_dev
      recog_set="train_dev dev"
      
      ->
      train_set=train_sp
      train_dev=dev
      recog_set="dev test"
      
      and remove the lines related to the train_dev and train_nodup sets.
    • add the data download script
      if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
          echo "stage -1: Data Download"
          local/download_and_untar.sh ${data} ${data_url} data_aishell
          local/download_and_untar.sh ${data} ${data_url} resource_aishell
      fi
      
    • modify the data preparation script
          local/hkust_data_prep.sh ${hkust1} ${hkust2}
          local/hkust_format_data.sh
      
      ->
          local/aishell_data_prep.sh ${data}/data_aishell/wav ${data}/data_aishell/transcript
      
    • remove the following narrow-band-related part (the sox upsampling), since the aishell audio is already wide band (16 kHz)
          # upsample audio from 8k to 16k to make a recipe consistent with others
          for x in train dev; do
              sed -i.bak -e "s/$/ sox -R -t wav - -t wav - rate 16000 dither | /" data/${x}/wav.scp
          done
      
    • remove the non-linguistic symbol handling, which is not needed for the aishell corpus, e.g., remove the following lines:
      nlsyms=data/lang_1char/non_lang_syms.txt
      
      echo "make a non-linguistic symbol list"
      cut -f 2- data/${train_set}/text | grep -o -P '\[.*?\]' | sort | uniq > ${nlsyms}
      cat ${nlsyms}
      
      and remove -l ${nlsyms} everywhere it appears.
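
    After these edits, a convenient way to debug the ported recipe is to grep for leftovers from the original recipe and then run it stage by stage. This sketch assumes that run.sh parses command-line options with utils/parse_options.sh, as ESPnet recipes typically do:

    $ grep -n "hkust\|nlsyms" run.sh      # any remaining hits point to parts that still need editing
    $ ./run.sh --stage -1 --stop_stage 0  # start with the data download and data preparation stages only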