
How to port the Kaldi recipe to the ESPnet recipe?


ESPnet fully follows the basic Kaldi data structure, so we can easily port a Kaldi ASR recipe to an ESPnet one. Here, we use aishell as an example of the porting.

  1. Make sure that your Kaldi recipe works correctly.

    $ cd <your_kaldi_directory>/egs/aishell/s5
    $ ./run.sh
    

    Please make sure that it works at least through the data preparation stage, i.e., local/aishell_data_prep.sh.
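
    As an additional check (just a sketch, assuming the standard aishell s5 data layout), you can validate the generated data directories with Kaldi's utils/validate_data_dir.sh:

    $ utils/validate_data_dir.sh --no-feats data/train
    $ utils/validate_data_dir.sh --no-feats data/dev
    $ utils/validate_data_dir.sh --no-feats data/test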

  2. Find an appropriate ESPnet recipe to start from. There are several considerations, e.g.,

    • whether the recipe includes a data download script or not
    • narrow band (8 kHz) vs. wide band (16 kHz)
    • whether the target language has explicit word boundaries or not
    • whether we create an RNNLM or not
    • which non-linguistic symbols the corpus uses (e.g., <NOISE>)

    In the aishell example, considering the above items, we start from the HKUST recipe and modify it.
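
    A few quick checks can help with these considerations. The commands below are only a sketch; they assume the Kaldi data directories from step 1, and the wav path is a placeholder:

    $ head -n 3 data/train/wav.scp                 # pick a wav file to inspect
    $ soxi -r <one_of_the_wav_files_listed_above>  # prints 16000 for wide band audio
    $ cut -f 2- data/train/text | grep -o -P '<[A-Z]+>|\[.*?\]' | sort | uniq -c  # candidate non-linguistic symbols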

  3. Create a directory and copy the basic files.

    $ mkdir -p <your_espnet_directory>/egs/aishell/asr1
    $ cd <your_espnet_directory>/egs/aishell/asr1
    $ cp -r ../../hkust/asr1/{cmd.sh,conf,path.sh,run.sh,steps,utils} .
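
    Since path.sh in ESPnet recipes typically locates the ESPnet and Kaldi installations via relative paths, the copied files should work unchanged as long as the new recipe sits at the same depth (egs/aishell/asr1). A minimal sanity check (just a sketch):

    $ ls -l steps utils                    # these may be symlinks into the Kaldi tree; confirm they still resolve
    $ . ./path.sh && echo "path.sh is OK"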
    
  4. Copy the data preparation scripts from the Kaldi aishell local directory to the ESPnet one.

    $ mkdir local
    $ cp <your_kaldi_directory>/egs/aishell/s5/local/{download_and_untar.sh,aishell_data_prep.sh} local/
    

    Please copy only the necessary scripts. The Kaldi local directory often includes other scripts (e.g., for LM construction), but we only need the data preparation ones.
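
    It is also worth making sure that the copied scripts are executable and that they do not call other local/ scripts that were not copied (the grep is only a quick heuristic):

    $ chmod +x local/*.sh
    $ grep -n "local/" local/*.sh  # hits that point to scripts you did not copy indicate missing dependencies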

  5. Modify run.sh accordingly.

    • modify the data directory
      # data
      hkust1=/export/corpora/LDC/LDC2005S15/
      hkust2=/export/corpora/LDC/LDC2005T32/
      
      ->
      # data
      data=/export/a05/xna/data
      data_url=www.openslr.org/resources/33
      
    • modify the training, dev, and evaluation partitions with appropriate naming
      train_set=train_nodup_sp
      train_dev=train_dev
      recog_set="train_dev dev"
      
      ->
      train_set=train_sp
      train_dev=dev
      recog_set="dev test"
      
      and remove the lines related to the train_dev and train_nodup sets.
    • add the data download script
      if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
          echo "stage -1: Data Download"
          local/download_and_untar.sh ${data} ${data_url} data_aishell
          local/download_and_untar.sh ${data} ${data_url} resource_aishell
      fi
      
    • modify the data preparation script
          local/hkust_data_prep.sh ${hkust1} ${hkust2}
          local/hkust_format_data.sh
      
      ->
          local/aishell_data_prep.sh ${data}/data_aishell/wav ${data}/data_aishell/transcript
      
    • remove the following narrow-band-related part (the sox upsampling), since the aishell audio is already wide band (16 kHz)
          # upsample audio from 8k to 16k to make a recipe consistent with others
          for x in train dev; do
              sed -i.bak -e "s/$/ sox -R -t wav - -t wav - rate 16000 dither | /" data/${x}/wav.scp
          done
      
    • remove the non-linguistic symbol handling, which is not needed for the aishell corpus, e.g., remove the following lines:
      nlsyms=data/lang_1char/non_lang_syms.txt
      
      echo "make a non-linguistic symbol list"
      cut -f 2- data/${train_set}/text | grep -o -P '\[.*?\]' | sort | uniq > ${nlsyms}
      cat ${nlsyms}
      
      and remove -l ${nlsyms} everywhere it appears.
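
    After these edits, a convenient way to debug the ported recipe is to grep for leftovers from the original recipe and then run it stage by stage. This sketch assumes that run.sh parses command-line options with utils/parse_options.sh, as ESPnet recipes typically do:

    $ grep -n "hkust\|nlsyms" run.sh      # any remaining hits point to parts that still need editing
    $ ./run.sh --stage -1 --stop_stage 0  # start with the data download and data preparation stages only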