ESPnet extensions for semi-supervised end-to-end speech recognition

This repository contains the evaluation scripts used in our paper:

@inproceedings{Karita2018,
  author={Shigeki Karita and Shinji Watanabe and Tomoharu Iwata and Atsunori Ogawa and Marc Delcroix},
  title={Semi-Supervised End-to-End Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2--6},
  doi={10.21437/Interspeech.2018-1746},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1746}
}

The full PDF is available at https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1746.html. See also https://github.com/ShigekiKarita/espnet-semi-supervised/tree/karita-asrtts for newer code from the ICASSP 2019 paper "Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders".

how to set up

$ git clone https://github.com/nttcslab-sp/espnet-semi-supervised --recursive
$ cd espnet-semi-supervised/espnet/tools; make PYTHON_VERSION=3 -f conda.mk
$ cd ../..
$ ./run.sh --gpu 0 --wsj0 <your-wsj0-path> --wsj1 <your-wsj1-path>

NOTE: you need to install PyTorch 0.3.1.
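
One way to pin PyTorch 0.3.1 inside the environment built above (a minimal sketch; the activation path is an assumption based on ESPnet's usual tools layout, so adjust it to wherever conda.mk actually created the environment):

$ source espnet/tools/venv/bin/activate                # assumed env path; adjust to your setup
$ pip install torch==0.3.1                             # or: conda install pytorch=0.3.1 -c pytorch
$ python -c "import torch; print(torch.__version__)"   # should print 0.3.1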

scripts

in root dir

  • run.sh : the end-to-end recipe for this experiment (do not forget to set --gpu 0 if you have a GPU)
  • sbatch.sh : slurm job script for sweeping several paired/unpaired data ratios and hyperparameters (requires a finished run_retrain_wsj.sh expdir for the pretrained model parameters; see the submission sketch after this list)
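
A minimal submission sketch for that sweep (assumptions: your cluster accepts plain sbatch without extra partition/account options, and the --gres flag is only needed if sbatch.sh does not already request a GPU itself):

$ sbatch --gres=gpu:1 sbatch.sh   # submit the pair/unpair ratio and hyperparameter sweep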

in shell/ dir

  • show_results.sh : summarizes CER/WER/SER from the decoded results on the dev93/eval92 sets (usage: `show_results.sh exp/train_si84_xxx`)
  • decode.sh : decodes and evaluates a trained model (usage: `decode.sh --expdir exp/train_si84_xxx`); see the combined usage sketch after this list
  • debug.sh : we recommend sourcing debug.sh before using ipython so that the paths to everything you need are set
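
A combined usage sketch for the scripts above (assuming you run from the repository root and substitute your real experiment directory for exp/train_si84_xxx):

$ source shell/debug.sh                          # set up the paths (also recommended before ipython)
$ shell/decode.sh --expdir exp/train_si84_xxx    # decode and evaluate the trained model
$ shell/show_results.sh exp/train_si84_xxx       # summarize CER/WER/SER on dev93/eval92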

in python/ dir

  • asr_train_loop_th.py : is a python script for initial-training with the paired dataset (train_si84)
  • retrain_loop_th.py : is a python script for re-training with the unpaired dataset (train_si284)
  • unsupervised_recog_th.py : is a python script for decoding by the re-trained model
  • unsupervised.py : implements pytorch model for paired/unpaired learning
  • results.py : implements chainer like reporter without chainer iterator used in training loop

results

| train set | dev93 Acc | dev93 CER | eval92 CER | dev93 WER | eval92 WER | dev93 SER | eval92 SER | path |
|---|---|---|---|---|---|---|---|---|
| train_si84 (7138, 15 hours) | 77.6 | 25.4 | 15.8 | 61.9 | 44.2 | 99.8 | 98.5 | exp/train_si84_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150 |
| + train_si284 RNNLM | | 19.3 | 16.6 | 51.3 | 47.7 | 99.8 | 99.7 | exp/rnnlm_train_si84_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_epochs15 |
| + unpaired train_si284 retrain | 83.8 | 28.2 | 15.6 | 61.2 | 40.5 | 99.6 | 97.6 | ./exp/train_si84_retrain_None_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9 |
| + RNNLM | | 22.1 | 17.2 | 51.6 | 44.2 | 99.0 | 99.4 | ./exp/train_si84_retrain_None_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9/rnnlm0.1 |
| + unpaired train_si284 retrain w/ GAN-si84 | 83.5 | 26.3 | 15.0 | 59.9 | 40.0 | 99.4 | 97.3 | exp/train_si84_paired_hidden_gan_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.5_train_si84_epochs15 |
| + unpaired train_si284 retrain w/ KL-si84 | 83.6 | 28.5 | 15.6 | 60.5 | 40.4 | 99.6 | 97.3 | exp/train_si84_paired_hidden_gausslogdet_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.9_train_si84_epochs15 |
| + unpaired train_si284 retrain w/ GAN | 84.2 | 22.1 | 17.9 | 50.9 | 44.2 | 99.2 | 99.4 | ./exp/train_si84_retrain84_gan_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.9_train_si84_iter5 |
| + RNNLM | | 22.1 | 17.9 | 50.9 | 44.2 | 99.2 | 99.4 | ./exp/train_si84_retrain84_gan_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.9_train_si84_iter5/rnnlm0.2 |
| + unpaired train_si284 retrain w/ KL | 84.0 | 24.8 | 14.4 | 58.1 | 39.5 | 99.6 | 96.4 | ./exp/train_si84_ret3_gausslogdet_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.5_train_si84_epochs30 |
| + RNNLM | | 20.0 | 16.9 | 48.9 | 42.7 | 99.0 | 99.1 | ./exp/train_si84_retrain84_gausslogdet_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.99_st0.99_train_si84/rnnlm0.2 |
| + unpaired train_si284 retrain w/ MMD | 82.9 | 25.9 | 13.9 | 59.7 | 38.4 | 99.2 | 96.7 | ./exp/train_si84_ret3_mmd_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.5_st0.99_train_si84_epochs30 |
| train_si284 (37416 utt, 81 hours) | 93.9 | 8.1 | 6.3 | 23.8 | 18.9 | 92.4 | 87.4 | exp/train_si284_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150 |
| + train_si284 RNNLM | | 7.9 | 6.1 | 22.7 | 18.3 | 89.7 | 84.1 | ./exp/rnnlm_train_si284_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_epochs15 |
  • Acc: character accuracy during training with forced decoding
  • CER: character error rate (edit distance based error)
  • WER: word error rate (edit distance based error)
  • SER: sentence error rate (exact match error)
  • all experiment paths starting with exp/... are located under /nfs/kswork/kishin/karita/experiments/espnet-unspervised/egs/wsj/unsupervised on the NTT ks-servers

smaller paired train data results

See plot.png for these results.

contact

email: karita.shigeki@lab.ntt.co.jp
