Releases: espnet/espnet
ESPnet Version 0.4.2
Bugfix
- [Bugfix] Fix pytorch LM GPU training without cupy #981
- [Bugfix] make tensorboard logging done every 100 iters #966
- [Bugfix] Fix ER calculator #955
- [Bugfix] Fix a typo bug in computing guided attention loss #956
- [Bugfix] run.sh should exit if sourcing path.sh returns an error #954
Recipe
- [Recipe] Update Librispeech recipe #970
- [Recipe] New RNN and Transformer results for the AMI recipe (ihm) #978
- [Recipe] BPE support for SwitchBoard & Transformer config #909
- [Recipe] Update li10 #965
- [Recipe] Update libri trans #949
Enhancement
- [Enhancement] transform: expose pad_mode for logmelspectrogram #957
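The `pad_mode` option exposed in #957 controls how a signal is padded before framing when it is shorter than the analysis window. A minimal numpy sketch of what such an option selects between (the helper below is illustrative, not ESPnet's actual transform API):

```python
import numpy as np

def pad_signal(x, target_len, pad_mode="constant"):
    """Pad a 1-D signal up to target_len using numpy's pad modes.

    pad_mode="constant" appends zeros; pad_mode="reflect" mirrors the
    signal at the edge, avoiding the hard discontinuity that zero
    padding introduces before STFT framing.
    """
    n_pad = max(0, target_len - len(x))
    return np.pad(x, (0, n_pad), mode=pad_mode)

x = np.array([1.0, 2.0, 3.0])
zeros = pad_signal(x, 5, pad_mode="constant")   # [1, 2, 3, 0, 0]
mirror = pad_signal(x, 5, pad_mode="reflect")   # [1, 2, 3, 2, 1]
```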
Acknowledgements
Special thanks to @Fhrozen, @geekboood, @hirofumi0810, @Jzmo, @naxingyu, @r9y9, @ShigekiKarita.
ESPnet Version 0.4.1
Bugfix
- [Bugfix] Fix a bug in calculate_all_attentions #862
- [Bugfix] Fix bugs in frontend #875
- [Bugfix] Fix grad noise v2 #912
- [Bugfix] Fix plot fail #913
- [Bugfix] Fix tgz typo #892
- [Bugfix] Fix: Output dimension of Conv2dSubsampling #822 #921
- [Bugfix] Fix: espnet/transform/transformation.py #866
- [Bugfix] Fixed certain typos #893
- [Bugfix] Modified if conditions #908
- [Bugfix] fix bugs in grad noise #886
- [Bugfix] CER/WER & CER_CTC in Transformer pytorch #936
- [Bugfix] Update iwslt18 recipe #808
Documentation
- [Documentation] Add model link #899
- [Documentation] Document espnet tools and modules #884
- [Documentation] Fix typo #930
- [Documentation] Reformat docstrings in espnet/asr #914
- [Documentation] Update CONTRIBUTING.md #880
- [Documentation] add recipe related documentations to CONTRIBUTING.md #872
- [Documentation] skip ci when gh-pages is deployed #901
- [Documentation] use only conda to build doc #895
Enhancement
- [Enhancement] Script for docker builds from the local repo #877
- [Enhancement] Demo script for TTS #871
- [Enhancement] Fix plot attention for chainer transformer #940
- [Enhancement] Implement Fast Speech #848
- [Enhancement] Move the dependency links to github from Makefile to setup.py #858
- [Enhancement] Support new version in Docker containers #836
- [Enhancement] gradient noise injection from std normal dis #881
- [Enhancement] [Discussion] Create show_result.sh #874
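#881 (and the related fixes #886 and #912) concern gradient noise injection: zero-mean Gaussian noise with an annealed variance is added to each gradient before the parameter update. A rough numpy sketch of the standard scheme, with the decay schedule of Neelakantan et al.; the constants and function name here are illustrative:

```python
import numpy as np

def add_grad_noise(grads, step, eta=0.01, tau=0.55, rng=None):
    """Add N(0, sigma^2) noise to each gradient array.

    The variance decays with the step count, sigma^2 = eta / (1 + step)^tau,
    so early updates are noisier than later ones.
    """
    rng = rng or np.random.default_rng(0)
    sigma = np.sqrt(eta / (1.0 + step) ** tau)
    return [g + rng.normal(0.0, sigma, size=g.shape) for g in grads]

grads = [np.zeros(3), np.zeros((2, 2))]
noisy = add_grad_noise(grads, step=0)
```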
Recipe
- [Recipe] Add Jsut asr recipe #793
- [Recipe] AURORA4 RESULTS.md file #835
- [Recipe] Add Librispeech French corpus #882
- [Recipe] Add transformer config in m_ailabs/tts1 recipe #924
- [Recipe] Change librispeech_french to libri_trans #903
- [Recipe] Fix: utils/show_result.sh #915
- [Recipe] Minor update for speech translation recipe #907
- [Recipe] Transformer for CHiME4 Single Channel #837
- [Recipe] Update LJSpeech RESULTS.md #861
- [Recipe] Update LJSpeech RESULTS.md #887
- [Recipe] Update Librispeech recipe #885
- [Recipe] Update fisher callhome spanish for speech translation #868
- [Recipe] libri_trans NMT recipe #931
Refactoring
- [Refactoring] Refactor TTS Transformer #865
- [Refactoring] test: avoid using grep and sed in subprocess and use python stdlib instead #854
- [Refactoring] Update TTS module’s docstrings and refactor some modules #898
Acknowledgements
Special thanks to @27jiangziyan, @Fhrozen, @Masao-Someki, @ShigekiKarita, @SuperGops7, @creatorscan, @hirofumi0810, @kamo-naoyuki, @lumaku, @naxingyu, @r9y9, @simpleoier, @takenori-y.
ESPnet Version 0.4.0
New features and improvements
- E2E multi-channel system #596
- Changed to use pip-install for pytorch_wpe #843
- Transformer
- SpecAugment #734 #745 #754
- Streaming attention encoder-decoder E2E-ASR #757
- Offline recognition demo #809
- New batch making strategies #759
- Guided Attention Loss #816
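SpecAugment (#734 #745 #754) augments training by masking random frequency bands and time spans of the log-mel features. A minimal numpy sketch of the two masking operations (time warping omitted; parameter names are illustrative, not ESPnet's config keys):

```python
import numpy as np

def spec_mask(feat, num_freq_mask=1, freq_width=3,
              num_time_mask=1, time_width=5, rng=None):
    """Zero out random frequency bands and time spans of a (T, F) feature."""
    rng = rng or np.random.default_rng(0)
    feat = feat.copy()
    T, F = feat.shape
    for _ in range(num_freq_mask):
        w = rng.integers(0, freq_width + 1)      # band width in mel bins
        f0 = rng.integers(0, max(1, F - w))      # band start
        feat[:, f0:f0 + w] = 0.0
    for _ in range(num_time_mask):
        w = rng.integers(0, time_width + 1)      # span width in frames
        t0 = rng.integers(0, max(1, T - w))      # span start
        feat[t0:t0 + w, :] = 0.0
    return feat

masked = spec_mask(np.ones((20, 8)))
```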
Important changes
- drop python2 support
- use `utils/fix_data_dir.sh` as default #660
- CPU-only installation #677 #687 #704
- fix to use python2 as default in travis #685
- add CUDA_VERSION in Makefile #687
- use Pytorch 1.0.1 as default #721
- use `yaml` format configuration file #722
- modularize TTS components #746 #815
- use Chainer/Cupy 6.0.0 as default #753
- reinforce CI #763
- Google drive downloader #798
- New scripts to pack model and get system info #790 #802
- change the scoring in multi-speaker case from shell to python #805
- update patience in TTS recipes #817
- `n_average` option in TTS #823
- update TTS recipes to use config files #780
- make `ngpu=1` the default for all recipes #800
- deprecate the `egs/librispeech/tts1` recipe #806
- maintain the pytorch warp-ctc under espnet #838
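The `n_average` option (#823) averages the parameters of the last few training snapshots before synthesis, which usually smooths out per-epoch variance. A toy sketch of the averaging itself, representing checkpoints as dicts of numpy arrays (ESPnet of course operates on real saved snapshots):

```python
import numpy as np

def average_checkpoints(snapshots):
    """Element-wise average of parameter dicts sharing the same keys."""
    n = len(snapshots)
    avg = {}
    for key in snapshots[0]:
        avg[key] = sum(s[key] for s in snapshots) / n
    return avg

snaps = [{"w": np.array([0.0, 2.0])}, {"w": np.array([2.0, 4.0])}]
avg = average_checkpoints(snaps)   # {"w": array([1., 3.])}
```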
New recipes
- AURORA4 #722 #770 #824
- JNAS #725
- LibriTTS #795
- Tedlium release3 #739
  - added the model link and missing files #831
- TIMIT #698
- Russian Open STT #768
Recipe updates
- Aishell
- CSJ
- HKUST
  - support Transformer #840
- IWSLT18
  - add missing files for iwslt18 recipe #767
- Librispeech
  - support Transformer #781
- LJSpeech
- Tedlium release2
- Voxforge
- WSJ
Documentation
- add citation bibtex entry for ESPnet #676
- add NAACL paper replication link for CMU Wilderness Multilingual Speech Dataset #717 #731
- update library information #789
- Add table of contents #812
- add GPU decoding documentation #813
- minibatch explanation #821
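The minibatch document (#821) explains the length-sorted batching the training scripts use, and #759 adds further strategies. The basic idea: sort utterances by length and cut fixed-size batches, so each batch holds similar-length inputs and padding is minimized. A rough sketch of that idea (names are illustrative, not ESPnet's batchfy API):

```python
def batchfy_by_size(utt_lengths, batch_size):
    """Sort utterance ids by length (longest first) and cut equal-size batches."""
    order = sorted(utt_lengths, key=utt_lengths.get, reverse=True)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

lengths = {"u1": 120, "u2": 300, "u3": 280, "u4": 90}
batches = batchfy_by_size(lengths, batch_size=2)   # [['u2', 'u3'], ['u1', 'u4']]
```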
Bugfix
- fix recognize_batch for 2d, location_recurrent, and multi-head attentions for #665, and add a test #681
- fix CER/WER calculation during training #678
- add version check for matplotlib installation #679
- make sure `hlens` is a tensor in recognize_batch #680
- fix choice between pytorch and pytorch-cpu #702
- fix `merge_json` behavior (#699) when no labels for #708
- fix `check_install.py` #728
- use `ensure_ascii=False` to make json human-readable #730
- Fix argument name for SummaryWriter #747
- use scikit-learn 0.20 #749
- fix pytorch for chainer v6.0.0 #772
- fix model compatibility #799
- fix minor typos in the recipes #801
- bug fix: `egs/chime4/asr1_multich/conf/train.yaml` #826
- bug fix: `espnet/utils/training/batchfy.py` #833
- fix to use sentencepiece v0.1.82 #839
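One small fix worth illustrating: #730 writes data JSON with `ensure_ascii=False`, so non-ASCII transcriptions stay readable instead of being escaped:

```python
import json

entry = {"text": "日本語"}
# default: non-ASCII characters are escaped to \uXXXX sequences
escaped = json.dumps(entry)
# with ensure_ascii=False the transcription stays human-readable
readable = json.dumps(entry, ensure_ascii=False)   # '{"text": "日本語"}'
```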
Acknowledgements
Special thanks to @27jiangziyan, @akreal, @bobchennan, @creatorscan, @danoneata, @Fhrozen, @gtache, @hirofumi0810, @jan-schuchardt, @jnishi, @kamo-naoyuki, @Masao-Someki, @oadams, @simpleoier, @sknadig, @ShigekiKarita, @takenori-y
ESPnet Version 0.3.1 (stable)
New improvements
- Add instant speech recognition #581
- Add CTC greedy decoding CER monitor #587
- Add Streaming encoder #638
- Add Uni-directional encoder #624 #629
- Add model compatibility test #615 #649
- Update fisher_callhome_spanish recipe #625
- Improve swbd scoring #614 #620
- Improve memory usage in json merge script #579
- Improve background job failure check in decoding state #627 #643 #648
- Separate installation of basic tools and extra tools #628
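The CER monitor added in #587 relies on CTC greedy decoding: take the argmax label per frame, collapse repeats, then drop blanks. A minimal sketch of that decoding rule (treating label 0 as the blank is an assumption here):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse per-frame argmax labels into an output sequence.

    Repeated labels are merged first, then blank symbols are removed,
    following the standard CTC best-path decoding rule.
    """
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# frames: a a - b b - - c  ->  a b c  (with 0 as blank)
hyp = ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 3])   # [1, 2, 3]
```

Note that a blank between two identical labels keeps them distinct: `[1, 0, 1]` decodes to `[1, 1]`.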
Bugfix
- Fix CTC type selection #617 #618
- Fix MultiProcessIterator #613
- Fix chainer sortgrad bug
- Fix installer #594 #595 #604 #609 #622
- Fix WSJ-mix recipe #610 #630 #641
- Fix remove_longshortdata.sh #646
Special thanks to @kamo-naoyuki, @gtache, @simpleoier, @takenori-y, @Fhrozen, @JaejinCho, @pzelasko, @zh794390558, @kan-bayashi, @sw005320.
ESPnet v.0.3.0 beta
New features and improvements
- Support Pytorch 1.0 #553
- Support the use of Tensorboard #506
- Support early stopping #508
- Support `stop_stage` option #539
- Support sortgrad #550
- Add GRU architecture #496
- Add GPU batch decoding #318
- Support HDF5 format instead of kaldi ark #412 #493
- Add speech separation recipe #531
- Add TTS recipes (German, Spanish, Italian, Japanese, ...) #562 #569 #519
- Add ASR recipes #574 #519
- Improve ASR recipes #491 #521 #546 #435 #467 #469
- Improve speech translation recipes #468
- Improve Python2/3 compatibility #567
- Improve cmd.sh usage #538 #547
- Add test scripts for shell scripts #484 #498
- Change to use conda with Python3.7 as default #567
- Python code modularization #440 #484
We really appreciate a lot of contributions: @gtache, @kamo-naoyuki, @hirofumi0810, @ShigekiKarita, @takenori-y, @simpleoier, @Fhrozen, @sas91, @mn5k, @JaejinCho, @Xiaofei-Wang, @jnishi, @Magic-Bubble.
ESPnet v.0.2.0 (Major update)
New features and improvements
- add data prefetch #340
- add new recipes
- add test codes
- add check script for python library installation #373 #389
- improve some ASR baseline recipes by using a shallow and wide BLSTM encoder and subwords
Important changes
- fix to use PyTorch 0.4.1 (drop support for PyTorch 0.3.x) #332
- rename some functions:
  - `e2e_asr_attctc.py` -> `e2e_asr.py`
  - `e2e_asr_attctc_th.py` -> `e2e_asr_th.py`
- change the format of model.conf from pickle to JSON #342
- remove deprecated options #336
- unify the data converter with TTS one #343
- unify model variable arguments between TTS and ASR #337
- fix pytorch backend snapshot functions including the save of optimizers #362
- avoid using `feat-to-len`: use `write_utt2num_frames=true` and read `utt2num_frames` instead of executing `feat-to-len` #339
- refactor `asr_pytorch.py` and `asr_chainer.py`
- refactor the recog part in asr_chainer.py and asr_pytorch, especially after it gets the nbest list #370
- make `nets/e2e_common.py` and move some common functions there
Bug fix
ESPnet v.0.1.5 (minor update)
- update the Librispeech ASR recipe and use subword modeling as default.
- attached Librispeech ASR model (librispeech_asr1.tgz):
- RNNLM: `exp/train_rnnlm_2layer_bs256_unigram2000/rnnlm.model.best`
- ASR models: `exp/train_960_vggblstm_e4_subsample1_2_2_1_1_unit1024_proj1024_d1_unit1024_location1024_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_unigram2000/results/{model.acc.best,model.conf}`
- performance (with RNNLM):

| | WER (%) |
|---|---|
| Librispeech dev_clean | 5.0 |
| Librispeech test_clean | 5.0 |
- when using the above models, please set the ASR model directory (`expdir`) and RNNLM model directory (`lmexpdir`) in `run.sh` as follows:
```shell
expdir=exp/train_960_vggblstm_e4_subsample1_2_2_1_1_unit1024_proj1024_d1_unit1024_location1024_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_unigram2000
lmexpdir=exp/train_rnnlm_2layer_bs256_unigram2000

${decode_cmd} JOB=1:${nj} ${expdir}/${decode_dir}/log/decode.JOB.log \
    asr_recog.py \
    --ngpu ${ngpu} \
    --backend ${backend} \
    --recog-json ${feat_recog_dir}/split${nj}utt/data_${bpemode}${nbpe}.JOB.json \
    --result-label ${expdir}/${decode_dir}/data.JOB.json \
    --model ${expdir}/results/model.${recog_model} \
    --model-conf ${expdir}/results/model.conf \
    --beam-size ${beam_size} \
    --penalty ${penalty} \
    --maxlenratio ${maxlenratio} \
    --minlenratio ${minlenratio} \
    --ctc-weight ${ctc_weight} \
    --rnnlm ${lmexpdir}/rnnlm.model.best \
    --lm-weight ${lm_weight} \
```
ESPnet v.0.1.4
- Added TTS recipe based on Tacotron2 (`egs/ljspeech/tts1`)
- Extended the above TTS recipe to multi-speaker TTS (`egs/librispeech/tts1/`)
- Supported PyTorch 0.4.0
- Added word level decoding
- (Finally) fixed CNN (VGG) layer issues in PyTorch
- Fixed warp CTC scaling issues in PyTorch
- Added subword modeling based on sentence piece toolkit
- Many bug fixes
- Updated CSJ performance
Stable version for the JSALT18 summer school
- bug fix
- improve the jsalt18e2e recipe
- improve the JSON format
- simplify Makefile
Change JSON format and use feature compression
- change the JSON format to deal with multiple inputs and outputs
- use feature compression to reduce the data I/O
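To handle multiple inputs and outputs, each utterance entry holds a list under its input and output fields rather than a single feature and a single target. A hypothetical example of such a structure (field names and values here are illustrative, not the exact ESPnet data JSON schema):

```python
import json

data = {
    "utts": {
        "utt1": {
            # several input streams, e.g. two microphone channels
            "input": [
                {"name": "input1", "feat": "feats1.ark:12", "shape": [420, 83]},
                {"name": "input2", "feat": "feats2.ark:12", "shape": [420, 83]},
            ],
            # several targets, e.g. character and subword label sequences
            "output": [
                {"name": "target1", "shape": [25, 52]},
                {"name": "target2", "shape": [12, 2000]},
            ],
        }
    }
}
text = json.dumps(data, ensure_ascii=False, indent=2)
```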