ESPnet version 202209
What's Changed
- Add dynamic mixing in the speech separation task. by @LiChenda in #4387
- Added test script and usage for calculate_rtf.py script to ESPnet2 tutorial page by @espnetUser in #4560
- Offline/Online (standalone) ESPnet2 Transducer by @b-flo in #4479
- Unfix matplotlib version by @kamo-naoyuki in #4576
- Use torch.finfo for dtypes other than float by @wenzhe-nrv in #4584
- Update recipe for slurp-entity by @ftshijt in #4585
- Egs2 aesrc by @brianyan918 in #4592
- Update checks for bias in initialization by @LiChenda in #4574
- [WIP] Update to fit the recent update in s3prl. by @simpleoier in #4593
- Unfix numpy version by @kamo-naoyuki in #4598
- Update to fit the recent update in s3prl. by @simpleoier in #4600
- Add improved results on FLEURS dataset by @wanchichen in #4596
- Update mp4_to_wav.sh by @jaehyun-ko in #4605
- Pass output_dir as str to wandb.init() by @jonghwanhyeon in #4607
- Support enh_s2t joint training on multi-speaker data by @Emrys365 in #4566
- Add ASR results for commonvoice zh_TW by @slSeanWU in #4612
- Fix both utt2sid and utt2lid when removing long/short data by @jonghwanhyeon in #4609
- Recipe config update by @ftshijt in #4621
- Add pytorch=1.12.1 to CI configurations by @kamo-naoyuki in #4604
- New SLU task by @siddhu001 in #4569
- JOSS paper: Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing by @neillu23 in #4620
- Update conformer result of AMI corpus by @teinhonglo in #4629
- Offline/Online Branchformer Transducer by @b-flo in #4582
- Change to install numba using pip instead of conda by @kamo-naoyuki in #4637
- Add MixIT support (unsupervised only; a semi-supervised config is not yet available) by @simpleoier in #4619
- Add 2-pass SLU code for FSC Challenge by @siddhu001 in #4636
- CI fix and some other minor recipe fixes by @ftshijt in #4656
- Update the title of plots to be y-label vs x-label by @pyf98 in #4647
- Update VIVOS download link by @hieuthi in #4644
- Add ASR recipe of MAGICDATA mandarin read speech by @tjysdsg in #4635
- Amend to CI fix by @ftshijt in #4663
- QASR update by @massabaali7 in #4642
- Open_li110 for large-scale multilingual speech by @ftshijt in #4408
- Fix the path of calculate_rtf.py by @sw005320 in #4660
- Fix importlib-metadata version by @kan-bayashi in #4686
- CMU ARCTIC TTS pretrain finetune by @soumimaiti in #4456
- Update version to 202209 by @kan-bayashi in #4685
New Contributors
- @wenzhe-nrv made their first contribution in #4584
- @jaehyun-ko made their first contribution in #4605
- @jonghwanhyeon made their first contribution in #4607
- @slSeanWU made their first contribution in #4612
- @massabaali7 made their first contribution in #4642
- @soumimaiti made their first contribution in #4456
Full Changelog: v.202207...v.202209