YUCHEN005/DPSL-ASR

DPSL-ASR (Dual-Path Style Learning)

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition

Introduction

DPSL-ASR is a novel method for end-to-end noise-robust speech recognition. It extends our prior work, IFF-Net (Interactive Feature Fusion Network), with dual-path inputs and style learning, and achieves better ASR performance on the Robust Automatic Transcription of Speech (RATS) and CHiME-4 datasets.

Left figure: (a) joint SE-ASR approach, (b) IFF-Net baseline, (c) our proposed DPSL-ASR approach.

Right figure: back-end ASR module with style learning and consistency loss in our DPSL-ASR. The dashed arrows denote shared parameters.

If you find DPSL-ASR or IFF-Net useful in your research, please cite them using the following BibTeX entries:

@inproceedings{hu2023dual,
  title={Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition}, 
  author={Hu, Yuchen and Hou, Nana and Chen, Chen and Chng, Eng Siong},
  booktitle={INTERSPEECH},
  year={2023}
}

@inproceedings{hu2022interactive,
  title={Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition},
  author={Hu, Yuchen and Hou, Nana and Chen, Chen and Chng, Eng Siong},
  booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6292--6296},
  year={2022},
  organization={IEEE}
}

Usage

Our implementation is based on ESPnet (v.0.9.6). Please use the following commands for installation:

git clone https://github.com/YUCHEN005/DPSL-ASR.git
cd DPSL-ASR
pip install -e .

The experiment directory is egs2/rats_chA/asr_with_enhancement/, and the network code is in espnet2/asr/dpsl_asr.py.
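After cloning, it may help to confirm that the checkout contains the two locations mentioned above. The following is a minimal sketch; the function name `check_dpsl_layout` is a hypothetical helper for illustration and is not part of the repository.

```shell
# Hypothetical helper (not shipped with DPSL-ASR): checks that the two
# paths named in this README exist under a given checkout root.
check_dpsl_layout() {
    root="${1:-.}"   # checkout root; defaults to the current directory
    status=0

    # Experiment directory named in this README
    if [ -d "$root/egs2/rats_chA/asr_with_enhancement" ]; then
        echo "found experiment directory"
    else
        echo "missing experiment directory"
        status=1
    fi

    # Network code file named in this README
    if [ -f "$root/espnet2/asr/dpsl_asr.py" ]; then
        echo "found network code"
    else
        echo "missing network code"
        status=1
    fi

    # Returns 0 only if both paths were found
    return "$status"
}
```

Calling `check_dpsl_layout .` from the repository root prints one line per path and returns a non-zero status if either is missing.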
