# ASR_SemanticMask

This repo contains the code for "Semantic Mask for Transformer-based End-to-End Speech Recognition".

## Preparation

We provide a prebuilt, runnable Docker image. Run the following command to download and start it:

```shell
docker run -it --volume-driver=nfs --shm-size=64G j4ckl1u/espnet-py36-img:latest /bin/bash
```

Regarding data preparation, we suggest following the ESPnet instructions. Note that ESPnet does not apply speed perturbation by default, but we strongly recommend doing so, since it improves performance on the dev-other and test-other sets.
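In Kaldi/ESPnet recipes, speed perturbation is usually done with sox via `utils/perturb_data_dir_speed.sh`, typically with factors 0.9, 1.0, and 1.1. The sketch below is only an illustration of the effect on a waveform (a simple linear-interpolation resampler), not the recipe's actual implementation:

```python
# Illustrative 3-way speed perturbation (factors 0.9 / 1.0 / 1.1, the usual
# Kaldi/ESPnet choice). Real recipes shell out to sox; this pure-Python
# resampler just shows what changing playback speed does to the signal.

def speed_perturb(samples, factor):
    """Resample `samples` so playback speed changes by `factor`
    (factor > 1.0 -> faster playback -> fewer output samples)."""
    out_len = int(len(samples) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor                       # fractional position in input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out

wave = [float(i) for i in range(16000)]        # 1 s of dummy 16 kHz audio
augmented = {f: speed_perturb(wave, f) for f in (0.9, 1.0, 1.1)}
```

Each perturbed copy is added to the training set as a new utterance, roughly tripling the amount of training data.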

## Word Alignment

To enable semantic mask training, you need word-level alignments between the audio and its transcript. In our work, we use the alignment results released by this repo, which were obtained with the Montreal Forced Aligner. We put the extracted information in the data directory; start.txt and end.txt record the start and end frame of each word in each utterance.
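With those alignments, semantic masking replaces the frames of randomly selected whole words rather than random time spans. A minimal sketch of the idea follows; the fill value (the utterance mean) and the masking probability are illustrative parameters here, not values read from the repo's configs:

```python
import random

# Sketch of semantic masking: given per-word (start, end) frame alignments
# such as those stored in start.txt / end.txt, mask entire word spans in the
# feature matrix. The utterance-mean fill and p=0.15 are assumptions for
# illustration, not the repo's exact settings.

def semantic_mask(features, word_spans, p=0.15, rng=random):
    """features: list of frames (each a list of floats);
    word_spans: list of (start_frame, end_frame) pairs, end exclusive."""
    n_frames = len(features)
    dim = len(features[0])
    # utterance-level mean per feature dimension, used as the fill value
    mean = [sum(f[d] for f in features) / n_frames for d in range(dim)]
    masked = [list(f) for f in features]
    for start, end in word_spans:
        if rng.random() < p:                  # mask this word's span
            for t in range(start, min(end, n_frames)):
                masked[t] = list(mean)
    return masked

rng = random.Random(0)
feats = [[float(t), 2.0] for t in range(10)]
out = semantic_mask(feats, [(0, 3), (3, 7), (7, 10)], p=0.5, rng=rng)
```

Because whole words disappear, the model is pushed to predict the missing token from its linguistic context, rather than reconstructing it from partially visible acoustics.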

## Training and decoding

For training, we upload our training configs to the configs folder, covering the base and large settings. Our architecture is similar to ESPnet's, but replaces the positional embedding with a CNN in both the encoder and the decoder. The specific code changes can be found here
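The idea of a convolutional replacement for positional embeddings is that position information comes from local context rather than an added embedding table. A hedged, self-contained sketch of a 1-D "same"-padded convolution over a frame sequence follows; the per-dimension (depthwise) form and the kernel values are illustrative, not the repo's actual layer:

```python
# Illustrative 1-D depthwise convolution over an input sequence, standing in
# for the CNN that replaces absolute positional embeddings. Kernel size and
# weights here are assumptions for demonstration only.

def conv1d_positional(seq, kernel, pad_value=0.0):
    """seq: list of frames (each a list of floats); kernel: list of taps
    applied independently to each feature dimension, 'same' zero padding."""
    k = len(kernel)
    half = k // 2
    dim = len(seq[0])
    out = []
    for t in range(len(seq)):
        frame = []
        for d in range(dim):
            acc = 0.0
            for j, w in enumerate(kernel):
                src = t + j - half
                x = seq[src][d] if 0 <= src < len(seq) else pad_value
                acc += w * x
            frame.append(acc)
        out.append(frame)
    return out

feats = [[1.0], [2.0], [3.0], [4.0]]
smoothed = conv1d_positional(feats, [0.25, 0.5, 0.25])
```

In the actual model this would be a learned convolution applied before the Transformer blocks, so relative order is encoded by the receptive field instead of a fixed sinusoidal table.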

For decoding, please first download the ESPnet pre-trained RNN language model, and then run our decoding script to get the model output.
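The RNN language model is combined with the acoustic model by shallow fusion: at each beam-search step, the candidate score is the ASR log-probability plus a weighted LM log-probability. A toy sketch of that scoring rule (the weight and the example distributions are made up; the real script uses ESPnet's decoder and the downloaded RNN LM):

```python
import math

# Shallow fusion at decoding time: score(token) = log P_asr + lam * log P_lm.
# The weight `lam` and the toy distributions below are illustrative only.

def fuse_scores(asr_probs, lm_probs, lam=0.5):
    """Combine per-token probabilities from the ASR decoder and the LM."""
    return {tok: math.log(asr_probs[tok]) + lam * math.log(lm_probs[tok])
            for tok in asr_probs}

asr = {"cat": 0.6, "cap": 0.4}
lm = {"cat": 0.7, "cap": 0.1}     # the LM prefers the fluent continuation
fused = fuse_scores(asr, lm)
best = max(fused, key=fused.get)
```

Here the LM breaks the tie in favor of the linguistically likely token; tuning `lam` on the dev sets trades off acoustic against language-model evidence.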

## Pre-trained Models

We release a base model (12 encoder layers, 6 decoder layers) and a large model (24 encoder layers, 12 decoder layers). They achieve the following WER (%) with shallow language-model fusion:

| Model | dev-clean | dev-other | test-clean | test-other |
|-------|-----------|-----------|------------|------------|
| Base  | 2.07      | 5.06      | 2.31       | 5.21       |
| Large | 2.02      | 4.91      | 2.19       | 5.19       |
