
Autoregressive Co-training

The implementation of the paper:

Autoregressive Co-Training for Learning Discrete Speech Representations
Sung-Lin Yeh, Hao Tang

Dependencies

pip install -r requirements.txt

Models

The co-training model described in the paper is defined in cotraining.py. Different components of the model are modular and can be easily modified.
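The key idea behind the "Marginalization" variants listed below is to marginalize over the discrete codes when computing the predictive loss, rather than committing to a single code per frame. As a rough illustration only (not the exact model in `cotraining.py`), the marginal negative log-likelihood can be sketched as a log-sum-exp over codes; the function name and array shapes here are hypothetical:

```python
import numpy as np

def marginal_nll(code_logprior, per_code_loglik):
    """Hypothetical sketch of marginalizing over discrete codes.

    code_logprior:   (T, K) array, log q(z_t = k | x)
    per_code_loglik: (T, K) array, log p(x_t | z_t = k, history)
    """
    # Joint log-probability of each (frame, code) pair.
    joint = code_logprior + per_code_loglik
    # Stable log-sum-exp over the code axis gives log p(x_t | history),
    # with z_t marginalized out.
    m = joint.max(axis=-1, keepdims=True)
    marginal = m.squeeze(-1) + np.log(np.exp(joint - m).sum(axis=-1))
    # Average negative log-likelihood over frames.
    return -marginal.mean()
```

With a uniform prior over codes and equal per-code likelihoods, the marginal likelihood reduces to the shared per-code value, which is a quick sanity check for the log-sum-exp.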

Data

Data are processed into Kaldi I/O format, which uses scp files to map utterance ids to byte offsets in ark files. Functions for processing .scp and .ark files can be found under dataflow/. We provide a data sample in sample/ for users to run the pipeline. Users can simply plug in their own custom dataloader here.
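For readers unfamiliar with the format: each line of a Kaldi scp file pairs an utterance id with an ark path and a byte offset, separated by a colon. A minimal parser might look like the following (the file paths are illustrative, and this is a sketch rather than the reader used in `dataflow/`):

```python
def parse_scp(lines):
    """Parse Kaldi scp lines of the form '<utt_id> <ark_path>:<byte_offset>'.

    Returns a dict mapping utterance id to (ark_path, offset), so a
    dataloader can seek directly to each utterance's matrix in the ark file.
    """
    table = {}
    for line in lines:
        utt_id, target = line.strip().split(None, 1)
        # rsplit on the last ':' so Windows-style or colon-bearing paths survive.
        path, offset = target.rsplit(":", 1)
        table[utt_id] = (path, int(offset))
    return table
```

A custom dataloader would then open the ark file, seek to the stored offset, and read the feature matrix for that utterance.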

Train

python3 train.py --config config/cotraining.yaml

Pre-trained Models

| Hours | Num codes | Model | dev93 (PER) | eval92 (PER) | Link |
|-------|-----------|-------|-------------|--------------|------|
| 360 | 256 | 3-layer LSTM with Marginalization | 19.5 | 19.0 | link |
| 960 | 256 | 3-layer LSTM with Marginalization | 18.2 | 17.8 | link |

About

Code for Autoregressive Co-training for Learning Discrete Representations https://arxiv.org/abs/2203.15840
