The first industrial-scale open-source Kazakh speech corpus. The KSC2 corpus subsumes two previously introduced corpora, KSC and KazakhTTS2, and supplements them with additional data from other sources. KSC2 contains around 1.2k hours of high-quality transcribed data comprising over 600k utterances.

ISSAI_SAIDA_Kazakh_ASR

This repository provides the recipe for the paper A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline.

Setup and Requirements

Our code builds upon ESPnet and requires prior installation of the framework. Please follow the installation guide and put the ksc folder inside the espnet/egs/ directory.

After successfully installing ESPnet and Kaldi, go to the ISSAI_SAIDA_Kazakh_ASR/asr1 folder and create links to the dependencies:

ln -s ../../../tools/kaldi/egs/wsj/s5/steps steps
ln -s ../../../tools/kaldi/egs/wsj/s5/utils utils
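
Because the links above are relative, they only resolve when the folder sits at the expected depth inside the ESPnet tree. A minimal sketch of a sanity check follows; `check_links` is a hypothetical helper and not part of this repository:

```shell
# Sketch: confirm that the steps and utils links resolve to existing directories.
# check_links is a hypothetical helper, not shipped with the recipe.
check_links() {
    dir="${1:-.}"
    status=0
    for name in steps utils; do
        # -d follows symbolic links, so a dangling link is reported as broken.
        if [ -d "$dir/$name" ]; then
            echo "ok: $name"
        else
            echo "missing or broken: $name"
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_links .` from the asr1 folder should print `ok` for both names; a non-zero return status suggests the folder is not where the relative paths expect it to be.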

The directory for running the experiments (ISSAI_SAIDA_Kazakh_ASR/<exp-name>) can be created by running the following script:

./setup_experiment.sh <exp-name>

Downloading the dataset

Download the ISSAI_KSC_335RS dataset and untar it in a directory of your choice. Specify the path to the dataset in the ISSAI_SAIDA_Kazakh_ASR/<exp-name>/conf/data_path.conf file:

dataset_dir=/path-to/ISSAI_KSC_335RS_v1.1
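
A mistyped dataset path is easier to catch before training starts. The sketch below reads `dataset_dir` from a configuration file and checks that the directory exists. It assumes the file holds a plain, unquoted `dataset_dir=...` line as shown above; `check_dataset_dir` is a hypothetical helper, not part of this repository:

```shell
# Sketch: read dataset_dir from a data_path.conf-style file and check the directory.
# check_dataset_dir is a hypothetical helper; it assumes an unquoted key=value line.
check_dataset_dir() {
    conf="$1"
    # Use the value of the last dataset_dir= line in the file.
    dataset_dir=$(sed -n 's/^dataset_dir=//p' "$conf" | tail -n 1)
    if [ -z "$dataset_dir" ]; then
        echo "dataset_dir is not set in $conf" >&2
        return 1
    fi
    if [ ! -d "$dataset_dir" ]; then
        echo "dataset_dir does not exist: $dataset_dir" >&2
        return 1
    fi
    # Print the resolved value so callers can capture it.
    echo "$dataset_dir"
}
```

For example, `check_dataset_dir conf/data_path.conf` run from the experiment folder prints the configured path on success and returns a non-zero status otherwise.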

Training

To train the models, run the script ./run.sh inside the ISSAI_SAIDA_Kazakh_ASR/<exp-name>/ folder.

Pre-trained model

You can find the link to the latest pre-trained Transformer model here. Untar it in ksc/<exp-name>/.

Inference

To decode a single audio file, specify the paths to the following files inside the recog_wav.sh script:

lang_model= path to rnnlm.model.best
cmvn= path to cmvn.ark for example data/train/cmvn.ark
recog_model= path to e2e model, in case of transformer: model.last10.avg.best 

Then, run the following script:

./recog_wav.sh <path-to-audio-file>
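
Since recog_wav.sh takes one audio file per call, decoding several recordings means calling it repeatedly. The loop below is one possible sketch. It assumes the recordings are `.wav` files in a single directory; `decode_dir` and the `RECOG_SCRIPT` variable are hypothetical names, not part of this repository:

```shell
# Sketch: run recog_wav.sh once per .wav file in a directory.
# decode_dir and RECOG_SCRIPT are hypothetical, not part of the recipe.
decode_dir() {
    audio_dir="$1"
    # Default to the script in the current folder unless RECOG_SCRIPT overrides it.
    recog="${RECOG_SCRIPT:-./recog_wav.sh}"
    for wav in "$audio_dir"/*.wav; do
        # Skip the unexpanded pattern when the directory has no .wav files.
        [ -e "$wav" ] || continue
        echo "decoding: $wav"
        # Report a failure for this file and continue with the next one.
        "$recog" "$wav" || echo "failed: $wav" >&2
    done
}
```

Running `decode_dir /path-to/audio` from the experiment folder decodes the files one after another and continues past any file that fails.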
