Automatic Speech Recognition (ASR) - DeepSpeech Swiss German

This repository accompanies the paper "LTL-UDE at Low-Resource Speech-to-Text Shared Task: Investigating Mozilla DeepSpeech in a low-resource setting", published at the 5th SwissText & KONVENS 2020.

This project aims to develop a working speech-to-text module using Mozilla DeepSpeech that can be plugged into any audio processing pipeline. Mozilla DeepSpeech is a state-of-the-art open-source automatic speech recognition (ASR) toolkit. It uses a model trained with deep learning techniques, based on Baidu's Deep Speech research paper, and relies on Google's TensorFlow to make the implementation easier.

Important Links:

Paper: https://www.researchgate.net/publication/342338332_LTL-UDE_at_Low-Resource_Speech-to-Text_Shared_Task_Investigating_Mozilla_DeepSpeech_in_a_low-resource_setting

DeepSpeech-API: https://github.com/AASHISHAG/DeepSpeech-API

This README is written for DeepSpeech v0.6.0. Refer to Mozilla DeepSpeech for the latest updates.

Download and Prepare the Audio Data

1. Mozilla_EN

$ mkdir mozilla_en
$ cd mozilla_en
$ wget https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-4-2019-12-10/en.tar.gz
$ tar -xzvf en.tar.gz
$ python3 DeepSpeech/bin/import_cv2.py --audio_dir path --filter_alphabet deepspeech-swiss-german/data/en_alphabet.txt export_path <change the path accordingly>

2. LibriSpeech_EN

$ mkdir librispeech
$ cd librispeech
$ python3 DeepSpeech/bin/import_librivox.py export_path <change the path accordingly>

3. Mozilla_DE

$ mkdir mozilla_de
$ cd mozilla_de
$ wget https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-4-2019-12-10/de.tar.gz
$ tar -xzvf de.tar.gz
$ python3 DeepSpeech/bin/import_cv2.py --audio_dir path --filter_alphabet deepspeech-swiss-german/data/alphabet.txt export_path <change the path accordingly>

4. Mailabs_DE

$ mkdir mailabs
$ cd mailabs
$ python3 DeepSpeech/bin/import_m-ailabs.py --language de_DE --filter_alphabet deepspeech-swiss-german/data/alphabet.txt export_path <change the path accordingly>

5. Tuda_DE

$ mkdir tuda
$ cd tuda
$ wget http://www.repository.voxforge1.org/downloads/de/german-speechdata-package-v2.tar.gz
$ tar -xzvf german-speechdata-package-v2.tar.gz
$ deepspeech-swiss-german/pre-processing/prepare_data.py --tuda corpus_path export_path

6. Voxforge_DE

$ mkdir voxforge
$ cd voxforge

$ python3
>>> from audiomate.corpus import io
>>> dl = io.VoxforgeDownloader(lang='de')
>>> dl.download(voxforge_corpus_path)

$ deepspeech-swiss-german/pre-processing/run_to_utf_8.sh
$ python3 deepspeech-swiss-german/prepare_data.py --voxforge corpus_path export_path <change the path accordingly>

NOTE: Adjust the paths inside run_to_utf_8.sh before running it.

7. SwissText_DE

$ mkdir swisstext
$ cd swisstext
# Download train.zip from https://drive.switch.ch/index.php/s/PpUArRmN5Ba5C8J (open the link in a browser)
$ unzip train.zip
$ python3 deepspeech-swiss-german/prepare_data_swiss_german.py
$ python3 deepspeech-swiss-german/shuffle_and_split.py

8. ArchiMob_DE

Follow the steps in the importer repository:

https://github.com/AASHISHAG/archimob-swissgerman-deepspeech-importer
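Each importer above writes CSV files for DeepSpeech; in v0.6.0 these carry the columns wav_filename, wav_filesize and transcript. The following is a minimal sketch, not part of this repository, of how several such CSVs could be merged into one training file while dropping rows whose transcript is empty or contains characters outside the alphabet. The function names and all file paths are hypothetical.

```python
import csv

# Column layout of DeepSpeech v0.6.0 training CSVs.
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]


def load_alphabet(path):
    """Read an alphabet file: one character per line, '#' lines are comments."""
    chars = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("\\#"):
                chars.add("#")          # escaped literal '#'
            elif line.startswith("#") or line == "":
                continue                # comment or empty line
            else:
                chars.add(line)         # includes the single-space line
    return chars


def merge_csvs(csv_paths, alphabet, out_path):
    """Concatenate importer CSVs, keeping only rows the alphabet can encode.

    Returns the number of rows written.
    """
    kept = 0
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
        writer.writeheader()
        for path in csv_paths:
            with open(path, encoding="utf-8", newline="") as f:
                for row in csv.DictReader(f):
                    text = row["transcript"]
                    if text and set(text) <= alphabet:
                        writer.writerow({k: row[k] for k in FIELDNAMES})
                        kept += 1
    return kept


# Hypothetical usage (paths are placeholders):
# alphabet = load_alphabet("deepspeech-swiss-german/data/alphabet.txt")
# merge_csvs(["mozilla_de/train.csv", "tuda/train.csv"], alphabet, "train.csv")
```

Merging is optional: DeepSpeech also accepts a comma-separated list of CSV files in --train_files.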

Language Model

We used the KenLM toolkit to train a 3-gram language model. KenLM is language model estimation and inference code written by Kenneth Heafield.
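To make "3-gram" concrete: a language model of order 3 estimates the probability of a word from the two words that precede it, based on counts collected from a text corpus. The sketch below only counts trigrams in pure Python to illustrate the idea. KenLM itself additionally applies modified Kneser-Ney smoothing and is far more efficient; the function name here is illustrative.

```python
from collections import Counter


def count_ngrams(sentences, order=3):
    """Count all n-grams of the given order, padding with <s> and </s>."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


counts = count_ngrams(["ich bin hier", "ich bin da"])
print(counts[("<s>", "ich", "bin")])  # prints 2: the trigram occurs in both sentences
```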

Installation

$ git clone https://github.com/kpu/kenlm.git
$ cd kenlm
$ mkdir -p build
$ cd build
$ cmake ..
$ make -j `nproc`

Corpus

We used an open-source German speech corpus released by the University of Hamburg and the European Parliament Proceedings Parallel Corpus 1996-2011.

Download the data (EN, DE)

## EN
# Nothing to download: the default Mozilla language model and trie are used for English.

## DE
$ wget http://ltdata1.informatik.uni-hamburg.de/kaldi_tuda_de/German_sentences_8mil_filtered_maryfied.txt.gz
$ gzip -d German_sentences_8mil_filtered_maryfied.txt.gz
$ wget https://www.statmt.org/europarl/v7/de-en.tgz
$ tar -xzvf de-en.tgz
$ cat German_sentences_8mil_filtered_maryfied.txt  >> europarl-v7.de-en.de

Pre-process the data (DE)

$ deepspeech-swiss-german/pre-processing/prepare_vocab.py europarl-v7.de-en.de exp_path/clean_vocab.txt
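The contents of prepare_vocab.py are not reproduced here. As a rough illustration of what a clean-up step of this kind typically does before KenLM training, the sketch below lowercases each line, replaces every character outside an allowed set with a space, and collapses repeated whitespace. The allowed character set and the function name are assumptions, not the script's actual behaviour.

```python
import re

# Assumed character set; the real one is defined by data/alphabet.txt.
ALLOWED = "abcdefghijklmnopqrstuvwxyzäöüß "

# Matches any single character that is not in ALLOWED.
_NOT_ALLOWED = re.compile("[^" + re.escape(ALLOWED) + "]")


def clean_line(line):
    """Lowercase, blank out disallowed characters, and normalise whitespace."""
    line = line.lower()
    line = _NOT_ALLOWED.sub(" ", line)
    return " ".join(line.split())


print(clean_line("Über 100 Häuser!"))  # prints: über häuser
```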

Build the Language Model (DE)

$ kenlm/build/bin/lmplz --text exp_path/clean_vocab.txt --arpa exp_path/words.arpa --o 3
$ kenlm/build/bin/build_binary -T -s exp_path/words.arpa exp_path/de_lm.binary

NOTE: If a malloc exception occurs, limit memory use with -S <memory use in %>.

Example:

$ kenlm/build/bin/lmplz --text exp_path/clean_vocab.txt --arpa exp_path/words.arpa --o 3 -S 50%

Build Trie (DE)

$ DeepSpeech/native_client/generate_trie deepspeech-swiss-german/data/alphabet.txt path/de_lm.binary export_path/de_trie
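Roughly speaking, generate_trie stores the vocabulary of the language model as a prefix trie, so that the decoder can check whether a partial character sequence can still be completed to a known word. The sketch below shows the data structure in plain Python as an illustration only; the class is hypothetical and unrelated to the binary format that generate_trie writes.

```python
class Trie:
    """Minimal character-level prefix trie."""

    def __init__(self):
        self.children = {}    # maps a character to the child node
        self.is_word = False  # True if a vocabulary word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def has_prefix(self, prefix):
        """Return True if at least one inserted word starts with prefix."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return False
        return True


vocab = Trie()
for word in ["grüezi", "gruss", "guet"]:
    vocab.insert(word)

print(vocab.has_prefix("grü"))  # prints True
print(vocab.has_prefix("gx"))   # prints False
```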

Training

Change the path accordingly.

$ ./DeepSpeech.py --train_files train.csv --dev_files dev.csv --test_files test.csv --alphabet_config_path alphabet.txt --lm_trie_path trie --lm_binary_path lm.binary --test_batch_size 36 --train_batch_size 24 --dev_batch_size 36 --epochs 75 --learning_rate 0.0001 --dropout_rate 0.25 --export_dir ../models
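After training, DeepSpeech evaluates the model on the files given in --test_files and reports the word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch that computes WER as the word-level edit distance divided by the number of reference words. It is an illustration, not DeepSpeech's own evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("das ist ein test", "das ist test"))  # prints 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.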

Training with Augmentation

Change the path accordingly.

$ AUG_AUDIO="--data_aug_features_additive 0.2 --data_aug_features_multiplicative 0.2 --augmentation_speed_up_std 0.2"
$ AUG_FREQ_TIME="--augmentation_freq_and_time_masking --augmentation_freq_and_time_masking_freq_mask_range 5 --augmentation_freq_and_time_masking_number_freq_masks 3 --augmentation_freq_and_time_masking_time_mask_range 2 --augmentation_freq_and_time_masking_number_time_masks 3"
$ AUG_PITCH_TEMPO="--augmentation_pitch_and_tempo_scaling --augmentation_pitch_and_tempo_scaling_min_pitch 0.95 --augmentation_pitch_and_tempo_scaling_max_pitch 1.2 --augmentation_pitch_and_tempo_scaling_max_tempo 1.2"
$ AUG_SPEC_DROP="--augmentation_spec_dropout_keeprate 0.2"
$ ./DeepSpeech.py --train_files train.csv --dev_files dev.csv --test_files test.csv --alphabet_config_path alphabet.txt --lm_trie_path trie --lm_binary_path lm.binary --test_batch_size 36 --train_batch_size 24 --dev_batch_size 36 --epochs 75 --learning_rate 0.0001 --dropout_rate 0.25 --export_dir ../models $AUG_AUDIO $AUG_FREQ_TIME $AUG_PITCH_TEMPO $AUG_SPEC_DROP
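The freq_and_time_masking flags enable SpecAugment-style masking: a few randomly placed bands of frequency bins and of time steps in the input features are blanked out during training. The sketch below shows the idea on a plain list-of-lists spectrogram (time by frequency). It is an illustration under simplifying assumptions: DeepSpeech's implementation operates on TensorFlow tensors and may sample mask widths and positions differently, and the function and parameter names here are hypothetical.

```python
import random


def mask_spectrogram(spec, freq_mask_range=5, n_freq_masks=3,
                     time_mask_range=2, n_time_masks=3, rng=None):
    """Return a copy of a non-empty spectrogram with random bands zeroed.

    spec is a list of time frames, each a list of frequency-bin values.
    """
    rng = rng or random.Random()
    n_time = len(spec)
    n_freq = len(spec[0])
    out = [list(frame) for frame in spec]  # copy, leave the input untouched

    # Frequency masks: zero the same band of bins in every frame.
    for _ in range(n_freq_masks):
        width = rng.randint(0, min(freq_mask_range, n_freq))
        start = rng.randint(0, n_freq - width)
        for frame in out:
            for f in range(start, start + width):
                frame[f] = 0.0

    # Time masks: zero whole consecutive frames.
    for _ in range(n_time_masks):
        width = rng.randint(0, min(time_mask_range, n_time))
        start = rng.randint(0, n_time - width)
        for t in range(start, start + width):
            out[t] = [0.0] * n_freq

    return out


spec = [[1.0] * 40 for _ in range(20)]  # 20 frames, 40 frequency bins
masked = mask_spectrogram(spec, rng=random.Random(0))
```

The default values mirror the mask ranges and counts used in the command above.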

Results

Some results from our findings.

English -> German -> Swiss : 56.6

NOTE: Refer to our paper for more information.

Acknowledgments

Prof. Dr.-Ing. Torsten Zesch - Co-Author

References

If you use our findings or scripts in your academic work, please cite our paper (see Important Links above).
