asr-benchmark

The goal of this project is to evaluate some industrial automatic speech recognition (ASR) systems for Brazilian Portuguese, although the tools developed here may be used to evaluate ASR systems in any language. The databases used are public and may be freely downloaded, so anyone can reproduce the setup and confirm the results.

Download databases

  • LapsBenchMark1.4: wget http://www.laps.ufpa.br/falabrasil/files/LapsBM1.4.rar

  • Voxforge: wget -r -nH -nd -np -R index.html* http://www.repository.voxforge1.org/downloads/pt/Trunk/Audio/Original/48kHz_16bit/

After downloading, you must downsample both databases to 16000 Hz and 8000 Hz. This can be done with any tool you like; sox is a good option.
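
For example, a minimal sox loop along the following lines will do the job. This is just a sketch: it assumes the archives have already been extracted (e.g. with unrar x LapsBM1.4.rar for the Laps archive) and that the WAV files sit in a flat corpus/ directory; adjust the paths to your layout.

mkdir -p corpus-16k corpus-8k
for f in corpus/*.wav; do
  # write a 16 kHz and an 8 kHz copy of each file
  sox "$f" "corpus-16k/$(basename "$f")" rate 16000
  sox "$f" "corpus-8k/$(basename "$f")"  rate 8000
done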

LapsBenchMark1.4 has 700 files. Voxforge has many more, so for this benchmark 700 audio files were randomly sampled from it and used in the evaluation. The chosen files are listed in data/voxforge-{8k,16k}.txt.
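
If you want to draw a sample of your own (the files actually used here are the ones already listed in data/voxforge-{8k,16k}.txt), something like the following works; the voxforge-16k/ directory name is only an illustration:

# pick 700 WAV files at random and save the list
find voxforge-16k/ -name '*.wav' | shuf -n 700 | sort > my-voxforge-16k.txt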

Dependencies

You will need Python 3 to run the benchmark scripts. Optionally, you may also use some Bash scripts I wrote to process the transcriptions generated by the benchmark scripts.

I use Anaconda to manage the Python dependencies, which in this case are watson-developer-cloud, python-dotenv, and SpeechRecognition.

To create my environment, I did:

conda create -n asr python=3.5
source activate asr
pip install --upgrade watson-developer-cloud
pip install python-dotenv
pip install SpeechRecognition

In this benchmark, word error rate (WER) and sentence error rate (SER) are evaluated, so you will need a tool to measure them. The sclite tool, included in the NIST Speech Recognition Scoring Toolkit, may be used for this purpose. An equivalent tool is compute-wer from the Kaldi toolkit; I used the latter simply because I already had Kaldi installed on my machine.
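
As a reminder, WER counts the word-level edits needed to turn the reference transcription into the hypothesis: WER = (insertions + deletions + substitutions) / number of reference words. SER is the fraction of utterances whose hypothesis contains at least one error.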

You will also need to create credentials to access the IBM and Microsoft speech APIs. You must go to IBM Bluemix and Microsoft Bing to get your keys.

After grabbing your keys, create a .env file in the scripts directory with the following variables and their values:

  • BLUEMIX_USERNAME="XXXXXXXX"
  • BLUEMIX_PASSWORD="YYYYYY"
  • SUBSCRIPTION_KEY="MMMMMM"
  • INSTANCE_ID="ZZZZZZ"
  • REQUEST_ID="QQQQQQQ"

BLUEMIX_USERNAME and BLUEMIX_PASSWORD are required to run the IBM benchmark. The other three keys are only needed for the Microsoft benchmark.

Benchmark

source activate asr

python scripts/ibmASR.py 16000 data/laps-16k.txt > results/ibm-laps-16k.tra
python scripts/ibmASR.py 8000  data/laps-8k.txt  > results/ibm-laps-8k.tra
python scripts/ibmASR.py 16000 data/voxforge-16k.txt > results/ibm-voxforge-16k.tra
python scripts/ibmASR.py 8000  data/voxforge-8k.txt  > results/ibm-voxforge-8k.tra

python scripts/microsoftASR.py 16000 data/laps-16k.txt > results/microsoft-laps-16k.tra
python scripts/microsoftASR.py 8000  data/laps-8k.txt  > results/microsoft-laps-8k.tra
python scripts/microsoftASR.py 16000 data/voxforge-16k.txt > results/microsoft-voxforge-16k.tra
python scripts/microsoftASR.py 8000  data/voxforge-8k.txt  > results/microsoft-voxforge-8k.tra

python scripts/googleASR.py data/laps-16k.txt > results/google-laps-16k.tra
python scripts/googleASR.py data/laps-8k.txt  > results/google-laps-8k.tra
python scripts/googleASR.py data/voxforge-16k.txt > results/google-voxforge-16k.tra
python scripts/googleASR.py data/voxforge-8k.txt  > results/google-voxforge-8k.tra

./scripts/buildLapsHyp.sh results/ibm-laps-16k.tra > hypotheses/ibm-laps-16k.hyp
./scripts/buildLapsHyp.sh results/ibm-laps-8k.tra  > hypotheses/ibm-laps-8k.hyp
./scripts/buildVoxforgeHyp.sh results/ibm-voxforge-8k.tra  > hypotheses/ibm-voxforge-8k.hyp
./scripts/buildVoxforgeHyp.sh results/ibm-voxforge-16k.tra > hypotheses/ibm-voxforge-16k.hyp

./scripts/buildLapsHyp.sh results/microsoft-laps-16k.tra > hypotheses/microsoft-laps-16k.hyp
./scripts/buildLapsHyp.sh results/microsoft-laps-8k.tra  > hypotheses/microsoft-laps-8k.hyp
./scripts/buildVoxforgeHyp.sh results/microsoft-voxforge-8k.tra  > hypotheses/microsoft-voxforge-8k.hyp
./scripts/buildVoxforgeHyp.sh results/microsoft-voxforge-16k.tra > hypotheses/microsoft-voxforge-16k.hyp

compute-wer --mode=present ark:references/laps.ref ark:hypotheses/ibm-laps-16k.hyp
compute-wer --mode=present ark:references/laps.ref ark:hypotheses/ibm-laps-8k.hyp
compute-wer --mode=present ark:references/voxforge.ref ark:hypotheses/ibm-voxforge-16k.hyp
compute-wer --mode=present ark:references/voxforge.ref ark:hypotheses/ibm-voxforge-8k.hyp

compute-wer --mode=present ark:references/laps.ref ark:hypotheses/microsoft-laps-16k.hyp
compute-wer --mode=present ark:references/laps.ref ark:hypotheses/microsoft-laps-8k.hyp
compute-wer --mode=present ark:references/voxforge.ref ark:hypotheses/microsoft-voxforge-16k.hyp
compute-wer --mode=present ark:references/voxforge.ref ark:hypotheses/microsoft-voxforge-8k.hyp

Results

Results are shown in terms of WER (Word Error Rate) and SER (Sentence Error Rate), in the format produced by compute-wer.
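
Each entry reads as: error rate, then, in brackets, errors / reference tokens, with the insertion, deletion, and substitution counts broken out for WER. For example, for IBM on Laps 16 kHz there are 110 insertions + 217 deletions + 655 substitutions = 982 word errors over 7228 reference words (WER 13.59%), and 449 of the 700 sentences contain at least one error (SER 64.14%).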

Database          System      %WER                                              %SER
Laps 16 kHz       IBM         13.59 [ 982 / 7228, 110 ins, 217 del, 655 sub ]   64.14 [ 449 / 700 ]
Laps 16 kHz       Microsoft   15.88 [ 1148 / 7228, 96 ins, 248 del, 804 sub ]   68.00 [ 476 / 700 ]
Laps 8 kHz        IBM         13.89 [ 1004 / 7228, 106 ins, 242 del, 656 sub ]  64.57 [ 452 / 700 ]
Laps 8 kHz        Microsoft   16.03 [ 1159 / 7228, 97 ins, 248 del, 814 sub ]   67.29 [ 471 / 700 ]
Voxforge 16 kHz   IBM         31.23 [ 1067 / 3417, 134 ins, 313 del, 620 sub ]  54.74 [ 375 / 685 ]
Voxforge 16 kHz   Microsoft   18.28 [ 616 / 3370, 46 ins, 186 del, 384 sub ]    39.73 [ 269 / 677 ]
Voxforge 8 kHz    IBM         28.62 [ 995 / 3477, 115 ins, 284 del, 596 sub ]   53.58 [ 374 / 698 ]
Voxforge 8 kHz    Microsoft   18.05 [ 611 / 3385, 46 ins, 197 del, 368 sub ]    39.21 [ 267 / 681 ]

These results were obtained on 5 February 2017. The systems may be upgraded over time, so these rates may change.
