VietASR (Vietnamese Automatic Speech Recognition)

⚡ Some experiments with NeMo ⚡

Model: QuartzNet, a smaller version of the Jasper model

The pretrained model in this repo was trained on ~100 hours of Vietnamese speech collected from YouTube, radio, call center recordings (8 kHz), text-to-speech data, and some public datasets (VLSP, VIVOS, FPT). It is a very small model (13M parameters), which makes inference fast ⚡

🌱 Update: A new version, built from scratch with PyTorch, is available on branch v2.0

Installation

Update and install Linux libraries:

apt-get update && apt-get install -y libsndfile1 ffmpeg

Install python>=3.8

Python libs:

pip install -r requirements.txt

Install torch 1.8.1:

# CPU only; install the CUDA build instead if you have an NVIDIA GPU
pip install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
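
To confirm that the environment matches the versions listed above, a small check along these lines may help. This is only a sketch: the helper names `installed_version` and `check_environment` are hypothetical and not part of this repo, and the expected versions are simply copied from the install command.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 8)
# Versions taken from the pip command above.
EXPECTED = {"torch": "1.8.1", "torchvision": "0.9.1", "torchaudio": "0.8.1"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package name to (installed_version, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Builds such as "1.8.1+cpu" carry a local suffix, so compare the part before "+".
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    print("python ok:", sys.version_info >= REQUIRED_PYTHON)
    for name, (found, ok) in check_environment().items():
        print(f"{name}: installed={found} ok={ok}")
```

A package that is not installed is reported as `(None, False)` rather than raising, so the script can be run before or after the install step.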

Install kenlm for LM decoding (Linux only):

pip install https://github.com/kpu/kenlm/archive/master.zip
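
This README does not say how the language model is combined with the acoustic model. One common approach, shown here purely as an illustration, is to rescore a list of candidate transcripts with a weighted sum of the acoustic score and the LM score. The function `rescore`, the weights `alpha` and `beta`, and the toy scorer are assumptions for this sketch, not code from this repo.

```python
def rescore(candidates, lm_score, alpha=0.5, beta=1.0):
    """Pick the best transcript from (text, acoustic_log_score) pairs.

    lm_score: callable mapping a sentence to a log-probability.
    alpha:    weight of the language model score (assumed value).
    beta:     per-word bonus that offsets the LM's bias toward short outputs.
    """
    def total(item):
        text, acoustic = item
        return acoustic + alpha * lm_score(text) + beta * len(text.split())

    return max(candidates, key=total)[0]


# Toy scorer standing in for a real language model.
def toy_lm(text):
    return 0.0 if text == "xin chao" else -5.0


candidates = [("xin chao", -1.0), ("xin trao", -0.9)]
best = rescore(candidates, toy_lm)
```

With kenlm installed, the toy scorer could be swapped for something like `kenlm.Model(path).score`, where `path` points to a trained n-gram model file; the location of such a file in this repo is not stated here.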

Transcribe audio file

python infer.py audio_samples # transcribes the audio files in the audio_samples folder
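
Before pointing the script at a large folder, it can be useful to preview which files look like audio. The sketch below is a hypothetical helper, not part of `infer.py`; the extension set is an assumption, and the formats actually accepted by `infer.py` may differ.

```python
from pathlib import Path

# Assumed extensions; check infer.py for the formats it really supports.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def list_audio_files(folder):
    """Return the audio files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"not a folder: {root}")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

For example, `list_audio_files("audio_samples")` returns a list of `Path` objects for the matching files and ignores subfolders and other file types.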

Run web application

Run app:

python app.py # the app will run at https://localhost:5000

Video demos on YouTube:
- v1: https://youtu.be/P3mhEngL1us
- v2: https://youtu.be/o9NpWi3VUHs

TODO

Conformer Model
Data augmentation: speed, noise, pitch shift, time shift, ...
FastAPI
Add Dockerfile

Citation

  @article{kuchaiev2019nemo,
    title={Nemo: a toolkit for building ai applications using neural modules},
    author={Kuchaiev, Oleksii and Li, Jason and Nguyen, Huyen and Hrinchuk, Oleksii and Leary, Ryan and Ginsburg, Boris and Kriman, Samuel and Beliaev, Stanislav and Lavrukhin, Vitaly and Cook, Jack and others},
    journal={arXiv preprint arXiv:1909.09577},
    year={2019}
  }
