Automatic Speech Recognition Models

End-to-end (E2E) automatic speech recognition (ASR) models implemented in PyTorch.
We trained on the KsponSpeech dataset and used Hydra to manage all training configurations.

Installation

pip install -e .

Preparation

The dataset can be downloaded from AI-Hub; anyone can obtain it by applying for access. After downloading, the KsponSpeech dataset was preprocessed through here.

Usage

- Training

You can choose from several models and training options.

Deep Speech2 Training

$ python main.py \
    model=deepspeech2 \
    train=deepspeech2_train \
    train.dataset_path=$DATASET_PATH \
    train.audio_path=$AUDIO_PATH \
    train.label_path=$LABEL_PATH

Listen, Attend and Spell Training

$ python main.py \
    model=las \
    train=las_train \
    train.dataset_path=$DATASET_PATH \
    train.audio_path=$AUDIO_PATH \
    train.label_path=$LABEL_PATH

Joint CTC-Attention Listen, Attend and Spell Training

$ python main.py \
    model=joint_ctc_attention_las \
    train=las_train \
    train.dataset_path=$DATASET_PATH \
    train.audio_path=$AUDIO_PATH \
    train.label_path=$LABEL_PATH
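
The `train.dataset_path=...` style arguments in the training commands are Hydra overrides: a dotted key addresses a nested field in the configuration, and the value after `=` replaces the default. The sketch below illustrates that mapping with plain dictionaries. It is not Hydra's implementation, it does not model config-group selection such as `model=deepspeech2`, and the default values and paths in it are hypothetical.

```python
import copy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        # Walk down the nested dictionaries, creating levels that are missing.
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Hypothetical defaults; the real ones live in the repository's configs/ directory.
defaults = {"train": {"dataset_path": "", "audio_path": "", "label_path": ""}}

# Hypothetical paths standing in for $DATASET_PATH and $LABEL_PATH.
config = apply_overrides(
    defaults,
    ["train.dataset_path=/data/kspon", "train.label_path=/data/labels.csv"],
)
print(config["train"])
```

Fields that are not overridden, such as `train.audio_path` here, keep their default values, which is why each command only needs to pass the paths that differ per machine.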

- Evaluation

$ python eval.py \
    eval.dataset_path=$DATASET_PATH \
    eval.audio_path=$AUDIO_PATH \
    eval.label_path=$LABEL_PATH \
    eval.model_path=$MODEL_PATH
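
This README does not state which metric `eval.py` reports. Character error rate (CER) is a common choice for Korean ASR, so the following is a minimal sketch of that metric, written independently of this repository's code. Ignoring spaces before scoring is an assumption made here for illustration; some evaluations count them.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length, with spaces removed."""
    # Assumption: spaces are not scored.
    reference = reference.replace(" ", "")
    hypothesis = hypothesis.replace(" ", "")
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)


print(character_error_rate("abcd", "abxd"))
```

Because insertions are counted, CER can exceed 1.0 when the hypothesis is much longer than the reference.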

Author

seomk9896@gmail.com

License

MIT License

Copyright (c) 2021 Sangchun Ha

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
