GitHub - charlesliucn/LanMIT: 📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi.

Low-resourced Language Modeling based on Kaldi

This repository provides Kaldi users with several scripts for language modeling, especially under low-resourced conditions. The scripts are mainly based on babel/s5d in the egs directory.

Most of the scripts are in babel/s5d and wsj/s5/steps.

The scripts are not yet well organized. Detailed usage documentation will be added later.

Main Contributions

Data Augmentation
- Text Preprocessing for Lexicon Generation
- Vocabulary Expansion Based on Word Frequency
- Data Selection Based on Multiple Criteria

N-Gram Language Models Based on SRILM
- Linear Interpolation for N-Gram Models
- N-Gram Language Model for Rescoring

LSTM Language Model Based on TensorFlow
- Word Vectors Pre-training for RNN/LSTM Language Model Training
- LSTM Language Model for Rescoring
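
To illustrate what linear interpolation of language models means, here is a minimal Python sketch. It is not taken from this repository's scripts or from SRILM. The function names (`interpolate`, `perplexity`, `best_lambda`) and the toy probabilities are hypothetical. The sketch assumes each model has already assigned a strictly positive probability to every word of a held-out text, and it estimates the mixture weight with a simple EM loop.

```python
import math


def interpolate(p_a, p_b, lam):
    """Mix two word probabilities: lam * p_a + (1 - lam) * p_b."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_a + (1.0 - lam) * p_b


def perplexity(probs):
    """Perplexity of a text given its per-word probabilities (all > 0)."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


def best_lambda(probs_a, probs_b, iters=50, lam=0.5):
    """Estimate the weight of model A on held-out data with EM.

    probs_a[i] and probs_b[i] are the probabilities that model A and
    model B assign to the i-th held-out word.
    """
    for _ in range(iters):
        # E-step: posterior probability that each word came from model A.
        post = [lam * a / (lam * a + (1.0 - lam) * b)
                for a, b in zip(probs_a, probs_b)]
        # M-step: the new weight is the average posterior.
        lam = sum(post) / len(post)
    return lam


if __name__ == "__main__":
    # Hypothetical per-word probabilities on a four-word held-out text.
    probs_a = [0.20, 0.05, 0.10, 0.30]
    probs_b = [0.10, 0.15, 0.02, 0.25]
    lam = best_lambda(probs_a, probs_b)
    mixed = [interpolate(a, b, lam) for a, b in zip(probs_a, probs_b)]
    print("lambda:", lam)
    print("perplexity A:", perplexity(probs_a))
    print("perplexity B:", perplexity(probs_b))
    print("perplexity mix:", perplexity(mixed))
```

In practice the weight is tuned on a development set, and the interpolated model is then used for decoding or rescoring.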

Relevant Toolkits

XenC: an open-source tool for data selection in Natural Language Processing.
GloVe: Global Vectors for Word Representation.
SRILM: an Extensible Language Modeling Toolkit.
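
As an illustration of data selection, the sketch below ranks candidate sentences by cross-entropy difference, one common criterion for picking text that resembles an in-domain corpus. It is a toy under stated assumptions, not the method used by this repository or the XenC implementation: it uses add-one smoothed unigram models, assumes whitespace-tokenized, non-empty sentences, and all names (`train_unigram`, `cross_entropy`, `select`) and example sentences are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (word counts, total token count) for a list of sentences."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy of a sentence under an add-one unigram model."""
    counts, total = model
    words = sentence.split()
    logp = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)
    return -logp / len(words)


def select(candidates, in_domain, general, keep):
    """Keep the `keep` candidates that look most in-domain.

    Score = H_in(s) - H_general(s); lower means closer to the in-domain data.
    """
    vocab = {w for s in in_domain + general + candidates for w in s.split()}
    v = len(vocab)
    m_in = train_unigram(in_domain)
    m_gen = train_unigram(general)
    ranked = sorted(
        candidates,
        key=lambda s: cross_entropy(s, m_in, v) - cross_entropy(s, m_gen, v),
    )
    return ranked[:keep]


if __name__ == "__main__":
    in_domain = ["call the doctor now", "the doctor is here"]
    general = ["stocks fell sharply today", "the market closed lower"]
    candidates = ["the doctor called", "stocks closed lower today"]
    print(select(candidates, in_domain, general, keep=1))
```

A real pipeline would replace the unigram models with proper n-gram models and could combine this score with other criteria.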

Contact

For any questions, please send an e-mail to charlesliutop@gmail.com.

For more information about the Kaldi Speech Recognition Toolkit, see Kaldi's official GitHub repository.

