Machine Translation

Statistical machine translation (SMT) is a rapidly growing area within Computational Linguistics. NLTK should provide a pathway for students who want to learn about the basic algorithms. View NLTK activity on SMT.

Existing functionality

Existing functionality is mostly in the translate submodule. It includes:

IBM Models 1-3: translate/ibm{1,2,3}.py
MT evaluation metrics:
    BLEU: translate/bleu_score.py
    RIBES: translate/ribes_score.py
    ChrF: translate/chrf_score.py
    GLEU: translate/gleu_score.py
Gale-Church Sentence Aligner: translate/gale_church.py
Aligned sentence reader: corpus/reader/aligned.py
Grow-Diagonal-Final-And Phrase Extraction: translate/phrase_based.py
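
For students meeting the word-based models for the first time, the core idea behind IBM Model 1 can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code in translate/ibm1.py: the function name train_ibm1 and the toy corpus are assumptions made for this example, and the NULL word is omitted for brevity. It estimates lexical translation probabilities t(f | e) with expectation-maximization.

```python
from collections import defaultdict


def train_ibm1(bitext, iterations=10):
    """Estimate t(f | e) by EM (hypothetical helper, not NLTK's API).

    bitext is a list of (source_tokens, target_tokens) pairs.
    Returns a dict mapping (f, e) to a probability.
    """
    src_vocab = {f for fs, _ in bitext for f in fs}
    # Start from a uniform distribution over co-occurring word pairs.
    t = {(f, e): 1.0 / len(src_vocab)
         for fs, es in bitext for f in fs for e in es}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned
        # E-step: share each source word among the target words in
        # the same sentence pair, in proportion to the current t.
        for fs, es in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts per target word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


# Toy German-English corpus (an assumption for illustration).
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(bitext)
for (f, e), p in sorted(t.items()):
    if e == "the":
        print(f"t({f} | {e}) = {p:.3f}")
```

On this toy corpus the probability mass for "the" shifts toward "das" with each iteration, because "das" is the only source word that appears in every sentence pair containing "the".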

Planned functionality

We would like to add functionality in the following areas:

Decoder for word-based models
Phrase-based models
Phrase probability estimation
Decoder for phrase-based models
Visualization of word alignments
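
As a rough idea of what the alignment visualization item could look like, the sketch below prints a word alignment as a text grid. It is a hypothetical helper and not an existing or proposed NLTK API: the name render_alignment, the grid layout, and the choice of "i-j" index pairs (source index, then target index, as in the Pharaoh alignment format) are all assumptions made for this example.

```python
def render_alignment(src, tgt, pharaoh):
    """Render a word alignment as a text grid (hypothetical helper).

    src and tgt are token lists; pharaoh is a string of "i-j" pairs
    linking source index i to target index j, e.g. "0-0 1-1".
    Rows are source words; columns are target words.
    """
    links = {tuple(int(k) for k in pair.split("-"))
             for pair in pharaoh.split()}
    width = max(len(w) for w in src)
    rows = []
    for i, word in enumerate(src):
        # "#" marks an aligned pair, "." an unaligned one.
        cells = " ".join("#" if (i, j) in links else "."
                         for j in range(len(tgt)))
        rows.append(f"{word:>{width}} {cells}")
    # List the target words so the columns can be identified.
    legend = "columns: " + " ".join(f"{j}={w}" for j, w in enumerate(tgt))
    return "\n".join(rows + [legend])


grid = render_alignment(["das", "Haus"], ["the", "house"], "0-0 1-1")
print(grid)
```

A text grid keeps the sketch free of plotting dependencies; a graphical version would need a decision about which drawing library NLTK should rely on.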

Third-party implementations

Existing Python implementations that could be incorporated into NLTK:

KenLM
Kriya
