Machine Translation
alvations edited this page Dec 14, 2016
Statistical machine translation (SMT) is a rapidly growing area within computational linguistics. NLTK should provide a pathway for students who want to learn its basic algorithms. View NLTK activity on SMT.
Existing functionality lives mostly in the nltk.translate submodule and includes:
- IBM Models 1-3: translate/ibm{1,2,3}.py
- MT evaluation metrics:
  - BLEU: translate/bleu_score.py
  - RIBES: translate/ribes_score.py
  - ChrF: translate/chrf_score.py
  - GLEU: translate/gleu_score.py
- Gale-Church Sentence Aligner: translate/gale_church.py
- Aligned sentence reader: corpus/reader/aligned.py
- Grow-Diagonal-Final-And Phrase Extraction: translate/phrase_based.py
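For students new to the word-based models, IBM Model 1 centres on an EM loop that re-estimates lexical translation probabilities t(f | e) from expected alignment counts. The code below is a minimal, self-contained sketch. It rests on several simplifying assumptions: whitespace-tokenised toy data, a None token standing in for NULL on the English side, and a fixed number of iterations. It is not NLTK's own implementation, which is the IBMModel1 class in translate/ibm1.py and is trained on AlignedSent objects. The function name train_ibm1 and the example bitext are hypothetical.

```python
from collections import defaultdict


def train_ibm1(bitext, iterations=10):
    """Estimate IBM Model 1 lexical probabilities t(f | e) with EM.

    bitext: list of (foreign_tokens, english_tokens) pairs (hypothetical format).
    A None token stands in for NULL on the English side so that
    foreign words can align to nothing.
    """
    f_vocab = {f for fs, _ in bitext for f in fs}
    # Uniform initialisation: every t(f | e) starts at 1 / |F|
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: spread each foreign word over all English positions
        for fs, es in bitext:
            es = [None] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / z
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Toy German-English bitext (hypothetical example data)
bitext = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm1(bitext, iterations=10)
print(t[("Haus", "house")], t[("das", "house")])
```

On this toy corpus, EM should shift probability mass toward t(Haus | house) and away from t(das | house). The reason is that "das" is also explained by "the" in the second sentence pair.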
We would like to add functionality in the following areas:
- Decoder for word-based models
- Phrase-based models
- Phrase probability estimation
- Decoder for phrase-based models
- Visualization of word alignments
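Phrase probability estimation is commonly done by relative frequency over phrase pairs extracted from a word-aligned corpus: phi(e | f) = count(f, e) / count(f). The code below is a minimal sketch of that idea. It assumes phrases are plain strings and that extraction has already been run, for instance with the phrase extraction in translate/phrase_based.py. The function name estimate_phrase_probs and the sample pairs are hypothetical, and this is not an existing NLTK API.

```python
from collections import Counter


def estimate_phrase_probs(phrase_pairs):
    """Relative-frequency estimate of phi(e | f) = count(f, e) / count(f).

    phrase_pairs: iterable of (source_phrase, target_phrase) tuples, one
    entry per extracted occurrence (hypothetical input format).
    """
    pair_counts = Counter(phrase_pairs)
    # Marginal count of each source phrase over all of its target phrases
    source_counts = Counter()
    for (f, _e), c in pair_counts.items():
        source_counts[f] += c
    return {(f, e): c / source_counts[f] for (f, e), c in pair_counts.items()}


# Toy phrase pairs, as if extracted from a small word-aligned corpus
pairs = [
    ("das Haus", "the house"),
    ("das Haus", "the house"),
    ("das Haus", "the home"),
    ("Haus", "house"),
]
phi = estimate_phrase_probs(pairs)
print(phi[("das Haus", "the house")])
```

Phrase-based systems usually estimate both directions, phi(e | f) and phi(f | e), and combine them with lexical weights in a log-linear model. This sketch covers only one direction.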
Existing Python implementations that could possibly be incorporated into NLTK