Machine Translation

alvations edited this page Dec 14, 2016 · 12 revisions

Statistical machine translation (SMT) is a rapidly growing area within Computational Linguistics. NLTK should provide a pathway for students who want to learn about the basic algorithms. View NLTK activity on SMT.

Existing functionality

Existing functionality is mostly in the translate submodule. It includes:

  • IBM Models 1-3: translate/ibm{1,2,3}.py
  • MT evaluation metrics:
      • BLEU: translate/bleu_score.py
      • RIBES: translate/ribes_score.py
      • ChrF: translate/chrf_score.py
      • GLEU: translate/gleu_score.py
  • Gale-Church Sentence Aligner: translate/gale_church.py
  • Aligned sentence reader: corpus/reader/aligned.py
  • Grow-Diagonal-Final-And Phrase Extraction: translate/phrase_based.py
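
To give a sense of what the IBM models listed above compute, here is a minimal pure-Python sketch of IBM Model 1 trained with expectation-maximization. It is an illustration only and is not the code in translate/ibm1.py: the function name `train_ibm1`, the `(source_tokens, target_tokens)` pair format, the use of `None` as the NULL source word, and the three-sentence toy corpus are all assumptions made for this example.

```python
from collections import defaultdict

NULL = None  # hypothetical placeholder for the empty (NULL) source word


def train_ibm1(bitext, iterations=10):
    """Estimate lexical translation probabilities t[(target, source)] = p(target | source).

    bitext is a list of (source_tokens, target_tokens) pairs.
    """
    target_vocab = {tw for _, tgt in bitext for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of each source word
        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current table.
        for src, tgt in bitext:
            src_with_null = [NULL] + list(src)
            for tw in tgt:
                z = sum(t[(tw, sw)] for sw in src_with_null)
                for sw in src_with_null:
                    posterior = t[(tw, sw)] / z
                    count[(tw, sw)] += posterior
                    total[sw] += posterior
        # M-step: renormalize the expected counts per source word.
        t = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )
    return t


# Toy German-English corpus (assumed data, for illustration only).
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(bitext)
for english in ("the", "house"):
    print(english, "| Haus:", round(t[(english, "Haus")], 3))
```

On this toy corpus the table should shift probability mass for "Haus" toward "house" and away from "the", since "the" is better explained by "das", which co-occurs with it in two sentences. Models 2 and 3 build on the same EM loop by adding alignment (position) and fertility parameters.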

Planned functionality

We would like to add functionality in the following areas:

Third-party implementations

Existing Python implementations that could possibly be incorporated into NLTK

Useful links