Machine_Translation_Seq2Seq

Machine translation (JP-EN) using an LSTM-based encoder-decoder model (PyTorch). This is an implementation of several models:

adapted to JP-EN translation.
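For a rough picture of what an LSTM encoder-decoder looks like in PyTorch, here is a minimal sketch. It is not the code from this repo: the class names (`Encoder`, `Decoder`, `Seq2Seq`), the layer sizes, and the single-layer, no-attention setup are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Reads the source (Japanese) token ids and returns the final LSTM state."""

    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) tensor of token ids
        _, state = self.lstm(self.embed(src))
        return state  # tuple (h, c), each of shape (1, batch, hid_dim)


class Decoder(nn.Module):
    """Predicts target (English) tokens, starting from the encoder state."""

    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt_in, state):
        # tgt_in: (batch, tgt_len) tensor of gold target ids (teacher forcing)
        output, state = self.lstm(self.embed(tgt_in), state)
        return self.out(output), state  # logits: (batch, tgt_len, vocab_size)


class Seq2Seq(nn.Module):
    """Hypothetical wrapper: encode the source, then decode the target."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=32, hid_dim=64):
        super().__init__()
        self.encoder = Encoder(src_vocab, emb_dim, hid_dim)
        self.decoder = Decoder(tgt_vocab, emb_dim, hid_dim)

    def forward(self, src, tgt_in):
        state = self.encoder(src)
        logits, _ = self.decoder(tgt_in, state)
        return logits
```

During training the logits would typically go into `nn.CrossEntropyLoss` against the target sequence shifted by one position. At inference time the decoder is run one token at a time, feeding its own prediction back in.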

Data: https://nlp.stanford.edu/projects/jesc/, official split. The xls data is converted to csv with pandas (prepro.py). Japanese is tokenized with sentencepiece (https://github.com/google/sentencepiece/); English is simply split on spaces (sorry, too lazy).
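The preprocessing steps above can be sketched roughly as follows. This is not the actual content of prepro.py: the function names, file paths, and the pre-trained sentencepiece model file are all hypothetical. pandas and sentencepiece are imported inside the functions that need them, so the whitespace tokenizer works without either installed.

```python
def xls_to_csv(xls_path, csv_path):
    """Convert a spreadsheet to csv with pandas (paths are placeholders)."""
    import pandas as pd  # imported lazily; reading .xls also needs an Excel engine

    pd.read_excel(xls_path).to_csv(csv_path, index=False)


def load_sentencepiece(model_path):
    """Load an already-trained sentencepiece model (hypothetical file)."""
    import sentencepiece as spm

    return spm.SentencePieceProcessor(model_file=model_path)


def tokenize_ja(sp, text):
    """Split Japanese text into subword pieces with a loaded processor."""
    return sp.encode(text, out_type=str)


def tokenize_en(text):
    """Split English text on whitespace; punctuation stays attached to words."""
    return text.split()
```

For example, `tokenize_en("you are back, aren't you?")` gives `["you", "are", "back,", "aren't", "you?"]`. Punctuation stays glued to the neighbouring word, which is the price of the lazy approach.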