Machine translation (JP-EN) using an LSTM-based encoder-decoder model (PyTorch). This repository implements two models:
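- model.py: Sequence to Sequence Learning with Neural Networks (https://arxiv.org/abs/1409.3215)
- model2.py: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (https://arxiv.org/abs/1406.1078)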
Both are adapted to JP-EN translation.
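
For orientation, the encoder-decoder idea behind model.py can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the class names (`Encoder`, `Decoder`, `Seq2Seq`), the layer sizes, and the vocabulary sizes are all assumptions picked for the example.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Reads the source sentence and returns the final LSTM state."""

    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) tensor of token ids
        _, state = self.lstm(self.embedding(src))
        return state  # tuple (h, c), each of shape (1, batch, hid_dim)


class Decoder(nn.Module):
    """Predicts target tokens, starting from the encoder's final state."""

    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt_in, state):
        # tgt_in: (batch, tgt_len) tensor of token ids (teacher forcing)
        output, state = self.lstm(self.embedding(tgt_in), state)
        return self.out(output), state  # logits: (batch, tgt_len, vocab_size)


class Seq2Seq(nn.Module):
    """Encoder and decoder wired together; sizes are illustrative defaults."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=32, hid_dim=64):
        super().__init__()
        self.encoder = Encoder(src_vocab, emb_dim, hid_dim)
        self.decoder = Decoder(tgt_vocab, emb_dim, hid_dim)

    def forward(self, src, tgt_in):
        state = self.encoder(src)
        logits, _ = self.decoder(tgt_in, state)
        return logits


# Toy batch: 2 sentences, 7 source tokens, 5 target tokens (random ids).
torch.manual_seed(0)
model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 7))
tgt_in = torch.randint(0, 120, (2, 5))
logits = model(src, tgt_in)
print(logits.shape)  # torch.Size([2, 5, 120])
```

The only link between the two halves is the encoder's final `(h, c)` state, which is why the whole source sentence has to be squeezed into a fixed-size vector. For training, the logits would typically be fed to `nn.CrossEntropyLoss` against the target sequence shifted by one token.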
Data: JESC (https://nlp.stanford.edu/projects/jesc/), official split. The xls data is converted to csv with pandas (prepro.py). Japanese is tokenized with sentencepiece (https://github.com/google/sentencepiece/); English is tokenized by splitting on whitespace (sorry, too lazy).
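
The preprocessing steps above might look roughly like this. It is a sketch under assumptions, not a copy of prepro.py: the function names, the file paths, and the choice of `header=None` (i.e. the spreadsheet has no header row) are all hypothetical, and the sentencepiece model is assumed to have been trained beforehand.

```python
def xls_to_csv(xls_path, csv_path):
    """Convert a spreadsheet of sentence pairs to csv (assumes no header row)."""
    # Imported here so the tokenizers below can be used without pandas installed.
    import pandas as pd

    df = pd.read_excel(xls_path, header=None)
    df.to_csv(csv_path, index=False, header=False)


def tokenize_en(sentence):
    """English: split on whitespace, nothing smarter."""
    return sentence.split()


def tokenize_ja(sentence, processor):
    """Japanese: subword pieces from an already loaded sentencepiece processor."""
    return processor.encode(sentence, out_type=str)


print(tokenize_en("I  have a pen."))  # ['I', 'have', 'a', 'pen.']
```

To tokenize Japanese, a processor could be loaded with `sentencepiece.SentencePieceProcessor(model_file="ja.model")`, where `ja.model` is a placeholder for whatever model file was trained, and then passed to `tokenize_ja`. Note that whitespace splitting leaves punctuation attached to words (`pen.` above), which inflates the English vocabulary.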