This repository contains a PyTorch implementation of the structure-aware output layer for neural machine translation presented at WMT 2018. The model is a generalized form of weight tying: like weight tying, it shares parameters between the input and output embeddings, but it additionally learns a more flexible relationship with the input word embeddings and allows the effective capacity of the output layer to be controlled. In addition, the model shares weights across output classifiers and translation contexts, which allows it to better leverage prior knowledge about them.
```
@inproceedings{Pappas_WMT_2018,
  author    = {Pappas, Nikolaos and Miculicich, Lesly and Henderson, James},
  title     = {Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation},
  booktitle = {Proceedings of the Third Conference on Machine Translation (WMT)},
  address   = {Brussels, Belgium},
  year      = {2018}
}
```
The files specific to our paper are the following: (i) GlobalAttention.py modifies the original attention mechanism to support different input and output dimensions, (ii) Generator.py contains the implementation of the proposed output layer, and (iii) train.py implements the sampling-based training approach.
- onmt/modules/GlobalAttention.py
- onmt/Generator.py
- train.py
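To illustrate the core idea behind the proposed output layer, here is a minimal, hypothetical sketch of a joint input-output embedding generator in PyTorch. It is not the paper's exact parameterization (see onmt/Generator.py for that): the class name, the `joint_size` bottleneck, and the choice of `tanh` nonlinearities are assumptions made for illustration. The key point it demonstrates is that the output classifier weights are derived from the (shared) input embedding matrix through a learned projection, rather than being tied to it directly.

```python
import torch
import torch.nn as nn


class JointEmbeddingGenerator(nn.Module):
    """Hypothetical sketch of a joint input-output embedding output layer.

    Decoder states and input word embeddings are projected into a shared
    joint space, where output scores are computed by a dot product. The
    dimensionality of the joint space controls the effective capacity of
    the output layer.
    """

    def __init__(self, hidden_size, embed_size, joint_size, embedding_weight):
        super().__init__()
        self.proj_hidden = nn.Linear(hidden_size, joint_size)  # decoder state -> joint space
        self.proj_embed = nn.Linear(embed_size, joint_size)    # input embeddings -> joint space
        self.bias = nn.Parameter(torch.zeros(embedding_weight.size(0)))
        self.embedding_weight = embedding_weight  # shared with the input embedding table

    def forward(self, hidden):
        # hidden: [batch, hidden_size]
        h = torch.tanh(self.proj_hidden(hidden))                # [batch, joint_size]
        e = torch.tanh(self.proj_embed(self.embedding_weight))  # [vocab, joint_size]
        # Scores over the vocabulary via dot products in the joint space.
        return torch.log_softmax(h @ e.t() + self.bias, dim=-1)
```

With plain weight tying, the output projection is exactly the input embedding matrix; here the two learned projections give the output layer capacity that is decoupled from the embedding dimensionality.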
The code is largely based on an earlier version (v0.2.1) of OpenNMT-py (https://github.com/OpenNMT/OpenNMT-py), which requires Python (http://www.python.org/getit/) and the PyTorch library (https://pytorch.org/) to run. For detailed instructions on how to install and use them, please refer to the corresponding links above.