SynST: Syntactically Supervised Transformers

This is the official repository containing all the code necessary to replicate the results from the ACL 2019 long paper Syntactically Supervised Transformers for Faster Neural Machine Translation. It can also be used to train a vanilla Transformer or a Semi-Autoregressive Transformer.

The full model architecture is displayed below:

Our approach uses syntactic supervision to speed up neural machine translation (NMT) for the Transformer architecture. We modify the Transformer architecture by adding a single layer parse decoder that autoregressively predicts a shallow chunking of the target parse. Then, conditioned on this parse, a separate token decoder generates the final target translation in one shot (non-autoregressively). The figure above demonstrates the inputs and outputs for each module in the architecture.

Requirements

The code requires Python 3.6+. The Python dependencies can be installed with the following command (using a virtual environment is highly recommended):

pip install -r requirements.txt
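
As a minimal sketch of the recommended setup, you can create and activate a virtual environment first (the environment name synst-env is arbitrary):

python3 -m venv synst-env          # create an isolated environment
source synst-env/bin/activate      # activate it
pip install -r requirements.txt    # install the dependencies into the environment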

In order to parse the datasets, the code also depends upon jq and the shift-reduce parsers from CoreNLP. First, make sure you have an appropriate Java runtime installed.
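
You can quickly confirm that a Java runtime is available on your PATH with:

java -version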

Then download and unzip the main CoreNLP package to the directory of your choice:

curl -O https://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip

You'll also need to download the shift-reduce parsers for each of the languages:

cd stanford-corenlp-full-2018-10-05
curl -O https://nlp.stanford.edu/software/stanford-srparser-2014-10-23-models.jar
curl -O https://nlp.stanford.edu/software/stanford-french-corenlp-2018-10-05-models.jar
curl -O https://nlp.stanford.edu/software/stanford-german-corenlp-2018-10-05-models.jar

Additionally, if you want to use the scripts that wrap multi-bleu.perl and sacrebleu, then you'll need to have Moses-SMT available as well.
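
As a rough sketch, one way to make Moses-SMT available is to clone it locally (the checkout location is up to you):

git clone https://github.com/moses-smt/mosesdecoder.git
# multi-bleu.perl then lives at mosesdecoder/scripts/generic/multi-bleu.perl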

Basic Usage

The code has one main entry point, main.py, along with a couple of support scripts for the analysis conducted in the paper. Use python main.py -h for additional options not listed below, and python main.py <action> -h for options specific to the available actions: {train, evaluate, translate, pass}.
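
For example:

python main.py -h            # top-level options
python main.py train -h      # options specific to the train action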

Preprocessing

CLASSPATH=stanford-corenlp-full-2018-10-05/* python main.py \
  --dataset wmt_en_de_parsed --span 6 -d raw/wmt -p preprocessed/wmt -v pass

Troubleshooting

If you have issues with preprocessing, a few common problems are:

  1. Your CLASSPATH is not correctly set to include CoreNLP.
  2. The LANG and LC_ALL environment variables are not set to use UTF-8. Try setting LANG=en_US.UTF-8 LC_ALL= on the command line when running the preprocessing, as in the sketch below.
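
For example, a combined invocation based on the preprocessing command above might look like:

LANG=en_US.UTF-8 LC_ALL= CLASSPATH=stanford-corenlp-full-2018-10-05/* python main.py \
  --dataset wmt_en_de_parsed --span 6 -d raw/wmt -p preprocessed/wmt -v pass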

Training

Assuming you have access to 8 1080Ti GPUs, you can recreate the results for SynST on the WMT'14 En-De dataset with:

python main.py -b 3175 --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt -v train \
  --checkpoint-interval 1200 --accumulate 2 --label-smoothing 0

The above command line trains on 8 GPUs with approximately 3175 combined source/target tokens per GPU and accumulates gradients over two batches before updating the model parameters (3175 tokens × 8 GPUs × 2 batches ≈ 50.8k tokens per model update).

The default model is the Transformer, which accepts the additional command-line argument --span <k> to produce a semi-autoregressive variant (the default --span 1 is the basic Transformer). For example, the command below trains a semi-autoregressive Transformer with k=2 on the WMT'14 De-En dataset:

python main.py -b 3175 --dataset wmt_de_en --span 2 \
  -d raw/wmt -p preprocessed/wmt -v train \
  --checkpoint-interval 1200 --accumulate 2

Evaluating Perplexity

You can run a separate process to evaluate each new checkpoint generated during training (you may want to run it on a GPU not used for training, or with CUDA disabled as done below):

python main.py -b 5000 --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt \
  --split valid --disable-cuda -v evaluate \
  --watch-directory /tmp/synst/checkpoints

Translating

After training a model, you can generate translations with the following command (currently only translation on a single GPU is supported):

CUDA_VISIBLE_DEVICES=0 python main.py --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt \
  --batch-size 1 --batch-method example --split test -v \
  --restore /tmp/synst/checkpoints/checkpoint.pt \
  --average-checkpoints 5 translate \
  --max-decode-length 50 --length-basis input_lens --order-output

By default, this will output translations to /tmp/synst/output.
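
As a rough sketch of scoring the generated translations with the sacrebleu command-line tool (the hypothesis filename under /tmp/synst/output and the reference file path are placeholders; adjust them to your setup):

sacrebleu path/to/reference.de < /tmp/synst/output/<generated-translations>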

Experiment tracking

If you have a comet.ml account, you can track experiments by prefixing the script call with:

env $(cat ~/.comet.ml | xargs) python main.py --track ...

Here ~/.comet.ml is a file containing your API key for logging experiments to the service (a sketch of its contents is shown below). By default, this will track experiments in a workspace named umass-nlp with project name synst. See args.py to configure the experiment tracking to suit your needs.
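
A minimal sketch of ~/.comet.ml, assuming the comet.ml SDK reads its API key from the COMET_API_KEY environment variable (substitute your own key):

COMET_API_KEY=<your-comet-api-key>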

Cite

@inproceedings{akoury2019synst,
  title={Syntactically Supervised Transformers for Faster Neural Machine Translation},
  author={Akoury, Nader and Krishna, Kalpesh and Iyyer, Mohit},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year={2019}
}
