
Codebase accompanying the paper 'Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts' (Denis Emelin, Ivan Titov, and Rico Sennrich, Fourth Conference on Machine Translation, Florence, 2019).


Project documentation

UPDATE -- 14.2.23

We added a fairseq implementation of the Shortcut Transformer (see the fairseq directory), which can be used according to the fairseq documentation for version 0.10.2. To use the Shortcut Transformer without feature fusion, set the --arch parameter to shortcut_transformer; to use it with feature fusion, set --arch to shortcut_transformer_with_feature_fusion. This reimplementation is comparable to the original with respect to its improvement over the base transformer model. To use it, merge the fairseq/fairseq directory provided here into the corresponding directory of your local fairseq repository.
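
For reference, a training invocation might look like the sketch below. The data directory, save directory, and all optimization settings are illustrative placeholders drawn from a generic fairseq transformer recipe, not values prescribed by this repository or the paper; swap shortcut_transformer for shortcut_transformer_with_feature_fusion to train the feature-fusion variant.

    # Illustrative sketch only: paths and hyper-parameters below are placeholders,
    # not settings prescribed by this repository or the paper.
    fairseq-train data-bin/my_translation_data \
        --arch shortcut_transformer \
        --optimizer adam --adam-betas '(0.9, 0.98)' \
        --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
        --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
        --max-tokens 4096 --save-dir checkpoints/shortcut_transformer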

Dependencies

  • python 3.6
  • TensorFlow 1.8

Summary

Code for reproducing the lexical shortcut studies detailed in 'Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts'. Please refer to the paper for hyper-parameter settings, the training and evaluation datasets used, and the primary findings.

Usage

Scripts used to conduct the experiments described in the paper are provided in the 'scripts' directory. Their functionality is as follows:

  1. preprocess.sh: Used to pre-process the training, development and test corpora used in our experiments (development and test corpora first have to be converted to plain text, e.g. by using input-from-sgm.perl, provided in the Moses toolkit). Adjust as needed for different language pairs.

  2. train.sh: Used to train the translation models. To replicate the different experiments, set the --model_type and --shortcut_type flags accordingly (e.g. --model_type lexical_shortcuts_transformer and --shortcut_type lexical_plus_feature_fusion for a transformer variant equipped with lexical shortcuts and feature fusion); see the nmt.py file for the available options, and the sketch after this list for example flag combinations. Adding the --embiggen_model flag to the training script enables the transformer-BIG configuration. To use transformer-SMALL, adjust the relevant hyper-parameter values directly in the training script.

  3. test.sh: Used to obtain the test-BLEU scores reported in the paper for each trained model. Passing --use_sacrebleu returns the (more conservative) sacreBLEU score, whereas omitting the flag returns scores computed by the script used to calculate validation-BLEU during training (based on multi-bleu-detok.py). The latter is roughly comparable to the BLEU calculation method employed in 'Attention Is All You Need' (Vaswani et al., 2017).

  4. train_classifier.sh: Used to train the diagnostic lexical classifiers employed in the probing studies. Enabling --probe_encoder gives the classifier access to the hidden states of the encoder, while omitting the flag trains the classifier on decoder states. --probe_layer denotes the ID of the encoder/decoder layer accessed by the classifier (1 being the lowest and 6 the top-most).

  5. test_classifier.sh: Used to obtain the accuracy of trained classifiers on a held-out test set.
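
For orientation, the sketch below combines the flags described in steps 2-4. It assumes the scripts ultimately pass these flags to nmt.py and that all data and output paths (omitted here) are configured inside the scripts themselves; only the flags named above come from the actual interface.

    # Sketch only: data and output paths are assumed to be set inside the scripts and are omitted here.
    # Step 2 (train.sh): transformer-BIG with lexical shortcuts and feature fusion.
    python nmt.py \
        --model_type lexical_shortcuts_transformer \
        --shortcut_type lexical_plus_feature_fusion \
        --embiggen_model

    # Step 3 (test.sh): add --use_sacrebleu to report the more conservative sacreBLEU score.
    # Step 4 (train_classifier.sh): add --probe_encoder --probe_layer 4 to probe encoder layer 4.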

Citation

If you find this work useful, please consider citing the accompanying paper:

@article{emelin2019widening,
  title={Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts},
  author={Emelin, Denis and Titov, Ivan and Sennrich, Rico},
  journal={arXiv preprint arXiv:1906.12284},
  year={2019}
}
