stenella

This repository contains the code to reproduce the results in our EMNLP 2021 paper Levenshtein Training for Word-level Quality Estimation.

We ran all of our experiments with a large ducttape workflow that also includes a lot of setup specific to our environment. We therefore provide a concise bash script to reproduce the best result from our LevT checkpoints, along with the raw workflow for you to adapt to your own environment or to dig into aspects not covered by the bash script. If you have any questions about the workflow, feel free to post an issue and I'll try my best to answer.

Requirements

We are going to assume that you have the following binaries reachable from $PATH (a quick sanity check is sketched right after this list):

  • spm_encode and spm_decode, installed from https://github.com/google/sentencepiece (we used v0.1.5, but any version should work)
  • teralign, installed from https://github.com/marian-nmt/moses-scorers (compile the binary following the README.md of that repo)
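
As a quick sanity check, the snippet below (a minimal sketch, assuming both tools are already installed) verifies that the required binaries are reachable:

# each command should print the path of the corresponding binary
command -v spm_encode spm_decode || echo "sentencepiece binaries not found in PATH"
command -v teralign || echo "teralign not found in PATH"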

Note that instead of the more popular tercom, we use our own implementation, teralign, for the TER computation (which, in our opinion, is easier to use). However, we do see a slight mismatch between the edit tags generated by tercom and teralign because of ambiguities in beam search. Hence, to produce results comparable with previous WMT submissions, please do not generate your own edit tags on the test set with teralign and evaluate against them.

Preparation

We assume that $BASE is the path of this repository on your system.

BASE=/path/to/repo

# untar some data
cd $BASE/data/data/post-editing/train
tar -zxvf en-de-train.tar.gz
tar -zxvf en-zh-train.tar.gz

cd $BASE/data/data/post-editing/dev
tar -zxvf en-de-dev.tar.gz
tar -zxvf en-zh-dev.tar.gz

cd $BASE/data/data/post-editing/test
tar -zxvf en-de-test.tar.gz
tar -zxvf en-zh-test.tar.gz

# download BPE model
mkdir -p $BASE/models
cd $BASE/models
wget https://dl.fbaipublicfiles.com/m2m_100/spm.128k.model
wget https://dl.fbaipublicfiles.com/m2m_100/model_dict.128k.txt
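
run.sh performs the actual preprocessing; purely as an illustration of how the downloaded SentencePiece model is applied (the file names input.txt, input.spm, and input.detok are hypothetical), encoding and decoding look like this:

# segment raw text into SentencePiece pieces (hypothetical file names)
spm_encode --model=$BASE/models/spm.128k.model --output_format=piece < input.txt > input.spm
# undo the segmentation
spm_decode --model=$BASE/models/spm.128k.model --input_format=piece < input.spm > input.detok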

You'll also need to download our intermediate checkpoints:

cd $BASE/models
# the run.sh script is set to reproduce the best en-de results (MCC=0.589)
# we have a few other checkpoints for download:
# en-de M2M w/o synthetic pre-training: https://www.cs.jhu.edu/~sding/downloads/emnlp2021/emnlp2021-en-de-nat.pt (MCC=0.583)
# en-zh M2M w/o synthetic pre-training: https://www.cs.jhu.edu/~sding/downloads/emnlp2021/emnlp2021-en-zh-nat.pt (MCC=0.633)
# en-zh M2M w synthetic pre-training: https://www.cs.jhu.edu/~sding/downloads/emnlp2021/emnlp2021-en-zh-best.pt (MCC=0.646)
wget https://www.cs.jhu.edu/~sding/downloads/emnlp2021/emnlp2021-en-de-best.pt
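
For example, to reproduce the best en-zh result instead, download the corresponding checkpoint the same way:

wget https://www.cs.jhu.edu/~sding/downloads/emnlp2021/emnlp2021-en-zh-best.pt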

Run

Open run.sh and update the value of BASE to the path where you stored this repo.

Then, simply running bash run.sh should reproduce the best en-de result. You can change checkpoint, src, and tgt to reproduce the other results built from the M2M model (a sketch follows below).
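
For example, assuming you downloaded the en-zh checkpoint above, the configuration might look like this (variable names follow the description above; the exact spelling in run.sh may differ):

# hypothetical values; check run.sh for the exact variable names
BASE=/path/to/stenella
checkpoint=$BASE/models/emnlp2021-en-zh-best.pt
src=en
tgt=zh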

Reference

If you use this codebase, or the teralign binary from the moses-scorers repo, please cite the following paper:

@inproceedings{ding-etal-2021-levenshtein,
    title = "{L}evenshtein Training for Word-level Quality Estimation",
    author = "Ding, Shuoyang  and
      Junczys-Dowmunt, Marcin  and
      Post, Matt  and
      Koehn, Philipp",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.539",
    pages = "6724--6733",
    abstract = "We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation. A Levenshtein Transformer is a natural fit for this task: trained to perform decoding in an iterative manner, a Levenshtein Transformer can learn to post-edit without explicit supervision. To further minimize the mismatch between the translation task and the word-level QE task, we propose a two-stage transfer learning procedure on both augmented data and human post-editing data. We also propose heuristics to construct reference labels that are compatible with subword-level finetuning and inference. Results on WMT 2020 QE shared task dataset show that our proposed method has superior data efficiency under the data-constrained setting and competitive performance under the unconstrained setting.",
}
