
RegEMT: Regressive ensemble for machine translation evaluation


The master branch contains sources for reproducing our results reported in the WMT21 Metrics workshop.

See ablation-study for evaluating the impact of each of the ensembled metrics on the result, xling for zero-shot cross-lingual metric evaluation, multiling for evaluating the fit on multiple languages, test_judgements for re-generating the submission, and docker-build for building a Docker image.
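
For example, here is a minimal sketch of reproducing one of these configurations, assuming that each branch keeps the same main entry point and flags as master:

# hypothetical example: check out the ablation-study branch and run a quick evaluation
git clone https://github.com/MIR-MU/regemt.git
cd regemt
git checkout ablation-study
python -m main --fast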

How to reproduce our results

Docker

To reproduce our results, you can use our miratmu/regemt Docker image with the NVIDIA Container Toolkit:

mkdir submit_dir
chmod 777 submit_dir

# test the installation on a data subsample before running the full evaluation process:
docker run --rm --gpus all -v "$PWD"/submit_dir:/submit_dir miratmu/regemt --fast

# simply run the evaluation on the full data sets:
# this takes ~10 hours on a Tesla T4 and may take longer on a CPU
docker run --rm --gpus all -v "$PWD"/submit_dir:/submit_dir miratmu/regemt

The evaluation process will generate the correlation reports in .png and .pdf format for each of the evaluated configurations in the submit_dir/ directory.
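
If no GPU is available, the image can also be run on the CPU; a minimal sketch, assuming the container falls back to the CPU when the --gpus flag is omitted (expect a much longer runtime):

# CPU-only run on the data subsample; drop --fast for the full evaluation
docker run --rm -v "$PWD"/submit_dir:/submit_dir miratmu/regemt --fast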

Python

Alternatively, you can install the dependencies and run the evaluation directly with Python:

git clone https://github.com/MIR-MU/regemt.git
cd regemt
chmod 777 submit_dir

# install the dependencies
conda create --name wmt_eval python=3.8
conda activate wmt_eval
pip install -r requirements.txt

# test the installation on a data subsample before running the full evaluation process:
python -m main --fast

# simply run the evaluation on the full data sets:
# this takes ~10 hours on a Tesla T4 and may take longer on a CPU
python -m main

The evaluation process will generate the correlation reports in .png and .pdf format for each of the evaluated configurations in the regemt/ directory.
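
Before starting the full multi-hour run, it may help to confirm that a GPU is visible from the Python environment; a minimal sketch, assuming PyTorch is installed as one of the requirements:

# hypothetical check: print whether PyTorch (assumed to be in requirements.txt) can see a CUDA device
python -c "import torch; print(torch.cuda.is_available())"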


We're trying to keep it simple, but if you run into any trouble or have a question, don't hesitate to create an issue and we'll take a look!

Citing RegEMT

Text

ŠTEFÁNIK, Michal, Vít NOVOTNÝ and Petr SOJKA. Regressive Ensemble for Machine Translation Quality Evaluation. In Markus Freitag (ed.). Proceedings of EMNLP 2021 Sixth Conference on Machine Translation (WMT 21). ACL, 2021. 8 pp.

BibTeX

@inproceedings{stefanik2021regressive,
  author = {\v{S}tef\'{a}nik, Michal and Novotn\'{y}, V\'{i}t and Sojka, Petr},
  title = {Regressive Ensemble for Machine Translation Quality Evaluation},
  booktitle = {Proceedings of {EMNLP} 2021 Sixth Conference on Machine Translation ({WMT} 21)},
  editor = {Markus Freitag},
  publisher = {ACL},
  numpages = {8},
  url = {https://arxiv.org/abs/2109.07242v1},
  year = {2021},
}
