WMT 2023 Metrics Shared Task Submission: Cometoid

This repository contains documentation and scripts for Cometoid and Chrfoid metrics.

Our models are implemented in Marian NMT. We provide Python bindings for Marian, which can be installed as follows:

Prerequisites: Install PyMarian
git clone https://github.com/marian-nmt/marian-dev
cd marian-dev

# Pybind PR (github.com/marian-nmt/marian-dev/pull/1013) was not merged at the time of writing,
# so check out the branch
git checkout tg/pybind-new

# build and install -- run this from the project root, i.e., the directory containing pyproject.toml
# simple case: no CUDA, default compiler
pip install -v .

# optional: use a specific compiler version (e.g., gcc-9 / g++-9)
CMAKE_ARGS="-DCMAKE_C_COMPILER=gcc-9 -DCMAKE_CXX_COMPILER=g++-9" pip install -v .

# optional: with CUDA on
CMAKE_ARGS="-DCOMPILE_CUDA=ON" pip install .

# optional: with a specific CUDA toolkit version, e.g., CUDA 11.5
CMAKE_ARGS="-DCOMPILE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.5" pip install -v .
Tip
If CMake complains about missing C/C++ libraries, follow the instructions at https://marian-nmt.github.io/quickstart/ to install them.
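
Once the install finishes, a quick sanity check (a minimal sketch, not part of the repository) is to import the binding and confirm it exposes the Evaluator class used in the Python API example below:

# sanity check: an ImportError here means the build/install above did not succeed
import pymarian
print(pymarian.Evaluator)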

The following reference-free/quality-estimation metrics are available:

  • chrfoid-wmt23

  • cometoid22-wmt23

  • cometoid22-wmt22

  • cometoid22-wmt21

pymarian-evaluate is a convenient CLI that downloads and caches models, and runs them on the input data.

# download sample data
sacrebleu -t wmt21/systems -l en-zh --echo src > src.txt
sacrebleu -t wmt21/systems -l en-zh --echo Online-B > mt.txt

# chrfoid
paste src.txt mt.txt | pymarian-evaluate --stdin -m chrfoid-wmt23
# cometoid22-wmt22; similarly for other models
paste src.txt mt.txt | pymarian-evaluate --stdin -m cometoid22-wmt22

# Python API: score with the Evaluator class
from pymarian import Evaluator
marian_args = '-m path/to/model.npz -v path/to/vocab.spm path/to/vocab.spm --like comet-qe'
evaluator = Evaluator(marian_args)

data = [
    ["Source1", "Hyp1"],
    ["Source2", "Hyp2"]
]
scores = evaluator.run(data)
for score in scores:
    print(score)
Tip
The pymarian-evaluate CLI downloads and caches models under ~/.cache/marian/metrics/<model>.

The models can also be run with the marian CLI directly. Models can be downloaded from https://textmt.blob.core.windows.net/www/models/mt-metric/<model>.tgz, where <model> is one of the names listed above (e.g., cometoid22-wmt22). Download and extract the desired model to a directory, say $model_dir.
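
For example, a minimal download-and-extract sketch using only the Python standard library (it assumes the archive unpacks into a <model>/ subdirectory; verify the layout after extraction):

# sketch: fetch one model archive and unpack it under the pymarian cache directory
import tarfile
import urllib.request
from pathlib import Path

model = "cometoid22-wmt22"   # any model name from the list above
url = f"https://textmt.blob.core.windows.net/www/models/mt-metric/{model}.tgz"
dest = Path.home() / ".cache" / "marian" / "metrics"
dest.mkdir(parents=True, exist_ok=True)

archive, _ = urllib.request.urlretrieve(url, str(dest / f"{model}.tgz"))
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(dest)     # expected to create <dest>/<model>/ -- check before use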

model_dir=~/.cache/marian/metrics/cometoid22-wmt22

# Score on CPU
N_CPU=6
paste src.txt mt.txt | marian evaluate --like comet-qe -m $model_dir/*.npz -v $model_dir/vocab.{spm,spm} \
        -w 8000 --quiet --cpu-threads $N_CPU --mini-batch 1 --maxi-batch 1

# Score on GPUs; here "--devices 0 1 2 3" means use 4 GPUs
paste src.txt mt.txt | marian evaluate --like comet-qe -m $model_dir/*.npz -v $model_dir/vocab.{spm,spm} \
        -w -4000 --quiet --devices 0 1 2 3 --mini-batch 16 --maxi-batch 1000
Reproducing Results on the WMT22 Metrics Task

Step 1: Setup and obtain the test set
# install requirements, including mt-metrics-eval
pip install -r requirements.txt

# download the mt-metrics-eval data
python -m mt_metrics_eval.mtme --download

# flatten the test data as TSV; this is used in the subsequent steps
# and also speeds up evaluation
data_file=$(python -m evaluate -t wmt22 flatten --no-ref)
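
Optionally, a quick way to sanity-check the flattened file (a sketch; it relies only on what Step 2 below relies on, namely that field 4 holds the source and field 6 the hypothesis):

# peek at the flattened TSV produced above; pass the $data_file path as the argument
import sys

path = sys.argv[1]
with open(path, encoding="utf-8") as f:
    rows = [line.rstrip("\n").split("\t") for line in f]
print(f"{len(rows)} segments")
print("source    :", rows[0][3][:80])
print("hypothesis:", rows[0][5][:80])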
Step 2: Get Segment Level Scores
devices="0 1 2 3 4 5" # run on GPUs
n_segs=$(wc -l < $data_file)
for m in chrfoid-wmt23 cometoid22-wmt{21,22,23}; do
    out=${data_file%.tsv}.seg.score.$m;
    [[ -f $out._OK ]] && continue;
    rm -f $out; `# remove incomplete file, if any`
    cut -f4,6 $data_file | pymarian-evaluate --stdin -d $devices -m $m | tqdm --desc=$m --total=$n_segs > $out && touch $out._OK;
done
Step 3: Evaluate on WMT22 Metrics testset
# average seg scores to system scores; then evaluate
for m in chrfoid-wmt23 cometoid22-wmt{21,22,23}; do
    score_file=${data_file%.tsv}.seg.score.$m;
    [[ -f $score_file._OK ]] || { echo "Error: $score_file._OK not found"; continue; }
    python -m evaluate -t wmt22 full --no-ref --name $m --scores $score_file;
done

This produces results.csv
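
For intuition, the system-level scores used in Step 3 are simply per-system averages of the segment-level scores; a toy sketch of that reduction (the system names and scores below are made up, and the real aggregation is done inside evaluate.py):

# toy illustration only: reduce per-segment scores to one score per system by averaging
seg_scores = {
    "Online-B": [0.81, 0.92, 0.77],   # hypothetical per-segment scores
    "Online-A": [0.74, 0.88, 0.69],
}
sys_scores = {name: sum(s) / len(s) for name, s in seg_scores.items()}
print(sys_scores)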

Step 4: Verify results
$ cat results.csv  | grep -E -i 'wmt22|cometoid|chrf|comet-22'
,wmt22.mqm_tab11,wmt22.da_sqm_tab8
*chrfoid-wmt23[noref],0.7773722627737226,0.8321299638989169
*cometoid22-wmt21[noref],0.7883211678832117,0.8483754512635379
*cometoid22-wmt22[noref],0.8065693430656934,0.8574007220216606
*cometoid22-wmt23[noref],0.8029197080291971,0.8592057761732852
COMET-22,0.8394160583941606,0.8393501805054152
MS-COMET-22,0.8284671532846716,0.8303249097472925
chrF,0.7335766423357665,0.7581227436823105

Please cite this paper: https://aclanthology.org/2023.wmt-1.62/

@inproceedings{gowda-etal-2023-cometoid,
    title = "Cometoid: Distilling Strong Reference-based Machine Translation Metrics into {E}ven Stronger Quality Estimation Metrics",
    author = "Gowda, Thamme  and
      Kocmi, Tom  and
      Junczys-Dowmunt, Marcin",
    editor = "Koehn, Philipp  and
      Haddow, Barry  and
      Kocmi, Tom  and
      Monz, Christof",
    booktitle = "Proceedings of the Eighth Conference on Machine Translation",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.wmt-1.62",
    pages = "751--755",
}
