
MaUde

Metric for automatic Unreferenced dialog evaluation.

Contains the code for the paper "Learning an Unreferenced Metric for Online Dialogue Evaluation", published at ACL 2020 (arXiv:2005.00583).

Installation

  • pip install -r requirements.txt
  • Install ParlAI
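A minimal install sketch (installing ParlAI via pip is an assumption here; the ParlAI docs also describe a from-source install):

# core dependencies
pip install -r requirements.txt
# ParlAI (assumed pip-installable; see the ParlAI repository for alternatives)
pip install parlai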

Getting the data

  • Get the ConvAI2 train and test data and pre-trained DistilBERT embeddings here. Download and unzip into the folder convai2_data.
  • Get the trained model checkpoints from here. Download and unzip into the folder full_acl_runs. (A sketch of these setup steps follows below.)
  • Due to individual licensing restrictions, we cannot release the train/test data of MultiWoz, Frames, and DailyDialog. Please send me an email if you need them!
  • Run inference using ./run_inference.sh.

N.B. - For model names and checkpoints, please refer to run_inference.sh script.
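
A minimal sketch of the data setup, assuming the downloaded archives are named convai2_data.zip and full_acl_runs.zip (the actual file names from the download links may differ):

# download the two archives from the links above, then:
unzip convai2_data.zip -d convai2_data     # data + DistilBERT embeddings
unzip full_acl_runs.zip -d full_acl_runs   # trained model checkpoints
./run_inference.sh                         # run inference on the checkpoints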

Computing Backtranslation

We use FairSeq to compute back-translations. Our modified scripts are in the scripts folder; to run them, cd into that folder and run ./run_trans.sh.
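
A minimal invocation (this assumes FairSeq is already installed, e.g. via pip install fairseq):

# the modified back-translation scripts live in the scripts folder
cd scripts
./run_trans.sh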

Computing Corruption Files

The data dump already includes the corruption files used for training. To generate new corruption files for a dataset, use scripts/compute_corrupt.py.
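
A minimal invocation (a sketch only; the script's exact arguments are not documented here, so inspect the flags inside scripts/compute_corrupt.py before running):

# regenerate corruption files for a dataset; check the script for
# the expected dataset/path arguments
python scripts/compute_corrupt.py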

Training Script

Uses PyTorch Lightning as the training boilerplate for reproducibility.

python codes/trainer.py --mode train \
    --corrupt_type all \
    --batch_size 64 \
    --model_save_dir /checkpoint/koustuvs/dialog_metric/all_dailydialog_10_runs \
    --learn_down True --downsample True --down_dim 300 \
    --optim adam,lr=0.001 --dropout 0.2 --decoder_hidden 200 \
    --data_name convai2 \
    --data_loc /checkpoint/koustuvs/dialog_metric/convai2_data/ \
    --use_cluster

For baselines, add the appropriate flag:

--train_baseline [infersent/ruber/bertnli]

An example training script is provided at run_training.sh.
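
For instance, training the RUBER baseline might look like this (a sketch that reuses flags from the command above; the choice of ruber is only for illustration):

python codes/trainer.py --mode train \
    --corrupt_type all \
    --batch_size 64 \
    --data_name convai2 \
    --data_loc /checkpoint/koustuvs/dialog_metric/convai2_data/ \
    --train_baseline ruber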

Inference Script

CUDA_VISIBLE_DEVICES=0 python codes/inference.py \
    --id $MODEL_ID --model_save_dir $MODEL_SAVE_DIR \
    --model_version $VERSION --train_mode nce \
    --corrupt_pre $DATA_LOC --test_suffix true_response \
    --test_column true_response --results_file "results.jsonl"
  • Outputs the results in a JSONL file. To measure human correlation with See et al. (2019), specify the --human_eval flag and the --human_eval_file location. (An example follows below.)
  • We also provide a script to run inference on our trained checkpoints: run_inference.sh.
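
A sketch of the human-correlation run (identical to the command above plus the two human-eval flags; $HUMAN_EVAL_FILE is an illustrative variable pointing at the See et al. (2019) annotation file):

CUDA_VISIBLE_DEVICES=0 python codes/inference.py \
    --id $MODEL_ID --model_save_dir $MODEL_SAVE_DIR \
    --model_version $VERSION --train_mode nce \
    --corrupt_pre $DATA_LOC --test_suffix true_response \
    --test_column true_response --results_file "results.jsonl" \
    --human_eval --human_eval_file $HUMAN_EVAL_FILE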

Acknowledgements

Questions

Citation

If our work is useful for your research, please consider citing it using the following BibTeX entry:

@article{sinha2020maude,
  author = {Koustuv Sinha and Prasanna Parthasarathi and Jasmine Wang and Ryan Lowe and William L. Hamilton and Joelle Pineau},
  title = {Learning an Unreferenced Metric for Online Dialogue Evaluation},
  year = {2020},
  journal = {ACL},
  arxiv = {2005.00583},
  url = {https://arxiv.org/abs/2005.00583}
}

License

This work is licensed under CC-BY-NC 4.0 (Attribution-NonCommercial 4.0 International), as found in the LICENSE file.
