This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

facebookresearch/EditEval

The Instruction-Based Benchmark for Text Improvements

The EditEval benchmark is described in the following paper: https://arxiv.org/abs/2209.13331

@inproceedings{dwivedi-edit-2022,
  doi = {10.48550/ARXIV.2209.13331},
  url = {https://arxiv.org/abs/2209.13331},
  author = {Dwivedi-Yu, Jane and Schick, Timo and Jiang, Zhengbao and Lomeli, Maria and Lewis, Patrick and Izacard, Gautier and Grave, Edouard and Riedel, Sebastian and Petroni, Fabio},
  keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {EditEval: An Instruction-Based Benchmark for Text Improvements},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
}

Leaderboard

The leaderboard for this benchmark can be found on EvalAI.

Installation

conda create -n editeval -y python=3.7 && conda activate editeval
pip install -e .

Additional dependencies

The FRUIT dataset requires that you install gsutil.

Downloading datasets

The commands below download datasets to the directory /data. To specify a different output directory, use output_directory={path_to_output_dir}.

For a single dataset run:

python main.py --dataset_name {dataset_name}

For all datasets run:

python main.py --dataset_name all

Writing datasets to jsonl files

For a single dataset run:

python main.py --dataset_name {dataset_name} --write_to_jsonl

For all datasets run:

python main.py --dataset_name all --write_to_jsonl
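
Once a dataset has been written out, the file can be inspected with a few lines of Python. The sketch below assumes only that the output follows the usual JSONL convention of one JSON object per line. The helper name `read_jsonl` is hypothetical, and the output path and field names depend on the dataset and are not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

Calling `next(read_jsonl(path))` on a written file and printing its keys is a quick way to see which fields a given dataset provides.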

Sampling datasets

For example, to sample from jfleg, run:

python main.py --dataset_name jfleg --sample {num_examples_to_sample}

Running evaluation for a dataset

python main.py --dataset_name {dataset_name} --prediction_file {path_to_jsonl}

To specify certain metrics (e.g., gleu and sari):

python main.py --dataset_name {dataset_name} --prediction_file {path_to_jsonl} --metrics gleu sari

To turn off normalization during evaluation, specify --no_normalization.
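
When evaluating several prediction files, it can be convenient to assemble these commands programmatically. The following is a minimal sketch that uses only the flags shown above (`--dataset_name`, `--prediction_file`, `--metrics`, `--no_normalization`). The helper names `build_eval_command` and `run_eval` are hypothetical and not part of the repository, and the sketch assumes it is run from the repository root.

```python
import subprocess
from typing import List, Optional, Sequence


def build_eval_command(
    dataset_name: str,
    prediction_file: str,
    metrics: Optional[Sequence[str]] = None,
    normalize: bool = True,
) -> List[str]:
    """Assemble the argument list for one evaluation run of main.py."""
    command = [
        "python", "main.py",
        "--dataset_name", dataset_name,
        "--prediction_file", prediction_file,
    ]
    if metrics:
        # Metric names follow the flag, separated by spaces, as in the README.
        command += ["--metrics", *metrics]
    if not normalize:
        command.append("--no_normalization")
    return command


def run_eval(command: List[str]) -> int:
    """Run the command in the current directory and return its exit code."""
    return subprocess.run(command, check=False).returncode
```

For instance, `build_eval_command("jfleg", "preds.jsonl", metrics=["gleu", "sari"])` produces the same invocation as the metrics example above, with `preds.jsonl` standing in for the prediction file path.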

Current tasks and datasets

  • Fluency
    • jfleg
    • iterater_fluency
  • Clarity
    • iterater_clarity
  • Coherence
    • iterater_coherence
  • Paraphrasing
    • stsb_multi_mt
  • Simplification
    • turk
    • asset
  • Neutralization
    • wnc
  • Updating
    • fruit
    • wafer_insert
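
For scripting over one task at a time, the list above can be captured as a small lookup table. This is an illustrative sketch: the dataset names are copied from the list, while the lowercase task keys and the names `TASK_DATASETS`, `task_for`, and `download_commands` are assumptions made for the example and do not come from the repository.

```python
from typing import Dict, List

# Task -> dataset names, as listed in this README.
TASK_DATASETS: Dict[str, List[str]] = {
    "fluency": ["jfleg", "iterater_fluency"],
    "clarity": ["iterater_clarity"],
    "coherence": ["iterater_coherence"],
    "paraphrasing": ["stsb_multi_mt"],
    "simplification": ["turk", "asset"],
    "neutralization": ["wnc"],
    "updating": ["fruit", "wafer_insert"],
}


def task_for(dataset_name: str) -> str:
    """Return the task a dataset belongs to, or raise KeyError if unknown."""
    for task, datasets in TASK_DATASETS.items():
        if dataset_name in datasets:
            return task
    raise KeyError(dataset_name)


def download_commands(task: str) -> List[List[str]]:
    """Build one download command per dataset of the given task."""
    return [
        ["python", "main.py", "--dataset_name", name]
        for name in TASK_DATASETS[task]
    ]
```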

Current metrics

  • sari
  • em
  • em_diff
  • bleu
  • ibleu
  • gleu
  • rouge
  • update_rouge
  • bert_score
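
As a rough illustration of the simplest metric in this list, the sketch below computes an exact-match score, assuming that em denotes the percentage of predictions identical to at least one reference. This is not the repository's implementation: the function name `exact_match` is hypothetical, and the only normalization applied here is stripping surrounding whitespace, which may differ from what the benchmark does.

```python
from typing import Sequence


def exact_match(
    predictions: Sequence[str],
    references: Sequence[Sequence[str]],
) -> float:
    """Percentage of predictions equal to at least one of their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        # Assumption: compare after stripping surrounding whitespace only.
        pred.strip() in {ref.strip() for ref in refs}
        for pred, refs in zip(predictions, references)
    )
    return 100.0 * hits / len(predictions)
```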

Licensing

See our LICENSE file for licensing details.
