EditEval: An Instruction-Based Benchmark for Text Improvements

The EditEval benchmark is described in the following paper: https://arxiv.org/abs/2209.13331

@inproceedings{dwivedi-edit-2022,
  doi = {10.48550/ARXIV.2209.13331},
  url = {https://arxiv.org/abs/2209.13331},
  author = {Dwivedi-Yu, Jane and Schick, Timo and Jiang, Zhengbao and Lomeli, Maria and Lewis, Patrick and Izacard, Gautier and Grave, Edouard and Riedel, Sebastian and Petroni, Fabio},
  keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {EditEval: An Instruction-Based Benchmark for Text Improvements},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
}

Leaderboard

The leaderboard for this benchmark can be found on EvalAI.

Installation

conda create -n editeval -y python=3.7 && conda activate editeval
pip install -e .

Additional dependencies

The FRUIT dataset requires that you install gsutil.
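Before downloading FRUIT, you can confirm that gsutil is installed and on your PATH with a quick Python check. This sketch only looks for the executable using the standard library's shutil.which. It does not verify gsutil's version or any Google Cloud credentials. The function name gsutil_available is hypothetical and is not part of this repository.

```python
import shutil


def gsutil_available() -> bool:
    """Return True if a `gsutil` executable is found on the PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which("gsutil") is not None
```

If gsutil_available() returns False, install gsutil before running the FRUIT download.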

Downloading datasets

The commands below download datasets to the directory /data. To specify a different output directory, use output_directory={path_to_output_dir}.

For a single dataset, run:

python main.py --dataset_name {dataset_name}

For all datasets, run:

python main.py --dataset_name all
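If you need only a few datasets rather than all of them, a small Python wrapper can call main.py once per dataset. This is a minimal sketch under stated assumptions. download_command and download_many are hypothetical helpers and are not part of the repository. They use only the --dataset_name flag shown above, and they assume you run them from the repository root.

```python
import subprocess
import sys
from typing import List, Sequence


def download_command(dataset_name: str) -> List[str]:
    """Build the single-dataset download command documented above."""
    # sys.executable is the interpreter running this script, so main.py
    # runs inside the same (e.g. conda) environment.
    return [sys.executable, "main.py", "--dataset_name", dataset_name]


def download_many(dataset_names: Sequence[str]) -> None:
    """Download each dataset in turn; raise on the first failing command."""
    for name in dataset_names:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(download_command(name), check=True)
```

For example, download_many(["jfleg", "asset"]) runs the single-dataset command for each name in turn and stops at the first failure.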

Writing datasets to jsonl files

For a single dataset, run:

python main.py --dataset_name {dataset_name} --write_to_jsonl

For all datasets, run:

python main.py --dataset_name all --write_to_jsonl
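After the datasets are written as JSONL (one JSON object per line), you can load them with the standard library alone. This is a minimal sketch. The field names inside each record depend on the dataset, so the sketch does not assume any. The helper names parse_jsonl_lines and read_jsonl are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def parse_jsonl_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Load every record of a .jsonl file into a list."""
    with open(path, encoding="utf-8") as f:
        # Materialize the list while the file is still open.
        return list(parse_jsonl_lines(f))
```

Keeping the parsing separate from the file handling makes it easy to reuse the same logic on any iterable of lines.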

Sampling datasets

To sample a given number of examples from a dataset (jfleg in this example), run:

python main.py --dataset_name jfleg --sample {num_examples_to_sample}
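main.py handles the --sample option itself. You may instead want a reproducible subset of examples that you have already loaded in your own Python code, for instance from a JSONL file. In that case, a seeded draw without replacement is one reasonable approach. This sketch is independent of --sample and does not describe how that option is implemented. sample_examples is a hypothetical helper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_examples(examples: Sequence[T], k: int, seed: int = 0) -> List[T]:
    """Draw k distinct examples without replacement, reproducibly.

    If k exceeds the number of examples, all examples are returned (shuffled).
    """
    rng = random.Random(seed)  # local RNG: does not touch global random state
    return rng.sample(list(examples), min(k, len(examples)))
```

A fixed seed returns the same subset on every run, which keeps comparisons between systems fair.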

Running evaluation for a dataset

python main.py --dataset_name {dataset_name} --prediction_file {path_to_jsonl}
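The prediction file is a JSONL file. Its exact schema is defined in the repository's code and is not documented here. That schema covers which keys the evaluator reads and how predictions are matched to examples. The keys id and prediction in this sketch are therefore placeholders, not the real format. The sketch only shows how to write well-formed UTF-8 JSON Lines. to_jsonl and write_predictions are hypothetical helpers.

```python
import json
from typing import Any, Dict, Iterable


def to_jsonl(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    # ensure_ascii=False keeps non-ASCII text readable in the output file.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def write_predictions(records: Iterable[Dict[str, Any]], path: str) -> None:
    """Write records to `path` as a UTF-8 JSONL prediction file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))


# Hypothetical schema: check the expected field names against the
# repository's evaluation code before relying on this.
example = [{"id": "0", "prediction": "This is a corrected sentence."}]
```
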

To specify certain metrics (e.g., gleu and sari):

python main.py --dataset_name {dataset_name} --prediction_file {path_to_jsonl} --metrics gleu sari

To turn off normalization during evaluation, specify --no_normalization.
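To script evaluation runs, you can assemble the command from the flags documented above: --dataset_name, --prediction_file, --metrics and --no_normalization. eval_command is a hypothetical helper and is not part of the repository. It assumes you run it from the repository root.

```python
import sys
from typing import List, Optional, Sequence


def eval_command(
    dataset_name: str,
    prediction_file: str,
    metrics: Optional[Sequence[str]] = None,
    normalize: bool = True,
) -> List[str]:
    """Build an evaluation command using the flags documented in this README."""
    cmd = [sys.executable, "main.py",
           "--dataset_name", dataset_name,
           "--prediction_file", prediction_file]
    if metrics:  # when no metrics are given, the flag is left out entirely
        cmd += ["--metrics", *metrics]
    if not normalize:
        cmd.append("--no_normalization")
    return cmd
```

For example, subprocess.run(eval_command("jfleg", "preds.jsonl", metrics=["gleu", "sari"]), check=True) runs the metric-restricted command shown above.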

Current tasks and datasets

Fluency
- jfleg
- iterater_fluency
Clarity
- iterater_clarity
Coherence
- iterater_coherence
Paraphrasing
- stsb_multi_mt
Simplification
- turk
- asset
Neutralization
- wnc
Updating
- fruit
- wafer_insert
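For scripting, you can keep the task-to-dataset mapping above as a plain Python dictionary. The names TASKS, all_datasets and task_of are hypothetical. The mapping is transcribed by hand from this README and may drift from the repository's own configs.

```python
from typing import Dict, List

# Task -> datasets, transcribed from the list above.
TASKS: Dict[str, List[str]] = {
    "fluency": ["jfleg", "iterater_fluency"],
    "clarity": ["iterater_clarity"],
    "coherence": ["iterater_coherence"],
    "paraphrasing": ["stsb_multi_mt"],
    "simplification": ["turk", "asset"],
    "neutralization": ["wnc"],
    "updating": ["fruit", "wafer_insert"],
}


def all_datasets() -> List[str]:
    """Flatten TASKS into one list of dataset names, in task order."""
    return [name for names in TASKS.values() for name in names]


def task_of(dataset_name: str) -> str:
    """Return the task a dataset belongs to, or raise KeyError."""
    for task, names in TASKS.items():
        if dataset_name in names:
            return task
    raise KeyError(dataset_name)
```

all_datasets() yields the ten dataset names in the order listed above, which is convenient for looping over every dataset in a script.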

Current metrics

- sari
- em
- em_diff
- bleu
- ibleu
- gleu
- rouge
- update_rouge
- bert_score
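The simplest metric in this list is em. The sketch below shows one plausible way to compute an exact-match score over multiple references, with an on/off switch that mirrors the --no_normalization flag. Several choices here are illustrative assumptions and may not match what the repository's em metric does:

- the normalization (lowercasing and collapsing whitespace)
- the multi-reference rule (a match against any reference counts)
- the 0 to 1 scale

```python
import re
from typing import Sequence


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def exact_match(predictions: Sequence[str],
                references: Sequence[Sequence[str]],
                use_normalization: bool = True) -> float:
    """Fraction of predictions that equal at least one of their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    prep = normalize if use_normalization else (lambda s: s)
    hits = sum(
        any(prep(p) == prep(r) for r in refs)
        for p, refs in zip(predictions, references)
    )
    return hits / len(predictions)
```

For example, exact_match(["The cat sat."], [["the  cat sat."]]) returns 1.0 with normalization and 0.0 without it.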

Licensing

See our LICENSE file for licensing details.
