# EMNLP 2017 submission

This repository contains the dataset and statistical analysis code released with the EMNLP 2017 paper "Why We Need New Evaluation Metrics for NLG".

File descriptions:

- `emnlp_data_individual_hum_scores.csv` – the dataset with system outputs and the evaluation ratings of 3 crowd-workers for each output
- `emnlp_data.csv` – the dataset with system outputs, original human references, scores of automatic metrics, and medians of human ratings
- `analysis_emnlp.R` – R code with the statistical analysis discussed in the paper; a minimal sketch for loading the data files is shown below
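
As a starting point, a minimal R sketch for loading the two CSV files (assuming they are in the working directory; the column layouts are not documented in this README, so the snippet only inspects them):

```r
# Load the released data files named in this README.
individual <- read.csv("emnlp_data_individual_hum_scores.csv")
aggregated <- read.csv("emnlp_data.csv")

# Inspect the column layouts, which are not documented here.
str(individual)  # per-output ratings from 3 crowd-workers
str(aggregated)  # outputs, references, metric scores, median human ratings
```

The full statistical analysis from the paper is in `analysis_emnlp.R`.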

Citing the paper:

Jekaterina Novikova, Ondrej Dusek, Amanda Cercas-Curry and Verena Rieser (2017): Why We Need New Evaluation Metrics for NLG. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark.