Bug in sentence level BLEU comparison #114

madaan · 2019-09-18T05:26:24Z · edited by pfliu-nlp

Description

The report

N sentences where Sys A > Sys B at sentence-level BLEU

produces incorrect output when:

a) Sys A never produces a sentence with a higher sentence-level BLEU score than Sys B

b) There are fewer than N sentences in the set being analyzed
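To make the two failure cases concrete, here is a minimal, hypothetical Python sketch. It is not compare-mt's actual code: the function `naive_report`, the placeholder scorer label `sentBLEU`, and the score lists are assumptions for illustration. It assumes the report ranks sentences by the difference in sentence-level scores and, like the line quoted below, always prints the requested `report_length` in the header.

```python
from typing import List, Tuple


def naive_report(scores_a: List[float], scores_b: List[float], report_length: int,
                 name_a: str = "SysA", name_b: str = "SysB") -> Tuple[str, List[int]]:
    """Hypothetical model of the buggy report: rank by score difference and
    always claim `report_length` sentences where the first system wins."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # Sentence indices ordered by descending A-minus-B difference.
    order = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    selected = order[:report_length]  # silently shorter if fewer sentences exist
    # Header hard-codes both the count and the direction of the comparison.
    header = f"--- {report_length} sentences where {name_a}>{name_b} at sentBLEU"
    return header, selected


# Cases (a) and (b) together: SysB wins every sentence, and only 3 sentences exist.
scores_a = [0.10, 0.20, 0.30]
scores_b = [0.40, 0.50, 0.60]
header, selected = naive_report(scores_a, scores_b, report_length=10)
print(header)         # --- 10 sentences where SysA>SysB at sentBLEU
print(len(selected))  # 3
```

The header promises 10 wins for SysA, yet only 3 sentences are listed and SysA loses every one of them.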


Files

compare-mt/compare_mt/reporters.py

Line 507 in 41700d6

    
print(f'--- {report_length} sentences where {sright}>{sleft} at {self.scorer.name()}')

To Reproduce

Use the SysA, SysB and Ref outputs located at https://gist.github.com/madaan/2cec36a7b18dfeea3904ddfff1e19312 and run compare-mt with the default options.

Tasks

Add checks before emitting the report at:

compare-mt/compare_mt/reporters.py

Line 507 in 41700d6

print(f'--- {report_length} sentences where {sright}>{sleft} at {self.scorer.name()}')
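One possible shape for those checks, again a hypothetical sketch rather than compare-mt's code (`checked_report`, its arguments, and the `sentBLEU` label are assumptions): keep only sentences where the first system actually scores higher, cap the list at the requested length, and print the count that is really reported.

```python
from typing import List, Tuple


def checked_report(scores_a: List[float], scores_b: List[float], report_length: int,
                   name_a: str = "SysA", name_b: str = "SysB") -> Tuple[str, List[int]]:
    """Hypothetical fix: report only real wins, and never more than exist."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # Check 1: keep only sentences where the first system scores strictly higher.
    wins = sorted((i for i, d in enumerate(diffs) if d > 0),
                  key=lambda i: diffs[i], reverse=True)
    # Check 2: cap at the requested length; the slice handles short inputs.
    selected = wins[:report_length]
    if not selected:
        header = f"--- no sentences where {name_a}>{name_b} at sentBLEU"
    else:
        # Print the count actually reported, not the requested one.
        header = f"--- {len(selected)} sentences where {name_a}>{name_b} at sentBLEU"
    return header, selected


print(checked_report([0.5, 0.2, 0.9], [0.4, 0.3, 0.1], report_length=10))
# ('--- 2 sentences where SysA>SysB at sentBLEU', [2, 0])
print(checked_report([0.1, 0.2], [0.4, 0.5], report_length=10))
# ('--- no sentences where SysA>SysB at sentBLEU', [])
```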

I can take a stab at it if you guys think this should be fixed.

Thanks!

neubig · 2019-09-18T09:37:36Z

Thanks a lot! I think it's probably better to just change the message, however, from `{sright}>{sleft}` to something indicating that this is just the maximum difference (which could also be negative). A PR would be welcome.
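A minimal sketch of this alternative, with hypothetical names (`max_diff_report`, the `sentBLEU` label): the ranking stays the same, but the header describes the entries as the largest differences and each line shows the signed difference, so a negative "best" case is visible instead of being contradicted by the header.

```python
from typing import List, Tuple


def max_diff_report(scores_a: List[float], scores_b: List[float], report_length: int,
                    name_a: str = "SysA", name_b: str = "SysB") -> Tuple[str, List[str]]:
    """Hypothetical relabelled report: same ranking, honest description."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    order = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    selected = order[:report_length]
    # The header no longer claims a win; it names what is shown.
    header = (f"--- {len(selected)} sentences with the largest {name_a}-{name_b} "
              f"difference at sentBLEU (may be negative)")
    # Show the signed difference so a negative maximum is visible.
    lines = [f"sentence {i}: diff={diffs[i]:+.2f}" for i in selected]
    return header, lines


header, lines = max_diff_report([0.1, 0.2], [0.4, 0.3], report_length=10)
print(header)  # --- 2 sentences with the largest SysA-SysB difference at sentBLEU (may be negative)
print("\n".join(lines))
# sentence 1: diff=-0.10
# sentence 0: diff=-0.30
```

Compared with filtering out non-wins, this keeps the report populated even when one system never wins, at the cost of the reader having to check the sign.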
