Bug in sentence level BLEU comparison #114

Open
1 task
madaan opened this issue Sep 18, 2019 · 1 comment

Comments

@madaan
Contributor

madaan commented Sep 18, 2019

Description

The report

N sentences where Sys A > Sys B at sentence-level BLEU

will generate wrong output if:

a) Sys A never generates a sentence with a higher BLEU score than Sys B

b) There are fewer than N sentences in the set of sentences to be analyzed
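
Both conditions can be illustrated with a minimal sketch. It assumes the report ranks sentences by score difference and prints the top N under a fixed header; the helper `top_n_by_diff` and the score values are hypothetical and are not taken from the compare-mt code.

```python
def top_n_by_diff(scores_a, scores_b, n):
    """Return up to n (diff, index) pairs, largest A-minus-B difference first."""
    diffs = [(a - b, i) for i, (a, b) in enumerate(zip(scores_a, scores_b))]
    diffs.sort(reverse=True)
    return diffs[:n]


# Hypothetical sentence-level scores where Sys A never beats Sys B.
scores_a = [0.10, 0.20, 0.30]
scores_b = [0.40, 0.20, 0.50]
n = 5  # more rows requested than there are sentences

rows = top_n_by_diff(scores_a, scores_b, n)

# The header promises n rows with A > B, but only 3 rows exist
# and none of them has a positive difference.
print(f"--- {n} sentences where SysA>SysB")
for diff, idx in rows:
    print(f"sentence {idx}: diff={diff:+.2f}")
```

With this input the header announces 5 sentences where Sys A wins, while the listing contains 3 sentences and no win for Sys A.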

Screenshots

[Screenshot: Screen Shot 2019-09-18 at 1 21 40 AM]

Files

print(f'--- {report_length} sentences where {sright}>{sleft} at {self.scorer.name()}')

To Reproduce

Use the SysA, SysB and Ref outputs located at https://gist.github.com/madaan/2cec36a7b18dfeea3904ddfff1e19312 and run compare-mt with the default options.

Tasks

I can take a stab at it if you guys think this should be fixed.

Thanks!

@neubig
Contributor

neubig commented Sep 18, 2019

Thanks a lot! I think it's probably better to just change the message, though, from `{sright}>{sleft}` to something indicating that this is just the maximum difference (which could also be negative). A PR would be welcome.
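
One possible rewording of the header, as a rough sketch only. The function `format_header` and its parameters are hypothetical, not part of compare-mt. Capping the count with `min` is an extra assumption that addresses case (b) from the description and goes beyond the message change suggested here.

```python
def format_header(report_length, num_sentences, sleft, sright, scorer_name):
    """Build a header that describes a ranking by difference, not a strict win."""
    # Assumption: never announce more rows than there are sentences.
    shown = min(report_length, num_sentences)
    return (f"--- {shown} sentences with the largest {sright}-{sleft} "
            f"difference at {scorer_name} (may be negative)")


# Hypothetical call: 10 rows requested, only 3 sentences available.
header = format_header(10, 3, "sys1", "sys2", "sentence-level BLEU")
print(header)
```

The wording no longer claims that one system scored higher, so it stays accurate even when every difference is zero or negative.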
