nltk.translate.bleu_score gives false result when ngram larger than maximum ngrams of given sentence #1539
Which version of the code are you using?

The BLEU implementation was just recently fixed, with #1330 resolved. If you're using the latest version:

```python
>>> import nltk
>>> from nltk import bleu
>>> ref = hyp = 'abc'
>>> bleu([ref], hyp)
1.0
>>> from nltk import bleu
>>> ref, hyp = 'abc', 'abd'
>>> bleu([ref], hyp)
0.7598356856515925
```

Since a string is a sequence of chars, the equivalent call with explicit token lists is:

```python
>>> from nltk.translate.bleu_score import sentence_bleu
>>> sentence_bleu([['a', 'b', 'c']], ['a', 'b', 'c'])
1.0
>>> sentence_bleu([['a', 'b', 'c']], ['a', 'b', 'd'])
0.7598356856515925
```

To install the latest version, use the develop branch from the NLTK repository.

(Do note that the develop branch is subject to more unexpected bugs, and it is recommended that users install the stable release instead.)

On a related note, but not directly involved with the current issue: …
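The failure mode in the issue title can be reproduced by hand without NLTK. Below is a minimal sketch of the modified n-gram precision from the BLEU paper; the helper names `ngrams` and `modified_precision` are my own, not NLTK's API. For a 3-token sentence, the 4-gram count is zero, so the 4-gram precision is 0/0:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(reference, hypothesis, n):
    """Clipped n-gram counts as in the BLEU paper: returns
    (numerator, denominator) of the modified n-gram precision."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = {ng: min(c, ref_counts[ng]) for ng, c in hyp_counts.items()}
    return sum(clipped.values()), sum(hyp_counts.values())

ref = hyp = ['a', 'b', 'c']
for n in range(1, 5):
    num, den = modified_precision(ref, hyp, n)
    print(n, num, den)
# the 3-token sentence has no 4-grams at all, so for n = 4 both the
# numerator and the denominator are 0 -- dividing them is exactly the
# ZeroDivisionError reported below
```

This is why any BLEU implementation has to decide what to do when the sentence is shorter than the maximum n-gram order.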
Thanks @alvations. The original version of NLTK I used was 3.2. I have updated it to 3.2.1, and it now raises a ZeroDivisionError. I'm using Python 3.5.2.
The only stable version of BLEU is in the …
OK, I will wait. But in the case you mentioned above, if the weights are [0.25, 0.25, 0.25, 0.25], then the results of sentence_bleu([['a', 'b', 'c']], ['a', 'b', 'c']) and sentence_bleu([['a', 'b', 'c']], ['a', 'b', 'd']) should both be 0, according to the original paper.
The original paper didn't account for the fact that a sentence can be shorter than the highest n-gram order, in which case the higher-order precisions are 0 (or even 0/0). If we look at the formula in Section 2.3, it takes the log of each modified n-gram precision, and log(0) is undefined. So if we were to implement the original BLEU, the user should receive a warning that says something like "BLEU can't be computed" whenever there is a math domain error.

Later versions of BLEU try to fix this with several different hacks; the history of the versions can be found at https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl#L17

Please note that the latest rendition of BLEU comes with the smoothing functions from the Chen and Cherry (2014) paper, which are not in the Moses version of mteval.

I hope the explanation helps.
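For reference, here is my transcription of the combination formula from Section 2.3 of Papineni et al. (2002), using r for the reference length and c for the candidate length:

```latex
% BLEU is a brevity penalty times the weighted geometric mean of the
% modified n-gram precisions p_n; in log form:
\log \mathrm{BLEU} = \min\!\left(1 - \frac{r}{c},\; 0\right) + \sum_{n=1}^{N} w_n \log p_n
```

With the default weights $w_n = 0.25$ and $N = 4$, a single $p_n = 0$ makes $\log p_n$ undefined, which is the math domain error described above.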
Given weight = [0.25, 0.25, 0.25, 0.25] (the default value):

sentence_bleu([['a', 'b', 'c']], ['a', 'b', 'c']) = 0
while sentence_bleu([['a', 'b', 'c']], ['a', 'b', 'd']) = 0.7598

Obviously the former score should be larger than the latter, or both scores should be 0.
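The inconsistency the report describes follows directly from the geometric mean. Here is a minimal sketch of the strict original-paper combination (the function name `strict_bleu` is mine; precisions are passed in directly rather than computed): with any precision equal to 0, the score is not 0 but undefined, raising the very math domain error discussed above.

```python
import math

def strict_bleu(precisions, weights):
    """Weighted geometric mean exactly as in the original paper
    (brevity penalty omitted for brevity). Raises ValueError
    ("math domain error") when any precision is 0."""
    return math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))

weights = [0.25] * 4

# identical 3-token sentences: p_1 = p_2 = p_3 = 1, but p_4 = 0
# because a 3-token sentence has no 4-grams
try:
    strict_bleu([1.0, 1.0, 1.0, 0.0], weights)
except ValueError as err:
    print('BLEU cannot be computed:', err)
```

So under the unsmoothed definition neither example gets a well-defined score, which is why implementations resort to smoothing or truncated weights instead.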