NLTK 3.5 implements smoothing method 4 proposed in Chen and Cherry (2014) [https://www.aclweb.org/anthology/W14-3346/].
However, we suspect that the implementation is incorrect. The code at `nltk/nltk/translate/bleu_score.py`, lines 582 to 585 in d0f54c2:

```python
incvnt = i + 1 * self.k / math.log(
    hyp_len
)  # Note that this K is different from the K from NIST.
p_n[i] = incvnt / p_i.denominator
```

is problematic. It can make `p_n[i]` greater than 100%: when `hyp_len < 4` in the 4-gram case, `p_n[i]` can be assigned a value far above 100% (even above 500%). A modified n-gram precision, however, should never exceed 100%.
The correct implementation at line 585 is `p_n[i] = (1 / incvnt) / p_i.denominator`.
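A minimal sketch of the problem, using the simplified `incvnt` formula from the snippet above. The helper names `smoothing4_buggy` and `smoothing4_fixed` are hypothetical, and `denominator = 1` stands in for the clipped 4-gram denominator of a 3-token hypothesis (which contains no 4-grams):

```python
import math

def smoothing4_buggy(i, k, hyp_len, denominator):
    # Mirrors the NLTK 3.5 computation quoted above: incvnt grows with i
    # and blows up for short hypotheses, so the "precision" can exceed 1.
    incvnt = i + 1 * k / math.log(hyp_len)
    return incvnt / denominator

def smoothing4_fixed(i, k, hyp_len, denominator):
    # The fix proposed in this issue: invert incvnt, so longer smoothing
    # penalties shrink the precision instead of inflating it.
    incvnt = i + 1 * k / math.log(hyp_len)
    return (1 / incvnt) / denominator

# 4-gram precision (i = 3), default k = 5, a 3-token hypothesis.
print(smoothing4_buggy(3, 5, 3, 1))  # ~7.55, i.e. over 500%
print(smoothing4_fixed(3, 5, 3, 1))  # ~0.13, a valid precision below 1
```

With the buggy formula the smoothed value is `3 + 5 / ln(3) ≈ 7.55`, far above 1; inverting `incvnt` keeps it in `(0, 1]`, as a precision must be.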