
Weird result using NgramModel.prob() function. #380

Closed

aliannejadi opened this issue Apr 6, 2013 · 4 comments
@aliannejadi

In the code below:

from nltk.corpus import reuters
from nltk.probability import LidstoneProbDist
from nltk.model import NgramModel

def main():
    tokens = list(reuters.words())
    # Lidstone-smoothed estimator with gamma = 0.2 (the bins argument is ignored)
    estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
    # Trigram model over Reuters, with pad_left=True and pad_right=False
    model = NgramModel(3, tokens, True, False, estimator)

    print model.prob('said', [''])

if __name__ == '__main__':
    main()

The resulting output is 3.03175480769, which is impossible, since a probability must be at most 1. I have run into the same problem many times, especially when checking the probabilities of stopwords.

Thanks.
MOLi

@bcroy commented Apr 6, 2013

Hello -

I would not be surprised if this is due to the issue I raised here: #367. That issue concerns the backoff calculation; perhaps you can confirm that when you call model.prob('said', ['']) the model is backing off to an n=2 or even n=1 model. By the way, what happens when you call model.prob('said', ['', ''])? Given that you are building a trigram model, I think the context should be a bigram, not a unigram as in your example...
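
For illustration, that suggestion amounts to comparing the two calls below (a minimal sketch reusing the model object from the code above; whether the first call actually triggers backoff is exactly what needs confirming):

# Unigram context, as in the original report; bcroy's hypothesis is
# that the model backs off to a lower-order (n=2 or n=1) model here.
print model.prob('said', [''])

# Bigram context, which is what a trigram model normally expects.
print model.prob('said', ['', ''])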

@aliannejadi (Author)

Hello,

Thanks for mentioning issue #367. You're right, it's the same issue, but the other comments you made there don't seem to hold. In the same code, check this:

print model.prob('in', ['areas', 'will'])

The output is 3.18247596154, again above 1 even though the context is now a proper bigram.
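
One way to see that the whole conditional distribution is broken, not just a single value, is the hypothetical sanity check below (not part of the original report; it reuses tokens and model from the snippet above and would be slow over the full Reuters vocabulary):

# Conditional probabilities over the vocabulary should sum to roughly 1
# for a fixed context; a total well above 1 confirms the distribution
# itself is invalid.
vocab = set(tokens)
total = sum(model.prob(w, ['areas', 'will']) for w in vocab)
print total  # expected ~1.0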

@stevenbird (Member)

@copper-head would you please confirm whether this issue is still current?

@iliakur (Contributor) commented Aug 25, 2018

@stevenbird This can now be closed; we have a regression test for it.
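
For readers arriving later: the old nltk.model package was removed, and language models now live in nltk.lm. A regression-style check in that spirit (an illustrative sketch, not the actual test in the NLTK repository) might look like:

from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

def test_scores_are_valid_probabilities():
    # Tiny toy corpus; the only point is that every conditional score
    # comes back as a valid probability in [0, 1].
    text = [['a', 'b', 'c'], ['a', 'c', 'b']]
    train, vocab = padded_everygram_pipeline(2, text)
    lm = MLE(2)
    lm.fit(train, vocab)
    for word in lm.vocab:
        assert 0.0 <= lm.score(word, ['a']) <= 1.0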
