
Weird result using NgramModel.prob() function. #380

Closed

aliannejadi opened this issue Apr 6, 2013 · 4 comments
@aliannejadi

In the code below:

from nltk.corpus import reuters
from nltk.probability import LidstoneProbDist
from nltk.model import NgramModel

def main():
    tokens = list(reuters.words())
    # Lidstone-smoothed estimator with gamma = 0.2 (the bins argument is ignored)
    estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
    # Trigram model over Reuters, with pad_left=True and pad_right=False
    model = NgramModel(3, tokens, True, False, estimator)

    print model.prob('said', [''])

if __name__ == '__main__':
    main()

The resulting output is 3.03175480769, which is impossible, since a probability must be at most 1. I have run into the same problem many times, especially when checking the probabilities of stopwords.

Thanks.
MOLi

@bcroy commented Apr 6, 2013

Hello -

I would not be surprised if this is due to the issue I raised here: #367. That issue concerns the backoff calculation; perhaps you can confirm that when you call model.prob('said', ['']) the model is backing off to an n=2 or even n=1 model. By the way, what happens when you call model.prob('said', ['', ''])? Given that you are building a trigram model, I think the context should be a bigram, not a unigram as in your example...
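
For illustration, that suggestion amounts to comparing the two calls below (a minimal sketch reusing the model object from the code above; whether the first call actually triggers backoff is exactly what needs confirming):

# Unigram context, as in the original report; bcroy's hypothesis is
# that the model backs off to a lower-order (n=2 or n=1) model here.
print model.prob('said', [''])

# Bigram context, which is what a trigram model normally expects.
print model.prob('said', ['', ''])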

@aliannejadi (Author)

Hello,

Thanks for mentioning issue #367. You're right, it's the same issue, but the other comments you made there don't seem to hold. In the same code, check this:

print model.prob('in', ['areas', 'will'])

The output is 3.18247596154, again above 1 even though the context is now a proper bigram.
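
One way to see that the whole conditional distribution is broken, not just a single value, is the hypothetical sanity check below (not part of the original report; it reuses tokens and model from the snippet above and would be slow over the full Reuters vocabulary):

# Conditional probabilities over the vocabulary should sum to roughly 1
# for a fixed context; a total well above 1 confirms the distribution
# itself is invalid.
vocab = set(tokens)
total = sum(model.prob(w, ['areas', 'will']) for w in vocab)
print total  # expected ~1.0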

@stevenbird (Member)

@copper-head would you please confirm whether this issue is still current?

@iliakur (Contributor) commented Aug 25, 2018

@stevenbird This can now be closed; we have a regression test for it.
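
For readers arriving later: the old nltk.model package was removed, and language models now live in nltk.lm. A regression-style check in that spirit (an illustrative sketch, not the actual test in the NLTK repository) might look like:

from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

def test_scores_are_valid_probabilities():
    # Tiny toy corpus; the only point is that every conditional score
    # comes back as a valid probability in [0, 1].
    text = [['a', 'b', 'c'], ['a', 'c', 'b']]
    train, vocab = padded_everygram_pipeline(2, text)
    lm = MLE(2)
    lm.fit(train, vocab)
    for word in lm.vocab:
        assert 0.0 <= lm.score(word, ['a']) <= 1.0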
