-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add reference to entropy implementation used #3229
Conversation
Making it more clear that the entropy implementation in NLTK is the one based on the Shannon-McMillan-Breiman theorem, as used and referenced by Jurafsky and Jordan Boyd-Graber.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits, otherwise lgtm, thanks!
Co-authored-by: Ilia Kurenkov <ilia.kurenkov@gmail.com>
Co-authored-by: Ilia Kurenkov <ilia.kurenkov@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes in formatting of text. Look's good!
add entropy_extended with length normalisation and relative frequency weighting. Also adapt perplexity_extended to work with this new entropy measure.
@iliakur yeah, so a bit messy here, but I've reverted the PR to just the update to the docs. I now have a separate branch on my fork that I'll be getting ready for a PR for an alternative entropy/perplexity method, including tests :) |
@stevenbird @tomaarsen this PR is ready to merge. I've been away from nltk development long enough that I figured I'd check in with you two about any formalities that may be missing. |
I'm afraid I've also not had much time for NLTK recently. That said, I think this is good to go. Thanks for taking care of this, and nice to see you here again @mbauwens.
|
This is in reference to issue #2961.
This is just a simple addition to the docs, making it more clear that the entropy implementation in NLTK is the one based on the Shannon-McMillan-Breiman theorem, as used and referenced by Jurafsky (Chapter 3 page 23 - 3.49) and Jordan Boyd-Graber (
lm.txt
file in this comment).This PR does not include the alternative implementation of the entropy function that was proposed in the issue.