Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chomsky_normal_form() for grammars #1884

Open
DavidNemeskey opened this issue Nov 13, 2017 · 6 comments
Open

chomsky_normal_form() for grammars #1884

DavidNemeskey opened this issue Nov 13, 2017 · 6 comments

Comments

@DavidNemeskey
Copy link

nltk.tree.Tree has a chomsky_normal_form() function, but grammars don't. Since CNF is a form of the grammar, it should, also.

@alvations
Copy link
Contributor

The chomsky_normal_form() in NLTK is a tree-binarization function. I think it can't be directly applied to grammars, see https://github.com/nltk/nltk/blob/develop/nltk/treetransforms.py

Grammar transformation to CNF is rather complex and hasn't yet been implemented. It would be good if there's an attempt at it, but it might not be trival.

If anyone is interested in contributing, a good algorithm to start out is

  • Lange and Leiß (2009). "To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm"

@virresh
Copy link
Contributor

virresh commented Dec 6, 2018

I was looking around for exactly this and stumbled here,
I've created this functionality by exploiting nltk internals for one of my projects (and a CKY parser on top of it as well, although without null handling, but I believe its useful for a lot of grammars anyways, for eg the ATIS test grammar), would be happy to send a PR, just wanted to confirm if this issue still holds and is available to work on (especially after the last comments in #1722)

Also if there's no CYK parser as mentioned in the other issue, I'd be happy to send a PR for that as well

@aetilley
Copy link
Contributor

@virresh @alvations Sorry for letting #1722 get stale. Life has gotten in the way.

@virresh If you'd like to take a wack at this be my guest. I wrote a conversion to CNF years ago.

https://github.com/aetilley/pcfg/blob/master/src/pcfg.py#L524

I remember it reordering the steps in the usual algorithm but I also remember convincing myself that they were equivalent. Proceed with caution.

I'm going to close the other issue.

virresh added a commit to virresh/nltk that referenced this issue Mar 31, 2019
virresh added a commit to virresh/nltk that referenced this issue Mar 31, 2019
virresh added a commit to virresh/nltk that referenced this issue Mar 31, 2019
virresh added a commit to virresh/nltk that referenced this issue Apr 4, 2019
@Daksh
Copy link
Contributor

Daksh commented Oct 24, 2019

There is a problem with CNFs for CFGs; they are returning duplicated productions. You can test this by:

import nltk
grammar = nltk.data.load("grammars/large_grammars/atis.cfg")
grammar = grammar.chomsky_normal_form()
print(len(grammar.productions()))
print(len(list(set(grammar.productions()))))

grammar has 20344 productions, which when converted to a set give 12396 productions.

@stefkauf
Copy link

I have just stumbled upon this thread. I have written my own version of the CFG.chomsky_normal_form() method because the one present was incomplete (cannot deal with empty productions, etc.). Mine works. Also simplifies the grammar if possible. I'd be happy to contribute it. I haven't contributed to NLTK before. How does this work?

@tomaarsen
Copy link
Member

@stefkauf Information on how to contribute can be found in CONTRIBUTING.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants