chomsky_normal_form() for grammars #1884

DavidNemeskey · 2017-11-13T15:35:45Z

nltk.tree.Tree has a chomsky_normal_form() method, but grammars don't. Since CNF is first and foremost a normal form for grammars, grammars should have one too.


alvations · 2017-11-14T02:39:09Z

The chomsky_normal_form() in NLTK is a tree-binarization function. I don't think it can be applied directly to grammars; see https://github.com/nltk/nltk/blob/develop/nltk/treetransforms.py
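To illustrate the difference, here is a minimal stdlib sketch of right-factored tree binarization, roughly what Tree.chomsky_normal_form() does with its default settings. Trees are modelled as hypothetical (label, children) tuples rather than nltk.Tree objects, and the `S|<B-C>` label format only mimics NLTK's default naming; treat it as an approximation, not NLTK's code. Note that it rewrites one tree, not the grammar that produced it.

```python
def right_factor(tree):
    """Right-factor every node with more than two children.

    Trees are (label, children) tuples; leaves are plain strings.
    New intermediate labels look like 'S|<B-C>' (an approximation of
    NLTK's default naming, not its actual implementation).
    """
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [right_factor(c) for c in children]
    if len(children) <= 2:
        return (label, children)
    # Labels of the children, used to name the intermediate nodes.
    names = [c if isinstance(c, str) else c[0] for c in children]
    # Build from the right: the last two children share the deepest node.
    node = (f"{label}|<{'-'.join(names[-2:])}>", children[-2:])
    for i in range(len(children) - 3, 0, -1):
        node = (f"{label}|<{'-'.join(names[i:])}>", [children[i], node])
    return (label, [children[0], node])
```

For example, `("S", [("A", ["a"]), ("B", ["b"]), ("C", ["c"])])` becomes `("S", [("A", ["a"]), ("S|<B-C>", [("B", ["b"]), ("C", ["c"])])])`.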

Converting a grammar to CNF is rather complex and hasn't been implemented yet. It would be good if someone attempted it, but it might not be trivial.

If anyone is interested in contributing, a good algorithm to start with is:

Lange and Leiß (2009). "To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm"
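To give a sense of the individual passes involved, here is a minimal sketch of just two of the textbook CNF steps: TERM (replace terminals inside longer rules with proxy nonterminals) and BIN (split long right-hand sides into binary chains). It is not the Lange and Leiß algorithm. It assumes the grammar has no empty or unit productions, uses hypothetical (lhs, rhs_tuple) productions in which nonterminals are exactly the left-hand-side symbols, and invents names like `T_c` and `S_1` for new nonterminals (assumed not to clash with existing ones).

```python
from itertools import count


def term_and_bin(productions):
    """Apply the TERM and BIN steps to (lhs, rhs_tuple) productions.

    Assumes no empty and no unit productions. Nonterminals are the
    symbols that appear on a left-hand side; all others are terminals.
    """
    nonterminals = {lhs for lhs, _ in productions}
    fresh_ids = count(1)
    term_proxy = {}  # terminal -> proxy nonterminal (hypothetical 'T_' names)
    out = []

    def proxy(sym):
        # TERM: inside a rule of length >= 2, replace terminal t by T_t -> t.
        if sym in nonterminals:
            return sym
        if sym not in term_proxy:
            term_proxy[sym] = f"T_{sym}"
            out.append((term_proxy[sym], (sym,)))
        return term_proxy[sym]

    for lhs, rhs in productions:
        if len(rhs) == 1:
            out.append((lhs, tuple(rhs)))  # A -> a is already in CNF
            continue
        syms = [proxy(s) for s in rhs]
        # BIN: A -> X1 X2 ... Xn becomes A -> X1 A_1, A_1 -> X2 A_2, ...
        head = lhs
        while len(syms) > 2:
            fresh = f"{lhs}_{next(fresh_ids)}"
            out.append((head, (syms[0], fresh)))
            head, syms = fresh, syms[1:]
        out.append((head, tuple(syms)))
    return out
```

A full conversion would also need START, DEL (empty productions) and UNIT passes, which is where most of the subtlety lies.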

virresh · 2018-12-06T03:30:24Z

I was looking for exactly this and stumbled onto this issue. I've implemented this functionality for one of my projects by exploiting NLTK internals, along with a CKY parser on top of it. The parser doesn't handle null productions, but I believe it's still useful for many grammars (e.g. the ATIS test grammar). I'd be happy to send a PR; I just wanted to confirm whether this issue is still open and available to work on, especially after the last comments in #1722.
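The code described above isn't shown in the thread, so for reference, a textbook CKY recognizer over a CNF grammar might look like the following stdlib sketch (not virresh's implementation). Productions are hypothetical (lhs, rhs_tuple) pairs, terminals are bare words, and, like the parser described above, it does not handle empty productions.

```python
from collections import defaultdict


def cky_recognize(productions, start, words):
    """Return True if `words` can be derived from `start` in a CNF grammar."""
    unary = defaultdict(set)   # terminal -> {A : A -> terminal}
    binary = defaultdict(set)  # (B, C) -> {A : A -> B C}
    for lhs, rhs in productions:
        if len(rhs) == 1:
            unary[rhs[0]].add(lhs)
        else:
            binary[tuple(rhs)].add(lhs)

    n = len(words)
    if n == 0:
        return False  # a CNF grammar without S -> e derives no empty string

    # table[i][j] holds the nonterminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(unary[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= binary.get((b, c), set())
    return start in table[0][n]
```

The triple loop over spans, start positions and split points gives the usual O(n³ · |G|) running time.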

Also, if there's still no CYK parser, as mentioned in the other issue, I'd be happy to send a PR for that as well.

aetilley · 2019-03-23T01:47:17Z

@virresh @alvations Sorry for letting #1722 get stale. Life has gotten in the way.

@virresh If you'd like to take a whack at this, be my guest. I wrote a conversion to CNF years ago.

https://github.com/aetilley/pcfg/blob/master/src/pcfg.py#L524

I remember it reordering the steps of the usual algorithm, but I also remember convincing myself that the two orders were equivalent. Proceed with caution.
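One ordering constraint in the usual algorithm is that removing empty productions can create new unit productions (e.g. S -> A B with B nullable yields S -> A), so unit elimination normally runs after empty-production removal. Below is a minimal sketch of the unit-elimination step on its own, assuming hypothetical (lhs, rhs_tuple) productions where nonterminals are exactly the left-hand-side symbols; it is not the linked pcfg.py code.

```python
def remove_unit_productions(productions):
    """Replace unit rules A -> B by copies of B's non-unit rules."""
    nonterminals = {lhs for lhs, _ in productions}

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    # unit_reach[A] = every B with A =>* B using only unit rules (incl. A)
    unit_reach = {a: {a} for a in nonterminals}
    changed = True
    while changed:  # fixpoint: the transitive closure of the unit relation
        changed = False
        for lhs, rhs in productions:
            if is_unit(rhs):
                for a in nonterminals:
                    if lhs in unit_reach[a] and rhs[0] not in unit_reach[a]:
                        unit_reach[a].add(rhs[0])
                        changed = True

    # A gets every non-unit rule of every B it unit-reaches.
    result = set()
    for a in nonterminals:
        for lhs, rhs in productions:
            if lhs in unit_reach[a] and not is_unit(rhs):
                result.add((a, rhs))
    return sorted(result)
```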

I'm going to close the other issue.


Daksh · 2019-10-24T18:58:19Z

There is a problem with chomsky_normal_form() for CFGs: it returns duplicate productions. You can reproduce it with:

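```python
import nltk
# Load the ATIS grammar from nltk_data and convert it to CNF
grammar = nltk.data.load("grammars/large_grammars/atis.cfg")
grammar = grammar.chomsky_normal_form()
# Total number of productions vs. number of distinct productions
print(len(grammar.productions()))
print(len(set(grammar.productions())))
```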

The converted grammar has 20344 productions, but only 12396 of them are distinct once converted to a set.
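A set removes the duplicates but also discards the original production order. A minimal order-preserving alternative is sketched below on hypothetical (lhs, rhs) tuples; NLTK Production objects are hashable too, since the set() call above relies on that. This is only an illustration, not necessarily how #2435 fixed the problem.

```python
def dedupe_productions(productions):
    """Drop repeated productions, keeping the first occurrence of each.

    dict preserves insertion order (Python 3.7+), so dict.fromkeys keeps
    the original order while discarding later duplicates.
    """
    return list(dict.fromkeys(productions))
```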

stefkauf · 2021-11-11T01:09:58Z

I just stumbled upon this thread. I have written my own version of the CFG.chomsky_normal_form() method because the existing one is incomplete (it cannot deal with empty productions, etc.). Mine works, and it also simplifies the grammar where possible. I'd be happy to contribute it, but I haven't contributed to NLTK before. How does the process work?
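For reference, removing empty productions is usually done in two parts: compute the nullable nonterminals with a fixpoint, then add every variant of each rule in which some nullable symbols are dropped. Here is a minimal sketch under the same hypothetical (lhs, rhs_tuple) representation (an empty tuple is an empty production). It assumes the textbook START step has already introduced a start symbol that never appears on a right-hand side, and it is not stefkauf's implementation.

```python
from itertools import product


def nullable_symbols(productions):
    """Nonterminals that can derive the empty string (fixpoint iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            # all() of an empty rhs is True, so A -> () marks A nullable.
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def remove_empty_productions(productions, start):
    nullable = nullable_symbols(productions)
    result = set()
    for lhs, rhs in productions:
        # Each nullable symbol may be kept or dropped; others are always kept.
        options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
        for choice in product(*options):
            new_rhs = tuple(sym for part in choice for sym in part)
            if new_rhs and new_rhs != (lhs,):  # skip empties and A -> A
                result.add((lhs, new_rhs))
    if start in nullable:
        result.add((start, ()))  # keep S -> () so the language is unchanged
    return sorted(result)
```

Because dropping symbols can leave rules like S -> A, this pass should run before unit elimination.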

tomaarsen · 2021-11-11T06:32:44Z

@stefkauf Information on how to contribute can be found in CONTRIBUTING.md.

alvations added the enhancement, nice idea, and parsing labels on Nov 14, 2017

aetilley mentioned this issue on Mar 23, 2019 in CKY Algorithm? #1722 (Closed)

virresh added a commit to virresh/nltk that referenced this issue Mar 31, 2019

Add chomsky_normal_form for CFGs

6c0cab8

nltk#1884

virresh added a commit to virresh/nltk that referenced this issue Mar 31, 2019

Add chomsky_normal_form for CFGs

625b831

nltk#1884

virresh added a commit to virresh/nltk that referenced this issue Mar 31, 2019

Add chomsky_normal_form for CFGs

5934fc8

nltk#1884

virresh mentioned this issue on Mar 31, 2019 in Add chomsky_normal_form for CFGs #2260 (Merged)

virresh added a commit to virresh/nltk that referenced this issue Apr 4, 2019

Add chomsky_normal_form for CFGs

b4bd1fa

nltk#1884

Daksh mentioned this issue on Oct 24, 2019 in Remove duplicate productions in CNF for CFGs #2435 (Merged)
