Verbnet corpus is out of date #2015

Open
agodbehere opened this issue May 5, 2018 · 13 comments

Comments

@agodbehere

The nltk data index (https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml) points verbnet to version 2.1. The latest verbnet definition is 3.2.

The latest version has updated frame descriptions that provide much more information about the phrasal structure. For example, the primary description of a frame from class future_having-13.3 in the latest version is NP V NP-Dative NP, describing the frame's structure as (noun-phrase, verb, noun-phrase(dative), noun-phrase) while in version 2.1 it just reads Dative.
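
To make the difference concrete, here is a minimal sketch of reading a frame's primary description out of a class definition. The two XML fragments are hand-written and hypothetical (they are not copied from the corpus files); only the two `primary` strings come from the comparison above. With NLTK, the class element would presumably come from the reader's `vnclass()` method rather than from an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed class fragments for illustration only.
# The `primary` values are the ones quoted in this report.
V21_FRAGMENT = """
<VNCLASS ID="future_having-13.3">
  <FRAMES>
    <FRAME>
      <DESCRIPTION primary="Dative"/>
    </FRAME>
  </FRAMES>
</VNCLASS>
"""

V32_FRAGMENT = """
<VNCLASS ID="future_having-13.3">
  <FRAMES>
    <FRAME>
      <DESCRIPTION primary="NP V NP-Dative NP"/>
    </FRAME>
  </FRAMES>
</VNCLASS>
"""


def primary_descriptions(vnclass):
    """Return the `primary` attribute of every frame description in a class element."""
    return [d.get("primary") for d in vnclass.findall("FRAMES/FRAME/DESCRIPTION")]


old = primary_descriptions(ET.fromstring(V21_FRAGMENT))
new = primary_descriptions(ET.fromstring(V32_FRAGMENT))
print("2.1:", old)
print("3.2:", new)
```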

@stevenbird stevenbird self-assigned this May 6, 2018
stevenbird added a commit to nltk/nltk_data that referenced this issue May 14, 2018
@stevenbird
Member

@agodbehere, thanks for reporting this issue. I've verified that the existing verbnet 2 corpus reader breaks on verbnet 3 data, so both will need to live alongside each other in the corpus collection.

The next step is for someone to contribute a corresponding corpus reader nltk.corpus.verbnet3, which can hopefully share some of the existing code.

We'll need to support both for a while.

@agodbehere
Author

@stevenbird, what breaking case did you find for using the existing corpus reader with verbnet 2? I didn't run the test suite after updating the corpus, but for my use-case (requesting classids and frames), the existing corpus reader works just fine.

@stevenbird
Member

stevenbird commented May 16, 2018 via email

@amosleokim

@stevenbird @agodbehere Hi, I work on the VerbNet project at CU Boulder and would be happy to contribute and maintain code for a corpus reader for VerbNet 3+.

@stevenbird
Member

@amosleokim: thanks, that would be welcome!

You can see that we have verbnet (2) and verbnet3 data here.

I propose we add an entry for verbnet3 here

And then work out how to extend verbnet.py to support both verbnet and verbnet3.

How does that sound? We need to support both simultaneously, and (ultimately) deprecate verbnet 2.

We have an NLTK slack channel where we can discuss details if necessary. Thanks!

@amosleokim

@stevenbird That sounds good to me! If you can send me an invite code to the slack channel, I'll hop on so we can get started on the nitty gritty.

@guilherme-salome

Any progress on this topic? I am trying to use verbnet for a research project, and the output I get from the classids method seems weird.

@stevenbird
Member

Please see #2015 (comment)

@guilherme-salome

Thanks @stevenbird, the older version seemed to be the cause of the problem. I was able to manually download verbnet3.zip and read it with the reader for verbnet 2.1 that is in nltk.

@alvations
Contributor

@salompas Just checking again: does the verbnet API in NLTK work with verbnet3?

@guilherme-salome

guilherme-salome commented Oct 19, 2018

@alvations
It does work for what I am using it for. Let me show you my code:

import nltk
v3 = nltk.corpus.util.LazyCorpusLoader(
    'verbnet3', nltk.corpus.reader.verbnet.VerbnetCorpusReader,
    r'(?!\.).*\.xml')
v3.classids('add') # returns ['mix-22.1-2', 'multiply-108', 'say-37.7-1']

For that to work you need to download verbnet3 from here. Unzip this file in the folder ~/nltk_data/corpora. When unzipped it should create a new folder ~/nltk_data/corpora/verbnet3, which contains all the Verbnet3 definitions. Then you should be able to run the code above. Notice that for Verbnet 2 (the default) the code v3.classids('add') only returns the first class (mix-22.1-2).
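
The unzip step can also be scripted. This is only a sketch: the helper name `install_corpus_zip` is made up for illustration, and it assumes the archive already contains a top-level verbnet3/ folder, as described above.

```python
import zipfile
from pathlib import Path


def install_corpus_zip(zip_path, corpora_dir=Path.home() / "nltk_data" / "corpora"):
    """Extract a corpus archive into the NLTK corpora folder.

    Assumes (not verified here) that the archive has a top-level
    `verbnet3/` directory, so extraction creates <corpora_dir>/verbnet3.
    """
    corpora_dir = Path(corpora_dir)
    corpora_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(corpora_dir)
    return corpora_dir / "verbnet3"


# Example call, assuming the archive was saved as verbnet3.zip
# in the current directory:
# install_corpus_zip("verbnet3.zip")
```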

Since that is basically all I am using Verbnet3 for I have not tested the other APIs, but the classids method has been tested on maaany different words and they all work. I hope this helps!
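
If you want to check more words against both versions, a small comparison helper along these lines might help. It is a sketch, not part of NLTK: `classids_gained` and `StubReader` are hypothetical names, and the stub only mimics the `classids(lemma)` call so the example runs without any corpus data. In practice you would pass `nltk.corpus.verbnet` and the `v3` loader from the code above.

```python
def classids_gained(old_reader, new_reader, lemmas):
    """Map each lemma to the class ids that only the newer reader returns."""
    gained = {}
    for lemma in lemmas:
        old_ids = set(old_reader.classids(lemma))
        extra = [cid for cid in new_reader.classids(lemma) if cid not in old_ids]
        if extra:
            gained[lemma] = extra
    return gained


class StubReader:
    """Stand-in for a corpus reader; returns canned class ids per lemma."""

    def __init__(self, table):
        self._table = table

    def classids(self, lemma):
        return list(self._table.get(lemma, []))


# Canned values taken from the 'add' example in this comment.
v2_stub = StubReader({"add": ["mix-22.1-2"]})
v3_stub = StubReader({"add": ["mix-22.1-2", "multiply-108", "say-37.7-1"]})

print(classids_gained(v2_stub, v3_stub, ["add"]))
```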

@stevenbird stevenbird added this to the 3.5 milestone Aug 13, 2019
@sonhkim

sonhkim commented Mar 30, 2020

@Salompas Hi, thank you for your solution! What version of verbnet3 is your 'verbnet3'? Is it version 3.3 or 3.2?

@guilherme-salome

guilherme-salome commented Mar 30, 2020

> @Salompas Hi, thank you for your solution! What version of verbnet3 is your 'verbnet3'? Is it version 3.3 or 3.2?

Hey @songhee-kim, it's been 2 years since I worked on this, so I do not know exactly which version I had.
