Updating triples.py to rely on spacy-wordnet and sense tagging #370

Open
dzitney1 opened this issue Mar 12, 2023 · 0 comments

Context

Thank you for all of the work you have put into this library; it has helped me immensely! I am inexperienced at submitting pull requests to major repos, and I do not know what goes into the compatibility testing and documentation generation, but I did not want that to deter me from submitting something.

For quote attribution, triples.py currently relies on string matching against constants.REPORTING_VERBS. The comment on line 201 of triples.py shows interest in implementing a model to perform this functionality.

Proposed solution

This solution would rely on an additional dependency, spacy-wordnet, which in turn relies on nltk.
Instead of string matching lemmas against a list of reporting verbs, it may be possible to use the sense tagging from the WordNet corpus to identify verbs of communication.

Setting aside the additional support required for the new dependencies, I have found the following solution (which requires two changes) to work for me.

  1. When calling make_spacy_doc
import spacy
import textacy
# Importing the annotator registers the "spacy_wordnet" pipeline factory
from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# en_core_web_trf is not required here, works with en_core_web_sm
nlp = spacy.load('en_core_web_trf')

# The following could possibly be implemented with some sort of config option in core.py
# or added by the user in their own function
nlp.add_pipe("spacy_wordnet", after='tagger')
doc = textacy.make_spacy_doc(text, lang=nlp)
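  2. Updating lines 253 and 254 of triples.py

From

tok.pos == VERB
and tok.lemma_ in _reporting_verbs

To

# A token counts as a reporting verb if it has WordNet lemmas and the first
# one belongs to a synset whose lexname is 'verb.communication'
tok.pos == VERB
and tok._.wordnet.lemmas()
and tok._.wordnet.lemmas()[0].synset().lexname() == 'verb.communication'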
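
The two changes above could also be wrapped in small helpers, which might make them easier to test and to put behind a config option. The following is only a rough sketch, not code from textacy: `ensure_wordnet_pipe` and `is_communication_verb` are hypothetical names, and the sketch assumes that `tok._.wordnet.lemmas()` returns NLTK `Lemma` objects and that `spacy_wordnet.wordnet_annotator` has already been imported so the "spacy_wordnet" factory is registered.

```python
from typing import Any, Sequence

# WordNet lexicographer file name shared by verbs of telling, asking, etc.
COMMUNICATION_LEXNAME = "verb.communication"


def ensure_wordnet_pipe(nlp: Any, after: str = "tagger") -> Any:
    """Add the "spacy_wordnet" component to a spaCy pipeline if it is missing.

    Hypothetical helper. The caller is assumed to have imported
    spacy_wordnet.wordnet_annotator so that the factory is registered.
    """
    if "spacy_wordnet" not in nlp.pipe_names:
        nlp.add_pipe("spacy_wordnet", after=after)
    return nlp


def is_communication_verb(lemmas: Sequence[Any]) -> bool:
    """Return True if the first WordNet lemma is tagged 'verb.communication'.

    Hypothetical helper. `lemmas` is assumed to be the list returned by
    tok._.wordnet.lemmas(); an empty list means WordNet has no entry.
    """
    if not lemmas:
        return False
    # Only the first lemma is checked, mirroring the condition proposed above.
    return lemmas[0].synset().lexname() == COMMUNICATION_LEXNAME
```

With these in place, the condition in triples.py could read `tok.pos == VERB and is_communication_verb(tok._.wordnet.lemmas())`. Checking only the first lemma keeps the behaviour identical to the proposal; whether to consider every sense of the verb instead is an open design question.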