Updating triples.py to rely on spacy-wordnet and sense tagging #370

dzitney1 · 2023-03-12T18:21:43Z

Context

Thank you for all of the work you have put into this library; it has helped me immensely! I am inexperienced at submitting pull requests to major repos and blissfully ignorant of what goes into compatibility testing and proper generation of documentation, but I did not want that to deter me from submitting something.

For quote attribution, triples.py currently relies on constants.REPORTING_VERBS. The comment on line 201 of triples.py shows interest in implementing a model to perform this functionality.

Proposed solution

This solution would rely on an additional dependency, spacy-wordnet, which in turn relies on nltk.
Instead of string-matching lemmas against the list of reporting verbs, it may be possible to access the sense tagging from the WordNet corpus and use that to identify reporting verbs.

Beyond the additional support required for the increased dependencies, I have found the following solution (which requires two changes) to work for me.

When calling make_spacy_doc

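import spacy
import textacy
# Importing the annotator registers the "spacy_wordnet" factory (as shown in the spacy-wordnet README)
from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# en_core_web_trf is not required here; en_core_web_sm also works
nlp = spacy.load('en_core_web_trf')

# The following could possibly be implemented with some sort of config option in core.py
# or added by the user in their own function
nlp.add_pipe("spacy_wordnet", after='tagger')
doc = textacy.make_spacy_doc(text, lang=nlp)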
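
As a rough illustration of the "added by the user in their own function" option, the pipe registration could be wrapped so that it is only applied once. This is a minimal sketch, not part of textacy: `ensure_wordnet_pipe` is a hypothetical name, and appending the component at the end when no tagger is present is an assumption rather than documented spacy-wordnet behaviour. It relies only on `nlp.pipe_names` and `nlp.add_pipe` from spaCy 3.

```python
def ensure_wordnet_pipe(nlp, factory="spacy_wordnet"):
    """Hypothetical user-side helper: add the WordNet pipe once, return `nlp`."""
    if factory in nlp.pipe_names:
        # Already registered on this pipeline; adding it again would raise.
        return nlp
    if "tagger" in nlp.pipe_names:
        # Same placement as in the snippet above.
        nlp.add_pipe(factory, after="tagger")
    else:
        # Assumption: with no tagger in the pipeline, append at the end.
        nlp.add_pipe(factory, last=True)
    return nlp
```

With spaCy and spacy-wordnet installed, this could be called as `nlp = ensure_wordnet_pipe(spacy.load('en_core_web_sm'))` before passing `nlp` to `textacy.make_spacy_doc`.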

Updating lines 253 and 254 of triples.py
From

tok.pos == VERB
and tok.lemma_ in _reporting_verbs

To

tok.pos == VERB
and tok._.wordnet.lemmas()
and tok._.wordnet.lemmas()[0].synset().lexname() == 'verb.communication'
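
For illustration, the same condition could be factored into a small predicate that also falls back to lemma matching when WordNet has no entry for a token. This is only a sketch under stated assumptions: `is_reporting_verb` and `fake_token` are hypothetical names, the check uses the string attribute `pos_` instead of `tok.pos == VERB` so that it runs without spaCy, and the stand-in tokens below are hand-written rather than real WordNet lookups.

```python
from types import SimpleNamespace

COMMUNICATION_LEXNAME = "verb.communication"


def is_reporting_verb(tok, fallback_lemmas=frozenset()):
    """Hypothetical helper: True if `tok` looks like a reporting verb.

    Assumes `tok` exposes `pos_`, `lemma_` and `tok._.wordnet.lemmas()`
    returning NLTK-style lemma objects, as spacy-wordnet provides.
    """
    if tok.pos_ != "VERB":
        return False
    lemmas = tok._.wordnet.lemmas()
    if lemmas:
        # Same rule as the proposed change: only the first lemma is inspected.
        return lemmas[0].synset().lexname() == COMMUNICATION_LEXNAME
    # No WordNet entry for this token: fall back to plain lemma matching.
    return tok.lemma_ in fallback_lemmas


def fake_token(pos, lemma, lexnames):
    """Hand-written stand-in for a spaCy token so the sketch runs without models."""
    lemmas = [
        SimpleNamespace(synset=lambda ln=ln: SimpleNamespace(lexname=lambda: ln))
        for ln in lexnames
    ]
    wordnet = SimpleNamespace(lemmas=lambda: lemmas)
    return SimpleNamespace(pos_=pos, lemma_=lemma, _=SimpleNamespace(wordnet=wordnet))


said = fake_token("VERB", "say", ["verb.communication"])
ran = fake_token("VERB", "run", ["verb.motion"])
unknown = fake_token("VERB", "tweet", [])  # pretend WordNet has no entry

results = [
    is_reporting_verb(said),
    is_reporting_verb(ran),
    is_reporting_verb(unknown, fallback_lemmas={"tweet"}),
]
```

Keeping the fallback set optional means the existing reporting-verb list could still cover verbs that WordNet lacks, while the lexname check handles the rest.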

dzitney1 added the enhancement label Mar 12, 2023
