Thank you for all of the work you have put into this library; it has helped me immensely! I am inexperienced at submitting pull requests on major repos, and blissfully ignorant of what goes into compatibility testing and proper documentation generation, but I did not want to let that deter me from submitting something.
For quote attribution, triples.py currently relies on constants.REPORTING_VERBS. The comment on line 201 of triples.py expresses interest in implementing a model to perform this functionality instead.
Proposed solution
This solution would add a dependency on spacy-wordnet, which in turn relies on nltk.
Instead of string-matching lemmas against a list of reporting verbs, it may be possible to use the sense tags from the WordNet corpus for this purpose.
Beyond the additional support required for the extra dependencies, I have found the following solution (which requires two changes) to work for me.
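For reference, the extra dependencies would be installed along these lines (package name as published on PyPI; spacy-wordnet's own docs list the wordnet and omw corpora as prerequisites):

```shell
pip install spacy-wordnet
python -m nltk.downloader wordnet omw
```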
When calling make_spacy_doc:

```python
import spacy
import textacy

# en_core_web_trf is not required here; this also works with en_core_web_sm
nlp = spacy.load('en_core_web_trf')
# The following could possibly be implemented via a config option in core.py,
# or added by the user in their own function
nlp.add_pipe("spacy_wordnet", after='tagger')
doc = textacy.make_spacy_doc(text, lang=nlp)
```
Updating lines 253 and 254 of triples.py:

From:

```python
tok.pos == VERB
and tok.lemma_ in _reporting_verbs
```

To:

```python
tok.pos == VERB and tok._.wordnet.lemmas()
and tok._.wordnet.lemmas()[0]._synset._lexname == 'verb.communication'
```
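To illustrate the difference between the two conditions without loading a spaCy pipeline, here is a sketch using mock token objects. The attribute paths (`tok.pos`, `tok.lemma_`, `tok._.wordnet.lemmas()`) mirror the real spaCy/spacy-wordnet interfaces, but the mocks, helper names, and the small verb list are purely illustrative:

```python
from types import SimpleNamespace

VERB = 100  # stand-in for spacy.symbols.VERB

def is_reporting_verb_by_list(tok, reporting_verbs):
    """Current approach: string-match the token's lemma against a fixed list."""
    return tok.pos == VERB and tok.lemma_ in reporting_verbs

def is_reporting_verb_by_sense(tok):
    """Proposed approach: check the WordNet lexicographer file ("lexname")
    of the first sense; reporting verbs fall under 'verb.communication'."""
    lemmas = tok._.wordnet.lemmas()
    return (
        tok.pos == VERB
        and bool(lemmas)
        and lemmas[0]._synset._lexname == "verb.communication"
    )

# Mock token shaped like a spaCy token carrying the spacy-wordnet extension
lemma = SimpleNamespace(_synset=SimpleNamespace(_lexname="verb.communication"))
tok = SimpleNamespace(
    pos=VERB,
    lemma_="whisper",
    _=SimpleNamespace(wordnet=SimpleNamespace(lemmas=lambda: [lemma])),
)

print(is_reporting_verb_by_list(tok, {"say", "tell", "state"}))  # False: lemma not in list
print(is_reporting_verb_by_sense(tok))  # True: sense-tagged verb.communication
```

The point of the example: a verb like "whisper" slips past a hard-coded list unless it is explicitly enumerated, while the sense-based check catches it via its WordNet category.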