Deprecating the old Stanford Parser #1843
Conversation
I'm not sure it's worth creating a drop-in replacement for the Stanford parser, mostly because the API is incompatible: `CoreNLPParser` requires `CoreNLPServer`.
Deprecation warnings are useful!
nltk/parse/stanford.py (Outdated)

```diff
@@ -281,6 +282,15 @@ class StanfordParser(GenericStanfordParser):

     _OUTPUT_FORMAT = 'penn'

     def __init__():
         warnings.simplefilter('always', DeprecationWarning)
```
I'm not sure whether `warnings.simplefilter` should be called here. It should be up to the user to decide what warnings are shown.
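As a sketch of the reviewer's point, using only the standard library (the function name is hypothetical): the warning can be issued without touching the global filters at all, leaving filtering decisions to the user:

```python
import warnings

def deprecated_init_warning():
    # Emit the deprecation message without calling warnings.simplefilter,
    # so any filters the user configured stay in effect.
    # stacklevel=2 attributes the warning to the caller's line.
    warnings.warn(
        "StanfordParser will be deprecated. "
        "Please use nltk.parse.corenlp.CoreNLPParser instead.",
        DeprecationWarning,
        stacklevel=2,
    )
```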
nltk/parse/stanford.py
Outdated
"be deprecated\n" | ||
"Please use \033[91mnltk.parse.corenlp.CoreNLPParser\033[0m instead.'"), | ||
DeprecationWarning, stacklevel=2) | ||
warnings.simplefilter('ignore', DeprecationWarning) |
Also here: what if the user had some custom filter?
Is `str("...")` really needed? Would `"..."` be enough?
It looks like there is an unmatched `'`.
nltk/parse/stanford.py (Outdated)

```diff
@@ -23,6 +23,7 @@
 from nltk.parse.api import ParserI
 from nltk.parse.dependencygraph import DependencyGraph
 from nltk.tree import Tree
+from nltk.parse.corenlp import CoreNLPParser, CoreNLPDependencyParser
```
I would do this:

```python
from nltk.parse import corenlp
```

and later refer to `CoreNLPParser` and `CoreNLPDependencyParser` as `corenlp.CoreNLPParser` and `corenlp.CoreNLPDependencyParser`.
nltk/parse/stanford.py (Outdated)

```diff
@@ -400,6 +426,76 @@ def _make_tree(self, result):
         return DependencyGraph(result, top_relation_label='ROOT')


+class CoreNLPParser(CoreNLPParser):
```
With the new import this becomes:

```python
class CoreNLPParser(corenlp.CoreNLPParser):
```

I suggest not using `CoreNLPParser` as a name, as it's confusing.
```diff
@@ -228,6 +228,7 @@ def _execute(self, cmd, input_, verbose=False):

         return stdout


 class StanfordParser(GenericStanfordParser):
```
Actually, why not rewrite the implementation of `StanfordParser` so that it uses the CoreNLP parser? But then one needs to be careful with the `CoreNLPServer`.
StanfordTagger and StanfordTokenizer have been deprecated in the same way. If you think it's really not worth replacing the …
Right, these are a tagger and a tokenizer, not a parser. I would deprecate the Stanford parser and suggest using corenlp.CoreNLPParser. @alvations, what do you think?
Deprecating the … But we have to ensure that it behaves as how a …

Maybe something like this:
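The snippet that followed did not survive extraction. A minimal sketch of what such a deprecating drop-in shim could look like, based on the pieces discussed in this review (the `_CoreNLPParser` stand-in and the default URL are illustrative, not NLTK's actual code):

```python
import warnings

class _CoreNLPParser:
    # Hypothetical stand-in for corenlp.CoreNLPParser, only so this
    # sketch is self-contained.
    def __init__(self, url='http://localhost:9000', encoding='utf8'):
        self.url = url
        self.encoding = encoding

class StanfordParser(_CoreNLPParser):
    """Deprecated shim that forwards to the CoreNLP-backed parser."""

    def __init__(self, *args, **kwargs):
        # Warn on construction, then behave like the new parser.
        warnings.warn(
            "StanfordParser is deprecated; "
            "use nltk.parse.corenlp.CoreNLPParser instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```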
Another way is to make CoreNLPParser implement all those interfaces and get rid of CoreNLPTokenizer and CoreNLP*Tagger. The reasoning is that we provide an interface to a tool which is a tagger, a parser, and a tokenizer.
What should I do with different return types? The original … Also …
@artiemq a list and a generator are both iterables, so I would not pay attention to the difference. Personally, I prefer a generator, because then it's up to the user to decide what container to use (a list, a set, or none at all, just iterating over the generator). I would not add …
It seems that the difference between …
nltk/parse/corenlp.py (Outdated)

```diff
 import requests

         self.url = url
         self.encoding = encoding

         assert tagtype in ['pos', 'ner', None]
```
Throw an exception here saying that `tagtype` must be either `'pos'`, `'ner'` or `None`.
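A sketch of the suggested change (the helper name is hypothetical): raise a descriptive exception instead of using `assert`, which also means the check survives `python -O`:

```python
def validate_tagtype(tagtype):
    # Replace `assert tagtype in ['pos', 'ner', None]` with an
    # explicit, descriptive error.
    if tagtype not in ('pos', 'ner', None):
        raise ValueError("tagtype must be either 'pos', 'ner' or None")
    return tagtype
```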
nltk/parse/corenlp.py
Outdated
tagged_data = self.api_call(sentence, properties=default_properties) | ||
assert len(tagged_data['sentences']) == 1 | ||
# Taggers only need to return 1-best sentence. | ||
yield [(token['word'], token[self.tagtype]) for token in tagged_data['sentences'][0]['tokens']] |
But they can return several, according to the API (I guess so). I would not limit the functionality here: if CoreNLP returns several parses, pass all of them.
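A sketch of that suggestion, assuming the JSON shape shown in the diff (`tagged_data['sentences'][i]['tokens']`; the function name is made up):

```python
def yield_tagged_sentences(tagged_data, tagtype='pos'):
    # Instead of asserting a single sentence, yield one tagged
    # token list per sentence in the CoreNLP JSON response.
    for sentence in tagged_data['sentences']:
        yield [(token['word'], token[tagtype]) for token in sentence['tokens']]
```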
nltk/parse/stanford.py (Outdated)

```diff
@@ -23,9 +23,11 @@
 from nltk.parse.api import ParserI
 from nltk.parse.dependencygraph import DependencyGraph
 from nltk.tree import Tree
+from nltk.parse import corenlp
```
This looks redundant.
nltk/corpus/reader/plaintext.py (Outdated)

```diff
@@ -14,7 +14,7 @@
 from six import string_types
 import codecs

 import nltk.data
 from nltk.data import LazyLoader
```
It might be a good idea to leave this code as it was.
nltk/downloader.py (Outdated)

```diff
@@ -181,6 +181,7 @@
 from six.moves.urllib.error import HTTPError, URLError

 import nltk
 from nltk import data, internals
```
Same here; it might not be worth playing with the imports.
I've updated …
Looks good, thanks!

Let's get this merged and prepare for #1892 =)

Thanks @artiemq, @dimazest, @alvations
Hello, I was trying to move from StanfordNERTagger to CoreNLPPOSTagger and CoreNLPNERTagger with NLTK v3.2.5. The CoreNLPPOSTagger worked as expected, but CoreNLPNERTagger threw an HTTPError: 500 Server Error. I tried both CoreNLP v3.9.1 (the latest version) and v3.8.0, and both threw the same error. I posted more details at #2010. Does anyone have an idea about this problem? Or should I be safe staying with StanfordNERTagger on v3.2.5? Thanks.
Worked on issue #1839.
Also refactored the CoreNLPTokenizer and CoreNLPTagger tests.