Deprecating the old Stanford Parser #1839

Closed
alvations opened this issue Sep 26, 2017 · 4 comments

@alvations
Contributor

alvations commented Sep 26, 2017

We have deprecated the StanfordTokenizer/StanfordSegmenter, StanfordPOSTagger and StanfordNERTagger.

It would be good to also deprecate the old StanfordParser, StanfordDependencyParser and StanfordNeuralDependencyParser by:

  1. Adding the appropriate warnings to the old interface

  2a. Wrapping duck-types around CoreNLPParser that emulate the functions of the old interface

  2b. Writing up documentation on how to use the CoreNLPParser for dependency and neural dependency parsing

  3. Writing tests for the new CoreNLP parser interfaces

Both methods (2a) and (2b) should only affect the properties argument of api_call.
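
As a minimal sketch of step 1, the old classes could emit a DeprecationWarning from their constructor. The class name OldStanfordParser and the message wording below are hypothetical stand-ins, not the actual NLTK code:

```python
import warnings


class OldStanfordParser:
    """Hypothetical stand-in for a legacy wrapper such as StanfordParser."""

    def __init__(self, *args, **kwargs):
        # stacklevel=2 attributes the warning to the caller's line,
        # not to this constructor.
        warnings.warn(
            "The StanfordParser interface is deprecated; "
            "use nltk.parse.corenlp.CoreNLPParser instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.args = args
        self.kwargs = kwargs


# DeprecationWarning is often filtered out by default, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    OldStanfordParser()

print(caught[0].category.__name__)  # DeprecationWarning
```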

The current interface for CoreNLPParser:

>>> from nltk.parse.corenlp import CoreNLPParser
>>> parser = CoreNLPParser('http://localhost:9000')
>>> sent = 'The quick brown fox jumps over the lazy dog.'
>>> next(parser.raw_parse(sent)).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                         ROOT
                          |
                          S
           _______________|__________________________
          |                         VP               |
          |                _________|___             |
          |               |             PP           |
          |               |     ________|___         |
          NP              |    |            NP       |
      ____|__________     |    |     _______|____    |
     DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
     |    |     |    |    |    |    |       |    |   |
    The quick brown fox jumps over the     lazy dog  .

The desired interface might look like this:

# Using duck-types
>>> from nltk.parse.stanford import CoreNLPDependencyParser, CoreNLPNeuralDependencyParser
>>> depparser = CoreNLPDependencyParser('http://localhost:9000')
>>> depparser.parse(sent)
>>> ndepparser = CoreNLPNeuralDependencyParser('http://localhost:9000')
>>> ndepparser.parse(sent)

# Using arguments to control `properties` for `api_call()`
>>> from nltk.parse.stanford import CoreNLPParser
>>> depparser = CoreNLPParser('http://localhost:9000', parser_type="dependency")
>>> depparser.parse(sent)
>>> ndepparser = CoreNLPParser('http://localhost:9000', parser_type="neural_dependency")
>>> ndepparser.parse(sent)
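
One way the parser_type argument could translate into the properties passed to api_call() is sketched below. The function build_properties, the PARSER_ANNOTATORS table and the pairing of each parser_type with an annotator are assumptions made for illustration; only the annotator names "parse" and "depparse" come from CoreNLP itself:

```python
# Hypothetical mapping from the proposed `parser_type` argument to a CoreNLP
# annotator name. The keys and the annotator chosen for each key are
# assumptions, not part of the existing NLTK interface.
PARSER_ANNOTATORS = {
    "constituency": "parse",
    "dependency": "parse",
    "neural_dependency": "depparse",
}


def build_properties(parser_type="constituency"):
    """Return a `properties` dict of the kind that could be handed to api_call()."""
    try:
        annotator = PARSER_ANNOTATORS[parser_type]
    except KeyError:
        raise ValueError("unknown parser_type: %r" % (parser_type,))
    return {
        "annotators": "tokenize,ssplit,pos,lemma," + annotator,
        "outputFormat": "json",
    }


print(build_properties("neural_dependency")["annotators"])
# tokenize,ssplit,pos,lemma,depparse
```

Either proposed interface (duck-typed subclasses or a parser_type argument) could share a helper of this kind, so the only difference between parsers is the properties dict.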

This would make a good class project or good first challenge ;P

@artiemq
Contributor

artiemq commented Oct 1, 2017

Hi, I would like to work on this issue, but I don't understand why mock is used like this

As written, the tests don't test anything: even if the body of tokenize were completely erased, they would still pass. Maybe we should patch the api_call method and then call tokenize:

  corenlp_tokenizer = CoreNLPTokenizer()
  corenlp_tokenizer.api_call = MagicMock(return_value=predefined_return_value)
  corenlp_tokenizer.tokenize(input_string)
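
A self-contained sketch of that idea follows. FakeCoreNLPTokenizer is a hypothetical stand-in so the example runs without NLTK or a server, and the response shape (sentences, tokens, word) is an assumption about the JSON that api_call returns:

```python
from unittest.mock import MagicMock


class FakeCoreNLPTokenizer:
    """Hypothetical stand-in for CoreNLPTokenizer; not the NLTK class."""

    def api_call(self, data, properties=None):
        # The real method would send an HTTP request to a CoreNLP server.
        raise RuntimeError("would contact a CoreNLP server")

    def tokenize(self, text):
        # Logic under test: walk the JSON response and collect token strings.
        result = self.api_call(text, properties={"annotators": "tokenize,ssplit"})
        return [tok["word"] for sent in result["sentences"] for tok in sent["tokens"]]


# Assumed response shape: sentences -> tokens -> word.
predefined_return_value = {
    "sentences": [{"tokens": [{"word": "Good"}, {"word": "muffins"}, {"word": "."}]}]
}

tokenizer = FakeCoreNLPTokenizer()
# Patch only the network boundary; tokenize() itself still runs.
tokenizer.api_call = MagicMock(return_value=predefined_return_value)

tokens = tokenizer.tokenize("Good muffins.")

# Both checks fail if the body of tokenize() is erased.
tokenizer.api_call.assert_called_once()
assert tokens == ["Good", "muffins", "."]
```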

@alvations
Contributor Author

@artiemq Thank you for the interest in the issue!

Mock was used in the unittest because it was a quick way to document the Python flow of the APIs and how a user should use them, but it doesn't actually call CoreNLP.

Regarding the unittest, perhaps using unittest.mock isn't the best way to test the CoreNLP functionalities. Please feel free to rewrite/edit it and create a PR =)

@ndvbd

ndvbd commented Mar 22, 2018

I can see info here on how to connect to the POS tagger 'server' at port 9000, but I can't find information on how to run the Stanford POS tagger server so that it listens on port 9000... Does anyone know?

@Demetrio92

So is this decided or not?
I am now trying to run nltk.tag.StanfordNERTagger(). There is a small issue with it, that I wanted to fix. Shall I do it or not?

I want the parser to run locally without API-calls. Is this possible with CoreNLPParser?
