
DependencyGraph or Stanford Parser API issues with sentences with "/" #1510

Closed
alvations opened this issue Nov 18, 2016 · 20 comments

@alvations
Contributor

alvations commented Nov 18, 2016

A user has reported that the following sentence throws an AssertionError when parsed with Stanford's DependencyParser API in NLTK:

for all of its insights into the dream world of teen life , and its electronic expression through cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 1/2-hour running time .

Code:

>>> from nltk.parse.stanford import StanfordDependencyParser
>>> dep_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> sent = 'for all of its insights into the dream world of teen life , and its electronic expression through cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 1/2-hour running time . '
>>> dep_parser.raw_parse(sent)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 132, in raw_parse
    return next(self.raw_parse_sents([sentence], verbose))
  File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 150, in raw_parse_sents
    return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
  File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 91, in _parse_trees_output
    res.append(iter([self._make_tree('\n'.join(cur_lines))]))
  File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 339, in _make_tree
    return DependencyGraph(result, top_relation_label='root')
  File "/Library/Python/2.7/site-packages/nltk/parse/dependencygraph.py", line 84, in __init__
    top_relation_label=top_relation_label,
  File "/Library/Python/2.7/site-packages/nltk/parse/dependencygraph.py", line 328, in _parse
    assert cell_number == len(cells)
AssertionError

The cause may be either how DependencyGraph reads the output, or that the Stanford output is inconsistent.
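
The traceback ends at a check that every row has the same number of cells. As a minimal sketch of how such a check can fail, the example below uses two hypothetical CoNLL-style rows (not actual parser output) and assumes the fraction is emitted as a single token with an internal space. The names `ROWS`, `count_cells`, and `inconsistent_rows` are illustrative only and are not part of NLTK.

```python
# Hypothetical CoNLL-style rows (tab-separated, 10 columns each).
# Assumption: the fraction is emitted as one token containing a space.
ROWS = [
    "1\tits\t_\tPRP$\tPRP$\t_\t4\tnmod:poss\t_\t_",
    "2\t2 1/2-hour\t_\tJJ\tJJ\t_\t4\tamod\t_\t_",
]


def count_cells(line, separator=None):
    """Count cells in a row; separator=None splits on any whitespace."""
    return len(line.split(separator))


def inconsistent_rows(lines, separator=None):
    """Return indices of rows whose cell count differs from the first row."""
    expected = count_cells(lines[0], separator)
    return [i for i, line in enumerate(lines)
            if count_cells(line, separator) != expected]


print(inconsistent_rows(ROWS))        # split on any whitespace
print(inconsistent_rows(ROWS, "\t"))  # split on tabs only
```

Under these assumptions, splitting on any whitespace gives the second row one extra cell, while splitting on tabs alone keeps both rows at ten cells. Whether this is what actually happens with the Stanford output here would still need to be confirmed against the real parser output.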

More details on the setup for NLTK + Stanford tools are at https://gist.github.com/alvations/e1df0ba227e542955a8a#stanford-parser

@hoavt-54

hoavt-54 commented May 4, 2017

Hi @alvations Any updates on this?
Thanks

@alvations
Contributor Author

@hoavt-54 I think there's a quick way to check whether the problem is on the Stanford side or in the DependencyGraph code, using the new interface from #1249. I'll be a little busy today, but perhaps someone else can check it out and report back here.

@dimazest
Contributor

I can have a look; somehow I missed this issue.

@dimazest
Contributor

dimazest commented Jun 3, 2017

@tesslocl what's your sentence? Did you try to use CoreNLP (nltk/parse/corenlp.py) instead?

@dimazest
Contributor

dimazest commented Jun 3, 2017

You need to start a CoreNLP server. Try:

from nltk.parse.corenlp import CoreNLPParser, CoreNLPServer

with CoreNLPServer(port=9000) as server:
    parser = CoreNLPParser(url=server.url)
    parser.parse(...)

I'm sorry for the missing documentation, and for the short reply, as I'm typing on my phone.

@dimazest
Contributor

dimazest commented Jun 5, 2017

You can try another port, for example CoreNLPServer(port=9001), or just CoreNLPServer(), in which case a free port should be chosen.
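
For readers wondering how a free port can be picked at all, here is a minimal standard-library sketch of the general technique: binding to port 0 and letting the operating system choose. This is not necessarily how CoreNLPServer selects its port; `find_free_port` is a hypothetical helper written for this illustration.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for an IPv4 socket.
        return sock.getsockname()[1]


port = find_free_port()
print(port)
```

The returned number could then be passed as CoreNLPServer(port=port). Another process may claim the port between the check and the server start, so this is best treated as a convenience rather than a guarantee.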

@dimazest
Contributor

dimazest commented Jun 7, 2017

Do you have the CoreNLP .jar files? You need to have a CoreNLP server running locally.

Can you run this example: #1249 (review)?

@dimazest
Contributor

dimazest commented Jun 7, 2017

Once you've started the server, can you access http://localhost:9000 in your browser?

You can also start the server yourself; refer to https://stanfordnlp.github.io/CoreNLP/corenlp-server.html

Once it's running and you can access it via the browser, you should be able to use the parser:

parser = CoreNLPParser(url='http://localhost:9000')
# and so on
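
The browser check can also be done from Python. The sketch below uses only the standard library and assumes the server answers a plain HTTP GET on its root URL, which is what visiting it in a browser relies on. `server_is_up` is a hypothetical helper, not part of NLTK or CoreNLP.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:9000", timeout=5):
    """Return True if an HTTP GET to `url` succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling server_is_up() before creating the parser gives a quick signal of whether a later failure comes from the connection rather than from the parser itself.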

@dimazest
Contributor

dimazest commented Jun 9, 2017

Are you able to start a CoreNLP server from a terminal (not from Python)? Check https://stanfordnlp.github.io/CoreNLP/corenlp-server.html for more details.

# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

@dimazest
Contributor

Ok, there are two steps involved:

  1. Start a CoreNLP Java process. There are two ways to do this; I suggest starting it manually, that is, with the java -Xmx4g -cp ... command. Did you succeed? You should be able to access the server in a browser at http://localhost:9000. The console output shows which port is being used.
  2. Once the server is running, you can create a CoreNLP Python client: parser = CoreNLPParser(url='http://localhost:9000'). As you've started the CoreNLP Java server yourself, you don't need to start it within the Python session (don't run server = CoreNLPServer()).

The error messages you posted suggest that the CoreNLP Java server is not running.
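
The second step can be sketched as follows for the sentence from the original report. This assumes a CoreNLP server is already running at http://localhost:9000 and that the installed NLTK provides nltk.parse.corenlp.CoreNLPDependencyParser. The helper name `dependency_triples` is hypothetical, and no output is shown because it depends on the CoreNLP models in use.

```python
SENTENCE = (
    "for all of its insights into the dream world of teen life , and its "
    "electronic expression through cyber culture , the film gives no quarter "
    "to anyone seeking to pull a cohesive story out of its 2 1/2-hour "
    "running time ."
)


def dependency_triples(sentence, url="http://localhost:9000"):
    """Parse `sentence` via a running CoreNLP server and return its triples."""
    # Imported here so the module loads even where NLTK is not installed.
    from nltk.parse.corenlp import CoreNLPDependencyParser

    parser = CoreNLPDependencyParser(url=url)
    graph = next(parser.raw_parse(sentence))  # first parse of the sentence
    return list(graph.triples())


# Usage (requires a running server):
#     for governor, relation, dependent in dependency_triples(SENTENCE):
#         print(governor, relation, dependent)
```

The server is deliberately not started inside this code, matching the advice above to launch it manually.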

@dimazest
Contributor

Did you try it with "*" as a classpath: java -mx4g -cp "*" ...?

@csn6666

csn6666 commented Jun 19, 2017

Hi there, it seems that I've also encountered this problem. My sentence is:
'Maybe, a 2 21/2 foot cord?', u'And its of a cheaper quality than the part of the charger that the micro usb plugs into...'
I tried to figure it out, and it seems that the '/' causes this error.

@dimazest
Contributor

@caisinong have you tried using the new CoreNLP interface? See my comments above.

@dimazest
Contributor

Once you've started a server manually, you don't need to start a server in the code.

Keep the server running and instantiate the parser:

parser = CoreNLPParser(url='http://localhost:9000')

@alvations
Contributor Author

I have had a similar experience. Starting the Stanford CoreNLP server in the code is messy and should only be used for testing purposes. Maybe we should somehow not expose that to the user.

@dimazest
Contributor

I'm glad that things are working. Indeed, the server should be started outside of Python code.

@alvations alvations added this to the 3.2.5 milestone Oct 4, 2017
@alvations
Contributor Author

Patched and resolved by new CoreNLP API =)

@kavin26

kavin26 commented Dec 28, 2017

@dimazest Hi, if the text contains \ or /, is the only solution for the AssertionError to use CoreNLP? I'm using stanford-parser-full-2017-06-09.
The sentence used for parsing was: Iraqi security forces drove Islamic State fighters from the centre of a town just south of the militants\' main stronghold of Mosul on Saturday and reached within a few km (miles) of an airport on the edge of the city, a senior commander said.

@alvations
Contributor Author

@kavin26 Yes, please use the nltk.parse.corenlp.CoreNLPParser.

@kavin26

kavin26 commented Dec 28, 2017

@alvations thank you so much 👍
