
CoreNLPNERTagger throws HTTPError: 500 Server Error: Internal Server Error for url: ...... #2010

Closed
hexingren opened this issue Apr 25, 2018 · 24 comments

Comments

@hexingren

hexingren commented Apr 25, 2018

Hello,

I'm using NLTK v3.2.5 and trying to use CoreNLPNERTagger with both Stanford CoreNLP v3.9.1 (the latest version) and v3.8.0. Both throw an HTTPError: 500 Server Error.

The code is
"""
from nltk.tag.stanford import CoreNLPPOSTagger, CoreNLPNERTagger
CoreNLPPOSTagger(url='http://localhost:9000').tag('What is the airspeed of an unladen swallow ?'.split())
CoreNLPNERTagger(url='http://localhost:9000').tag('Rami Eid is studying at Stony Brook University in NY.'.split())
"""

CoreNLPPOSTagger was able to give the expected result, so I guess I set up the server correctly. The error message for CoreNLPNERTagger is

"""

HTTPError Traceback (most recent call last)
in ()
----> 1 CoreNLPNERTagger(url='http://localhost:9000').tag('Rami Eid is studying at Stony Brook University in NY.'.split())

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tag\stanford.py in tag(self, sentence)
229
230 def tag(self, sentence):
--> 231 return self.tag_sents([sentence])[0]
232
233 def raw_tag_sents(self, sentences):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tag\stanford.py in tag_sents(self, sentences)
225 # Converting list(list(str)) -> list(str)
226 sentences = (' '.join(words) for words in sentences)
--> 227 return list(self.raw_tag_sents(sentences))
228
229

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tag\stanford.py in raw_tag_sents(self, sentences)
242 default_properties['annotators'] += self.tagtype
243 for sentence in sentences:
--> 244 tagged_data = self.api_call(sentence, properties=default_properties)
245 assert len(tagged_data['sentences']) == 1
246 # Taggers only need to return 1-best sentence.

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\parse\corenlp.py in api_call(self, data, properties)
249 )
250
--> 251 response.raise_for_status()
252
253 return response.json()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\models.py in raise_for_status(self)
933
934 if http_error_msg:
--> 935 raise HTTPError(http_error_msg, response=self)
936
937 def close(self):

HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cssplit%2Cner%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D
"""
Could anyone point out what happened here? Thanks!

@dimazest
Contributor

Hi,

do you see any errors coming from the CoreNLP log?

@hexingren
Author

Yes.

CoreNLPPOSTagger worked as expected with no error. The error message when I ran CoreNLPNERTagger is
"""
[pool-1-thread-1] INFO CoreNLP - [/0:0:0:0:0:0:0:1:52437] API call w/annotators tokenize,ssplit,pos,lemma,ner
Rami Eid is studying at Stony Brook University in NY.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-1] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.1 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.5 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.6 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:38)
at edu.stanford.nlp.time.TimeExpressionExtractorFactory.create(TimeExpressionExtractorFactory.java:60)
at edu.stanford.nlp.time.TimeExpressionExtractorFactory.createExtractor(TimeExpressionExtractorFactory.java:43)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.(NumberSequenceClassifier.java:86)
at edu.stanford.nlp.ie.NERClassifierCombiner.(NERClassifierCombiner.java:135)
at edu.stanford.nlp.pipeline.NERCombinerAnnotator.(NERCombinerAnnotator.java:131)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:68)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$44(StanfordCoreNLP.java:546)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$69(StanfordCoreNLP.java:625)
at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:495)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.(StanfordCoreNLP.java:201)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.(StanfordCoreNLP.java:194)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.(StanfordCoreNLP.java:181)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.mkStanfordCoreNLP(StanfordCoreNLPServer.java:366)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.access$800(StanfordCoreNLPServer.java:50)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:851)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(Unknown Source)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(Unknown Source)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public edu.stanford.nlp.time.TimeExpressionExtractorImpl(java.lang.String,java.util.Properties) with args [sutime, {}]
at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:382)
at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:36)
... 27 more
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
... 29 more
Caused by: java.lang.NoClassDefFoundError: javax/xml/bind/JAXBException
at de.jollyday.util.CalendarUtil.(CalendarUtil.java:42)
at de.jollyday.HolidayManager.(HolidayManager.java:66)
at de.jollyday.impl.DefaultHolidayManager.(DefaultHolidayManager.java:46)
at edu.stanford.nlp.time.JollyDayHolidays$MyXMLManager.(JollyDayHolidays.java:148)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at java.base/java.lang.Class.newInstance(Unknown Source)
at de.jollyday.caching.HolidayManagerValueHandler.instantiateManagerImpl(HolidayManagerValueHandler.java:60)
at de.jollyday.caching.HolidayManagerValueHandler.createValue(HolidayManagerValueHandler.java:41)
at de.jollyday.caching.HolidayManagerValueHandler.createValue(HolidayManagerValueHandler.java:13)
at de.jollyday.util.Cache.get(Cache.java:51)
at de.jollyday.HolidayManager.createManager(HolidayManager.java:168)
at de.jollyday.HolidayManager.getInstance(HolidayManager.java:148)
at edu.stanford.nlp.time.JollyDayHolidays.init(JollyDayHolidays.java:57)
at edu.stanford.nlp.time.Options.(Options.java:119)
at edu.stanford.nlp.time.TimeExpressionExtractorImpl.init(TimeExpressionExtractorImpl.java:44)
at edu.stanford.nlp.time.TimeExpressionExtractorImpl.(TimeExpressionExtractorImpl.java:39)
... 34 more
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.JAXBException
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source)
at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
... 53 more
"""

Thanks.

@dimazest
Contributor

edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException:
  Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl

This looks like the key error on the CoreNLP side.

Did you try to tag the sentence via the web interface at http://localhost:9000?

@hexingren
Author

Hi Dmitrijs,

Thanks for pointing this out. I guess it's on the CoreNLP side. I tried several texts with person names and none of them worked in the live demo at that point. But I remember the demo site worked last week.

Moving forward, if NLTK only provides wrappers for CoreNLP, then users have to manage the server themselves. Do you think it would be a good idea to keep StanfordNERTagger or something similar in the new version? Thank you.

@alvations
Contributor

alvations commented Apr 26, 2018

Actually, we should just deprecate the Stanford APIs in NLTK and only wrap around https://github.com/stanfordnlp/python-stanford-corenlp

But that'll require some work to clean up, wrap, and merge the APIs with NLTK's objects, and to test. Anyone up for a challenge?

@alvations
Contributor

alvations commented Aug 23, 2018

@hexingren Please try the following with NLTK v3.3, using the new CoreNLPParser interface.

First update your NLTK:

pip3 install -U nltk

Then still in terminal:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -preload tokenize,ssplit,pos,lemma,ner,parse,depparse -status_port 9000 -port 9000 -timeout 15000 &

Finally, start Python:

python3

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> list(parser.parse(['house', ')', 'is', 'in', 'York', 'Avenue']))
[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NN', ['house']), Tree('-RRB-', ['-RRB-'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('PP', [Tree('IN', ['in']), Tree('NP', [Tree('NNP', ['York']), Tree('NNP', ['Avenue'])])])])])])]

>>> tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
>>> tagger.tag(tokens)
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'STATE_OR_PROVINCE')]

Are you still getting the error with the above?

@alvations
Contributor

Closing the issue as resolved for now =)
Please reopen if there are further issues.

@Bisht9887

I am seeing a similar error occurring.

@alvations
Contributor

@Bisht9887 would you be able to share the dataset so we can test what happened? If not, could you post the full stacktrace of the error, as well as the console output of the Stanford CoreNLP server?

@Bisht9887


HTTPError Traceback (most recent call last)
in ()
22 print(m)
23
---> 24 name_extracter()

in name_extracter()
18 name_details=match[1]
19 tokens = name_details.split()
---> 20 result=tagger.tag(tokens)
21 for m in result:
22 print(m)

~\Anaconda3\lib\site-packages\nltk\parse\corenlp.py in tag(self, sentence)
380 ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
381 """
--> 382 return self.tag_sents([sentence])[0]
383
384 def raw_tag_sents(self, sentences):

~\Anaconda3\lib\site-packages\nltk\parse\corenlp.py in tag_sents(self, sentences)
359 # Converting list(list(str)) -> list(str)
360 sentences = (' '.join(words) for words in sentences)
--> 361 return [sentences[0] for sentences in self.raw_tag_sents(sentences)]
362
363 def tag(self, sentence):

~\Anaconda3\lib\site-packages\nltk\parse\corenlp.py in (.0)
359 # Converting list(list(str)) -> list(str)
360 sentences = (' '.join(words) for words in sentences)
--> 361 return [sentences[0] for sentences in self.raw_tag_sents(sentences)]
362
363 def tag(self, sentence):

~\Anaconda3\lib\site-packages\nltk\parse\corenlp.py in raw_tag_sents(self, sentences)
399 default_properties['annotators'] += self.tagtype
400 for sentence in sentences:
--> 401 tagged_data = self.api_call(sentence, properties=default_properties)
402 yield [[(token['word'], token[self.tagtype]) for token in tagged_sentence['tokens']]
403 for tagged_sentence in tagged_data['sentences']]

~\Anaconda3\lib\site-packages\nltk\parse\corenlp.py in api_call(self, data, properties)
255 )
256
--> 257 response.raise_for_status()
258
259 return response.json()

~\Anaconda3\lib\site-packages\requests\models.py in raise_for_status(self)
933
934 if http_error_msg:
--> 935 raise HTTPError(http_error_msg, response=self)
936
937 def close(self):

HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cssplit%2Cner%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D

@Bisht9887

Bisht9887 commented Aug 27, 2018

The data looks roughly like this: I have about 400 text files containing data similar to what is shown below. I parse every line of every file and pass the text after 'patient name:' to the NER tagger.

patient name: Johny, Rick Performed: Due: 21Mar2018; Last Updated By: Morgan;
patient name: Wes Conte.
patient name: Comfort, John;
patient name: Oswald, Andy Performed: Due: 12Mar2014; Last Updated By: Russell, White;
patient name: Douglass, David;Performed: Due: 23May2015; Last Updated By: Potter, Alisa;
patient name: Hall, Ariana
patient name: Beaver, Jayden
patient name: Oswald, Scott;
patient name: Green, Robert;
patient name: Oswald, Scott;
patient name: Hall, Rob
patient name: Brain Burleth, Nov 10 2013 6:55AM CST
patient name: Grace Johnson, May 11 2011 8:54PM CST

@alvations
Contributor

@hexingren do you know which line caused the error? Before tagger.tag(tokens), add the line print(tokens).


Due to the nature of the dataset, I hope the sample in the previous comment is anonymized, or at least changed to fictional names.

BTW, if the data is so structured as shown above, there's really no need for an NER ;P
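For lines as regular as that sample, a plain pattern match would indeed be enough. A minimal sketch (the pattern and the `extract_patient_name` helper are hypothetical, guessed from the sample lines above, not from the real data):

```python
import re

# Hypothetical pattern based on the sample lines above: capture the text
# after "patient name:" up to an optional "Performed:" clause or a ";".
PATIENT_RE = re.compile(r'patient name:\s*(.+?)(?:\s*Performed:.*|;.*|)$')

def extract_patient_name(line):
    """Return the name portion of a 'patient name:' line, or None."""
    m = PATIENT_RE.match(line.strip())
    return m.group(1).rstrip(';. ') if m else None
```

For genuinely unstructured text, though, NER remains the right tool.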

@Bisht9887

@alvations : Thank you! The issue has been resolved: some empty tokens were being passed to the NER tagger, so I have now put a check in place for them.
Also, the data is not as structured as I have shown above. It is very much unstructured; otherwise, I would have used regular expressions or something else :) . I just put a simple, clear sample of it here.
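That empty-token check can live in one small filter so every call site is covered. A sketch (`tagger` and a running server are assumed from earlier in the thread; the `nonempty_token_lists` name is made up):

```python
def nonempty_token_lists(lines):
    """Yield token lists for lines that actually contain tokens.

    Empty or whitespace-only lines are skipped, since sending an empty
    token list to the CoreNLP server triggers the 500 error seen here.
    """
    for line in lines:
        tokens = line.split()
        if tokens:
            yield tokens

# Usage (needs the running server from earlier in the thread):
# for tokens in nonempty_token_lists(lines):
#     print(tagger.tag(tokens))
```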

@hexingren
Author

I didn't try this wrapper further in April, but was inspired by this thread. It had something to do with ner.useSUTime in the old version.

@alvations
Contributor

@hexingren There wasn't any issue when I ran the code from #2010 (comment) through a sizeable corpus.

@hexingren Could you do a quick check on your data and see whether you are still getting the same 500 Server Error? Thanks in advance!

The issue @Bisht9887 raised was caused by an empty string; in that case, the API fails. @dimazest Maybe we should catch empty strings and return an empty Tree() or []?

@hexingren
Author

@alvations I didn't use additional data. I was trying the example code in NLTK v3.2.5 and it didn't work on my machine. If the example code works in v3.3 now, then that's great! Thanks.

@alvations
Contributor

alvations commented Aug 29, 2018

It should work in v3.3. Here are the updated docs: https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK =)
Feel free to reopen this if the problem happens again.

@JohnnyLim

  1. It seems that input passed to the parser should not be empty.
    https://stackoverflow.com/questions/52031337/stanfords-corenlp-name-entity-recogniser-throwing-error-500-server-error-inter
  2. If you run the Stanford server with the -timeout option, note that the value should be large, e.g. -timeout 90000, because the server can hit a connection error when parsing the tokens takes a long time.

@nmakarun

nmakarun commented Mar 15, 2019

@alvations @dimazest I am also facing a similar issue: requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%22annotators%22%3A+%22tokenize%2Cssplit%2Cner%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%2C+%22outputFormat%22%3A+%22json%22%7D

And I have to agree with @JohnnyLim: when I sent in the text as a list with more than 100 items, it threw the error immediately; but when I sent only the first 5 list items, it threw the error after printing the result for the first 4.

Below is the full error which I got when I ran the NER tagger with the API (NLTK == 3.4).

java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:866)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
File "SEC_Entity_Extraction.py", line 27, in
tagged_text = ner_tagger.tag(text.split())
File "/mnt/c/Users/17200391/Desktop/Python/text/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 366, in tag
return self.tag_sents([sentence])[0]
File "/mnt/c/Users/17200391/Desktop/Python/text/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 345, in tag_sents
return [sentences[0] for sentences in self.raw_tag_sents(sentences)]
File "/mnt/c/Users/17200391/Desktop/Python/text/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 345, in
return [sentences[0] for sentences in self.raw_tag_sents(sentences)]
File "/mnt/c/Users/17200391/Desktop/Python/text/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 387, in raw_tag_sents
tagged_data = self.api_call(sentence, properties=default_properties)
File "/mnt/c/Users/17200391/Desktop/Python/text/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 250, in api_call
response.raise_for_status()
File "/mnt/c/Users/17200391/Desktop/Python/text/lib/python3.5/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%22annotators%22%3A+%22tokenize%2Cssplit%2Cner%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%2C+%22outputFormat%22%3A+%22json%22%7D

Could you please let me know if there is any workaround for this issue? I am planning to use the NER tagger on a much larger amount of text and am just trying a POC initially. Any input in this regard is much appreciated.

Adding to this: when I went to the GUI for the API, I got this error: "CoreNLP request timed out. Your document may be too long."

Thanks,
nmakarun
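One workaround for timeouts on long documents is to tag in smaller chunks and concatenate the results. A sketch (the `chunk` helper and the chunk size of 100 are guesses; splitting mid-sentence can degrade NER quality near chunk boundaries):

```python
def chunk(tokens, size=100):
    """Split a long token list into pieces the server can finish
    within its -timeout window; tune size for your setup."""
    for i in range(0, len(tokens), size):
        yield tokens[i:i + size]

# Usage (needs the running server from earlier in the thread):
# tagged = [pair for piece in chunk(text.split()) for pair in tagger.tag(piece)]
```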

@dimazest
Contributor

I've updated the wiki page https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK/_compare/3d64e56bede5e6d93502360f2fcd286b633cbdb9...f33be8b06094dae21f1437a6cb634f86ad7d83f7

though it might be worth putting this information into the NLTK documentation, to avoid spreading documentation over several sources.

@dmarinav

I am not sure why you are saying that the issue was resolved; it does not work for me. I used this link: https://stackoverflow.com/questions/52031337/stanfords-corenlp-name-entity-recogniser-throwing-error-500-server-error-inter

and, unfortunately, it is not helpful at all. I still get the error.

It works for any other tagging operation (like POS tagging) and for everything else. I also don't think it has anything to do with the text, as NER tagging does not work at all for any text or sentence. I am sure I correctly followed the instructions to start the server:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,ner,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000 &

and had no issues with loading it.

Here is the code that I used:

tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
text = 'The hotel is on a prime stretch of Piccadilly , near the heart of Mayfair , set right between Hyde and Green Parks and a few blocks from the Royal Academy and the Green Park Underground station .'

tokens = text.split()
if tokens:
    result = tagger.tag(tokens)
    for m in result:
        print(m)

Here is what I got:


HTTPError Traceback (most recent call last)
in
6 tokens = name_details.split()
7 if tokens:
----> 8 result=tagger.tag(tokens)
9 for m in result:
10 print(m)

/Applications/anaconda3/lib/python3.7/site-packages/nltk/parse/corenlp.py in tag(self, sentence)
366 ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
367 """
--> 368 return self.tag_sents([sentence])[0]
369
370 def raw_tag_sents(self, sentences):

/Applications/anaconda3/lib/python3.7/site-packages/nltk/parse/corenlp.py in tag_sents(self, sentences)
345 # Converting list(list(str)) -> list(str)
346 sentences = (' '.join(words) for words in sentences)
--> 347 return [sentences[0] for sentences in self.raw_tag_sents(sentences)]
348
349 def tag(self, sentence):

/Applications/anaconda3/lib/python3.7/site-packages/nltk/parse/corenlp.py in (.0)
345 # Converting list(list(str)) -> list(str)
346 sentences = (' '.join(words) for words in sentences)
--> 347 return [sentences[0] for sentences in self.raw_tag_sents(sentences)]
348
349 def tag(self, sentence):

/Applications/anaconda3/lib/python3.7/site-packages/nltk/parse/corenlp.py in raw_tag_sents(self, sentences)
387 default_properties['annotators'] += self.tagtype
388 for sentence in sentences:
--> 389 tagged_data = self.api_call(sentence, properties=default_properties)
390 yield [
391 [

/Applications/anaconda3/lib/python3.7/site-packages/nltk/parse/corenlp.py in api_call(self, data, properties, timeout)
250 )
251
--> 252 response.raise_for_status()
253
254 return response.json()

/Applications/anaconda3/lib/python3.7/site-packages/requests/models.py in raise_for_status(self)
938
939 if http_error_msg:
--> 940 raise HTTPError(http_error_msg, response=self)
941
942 def close(self):

HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cssplit%2Cner%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D

@ehsong

ehsong commented Sep 29, 2022

Hi everyone, I am getting the same error after 155 rows: the first 155 rows work fine, but after row 155 I get the error

raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9001/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cssplit%2Cner%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D

This is quite strange, because I did not have this error when using parser.tokenize; the error only occurs with tagtype='ner'. I also added timeout -90000 and the if clause so that the input is not empty. Could we reopen this issue? @alvations

@tomaarsen
Member

tomaarsen commented Sep 30, 2022

Hello @ehsong,

Consider adding -timeout 90000 instead of timeout -90000.

If you are getting a 500 Server Error on the Python side, that is because something broke on the CoreNLP server side. There's not much we can do about that, other than help debug the logs.
