This repository has been archived by the owner on Oct 7, 2023. It is now read-only.

Does pynlp keep the original tag type "O" which is the non-entity part? #13

Open
hexingren opened this issue Apr 27, 2018 · 3 comments

Comments

@hexingren

Hello,

Does pynlp keep the original tag type "O" which is the non-entity part?

For example,
sentence = "Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife"

Expected result:
[('Nora Jani', 'PERSON'), ('a single person', 'O'), ('Matt Jani', 'PERSON'), ('and', 'O'), ('Susan Jani', 'PERSON'), ('husband and wife', 'O')]

Thanks.

@sina-al
Owner

sina-al commented Apr 27, 2018

Yes, try this:

from pynlp import StanfordCoreNLP

nlp = StanfordCoreNLP(annotators='tokenize, ssplit, pos, ner')

document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")

for sentence in document:
    for token in sentence:
        print(token, token.ner)

This will give you token-level named entity recognition, including the "O" tag for non-entity tokens.
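To get something closer to the grouped format in the question, consecutive tokens that share a tag can be merged into spans. The following is a minimal sketch, not part of pynlp: `group_by_tag` is a hypothetical helper, and the tags in the sample list are hand-written for illustration rather than actual CoreNLP output.

```python
from itertools import groupby


def group_by_tag(tagged_tokens):
    """Merge consecutive (word, tag) pairs that share a tag into (phrase, tag) spans."""
    spans = []
    # groupby only groups adjacent items, which is exactly what we want here
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        phrase = " ".join(word for word, _ in group)
        spans.append((phrase, tag))
    return spans


# Hand-written tags for illustration only; not real CoreNLP output.
tagged = [
    ("Nora", "PERSON"), ("Jani", "PERSON"),
    ("a", "O"), ("single", "O"), ("person", "O"),
    ("Matt", "PERSON"), ("Jani", "PERSON"),
    ("and", "O"),
    ("Susan", "PERSON"), ("Jani", "PERSON"),
]
print(group_by_tag(tagged))
```

With the loop above, the input list could probably be built as `[(str(token), token.ner) for sentence in document for token in sentence]`, assuming `str(token)` returns the token text. Two caveats: punctuation is normally tagged "O" and would end up inside the "O" spans, and two distinct entities of the same type that sit next to each other would be merged into one span.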

If you want entities that span multiple tokens, use the entitymentions annotator:

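nlp = StanfordCoreNLP(annotators='entitymentions')

# Re-annotate the text so the document comes from the new pipeline
document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")

for entity in document.entities:
    print(entity)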

@sina-al
Owner

sina-al commented Apr 27, 2018

I will try to write up some docs soon.

@hexingren
Author

The first block of code runs into the error from #12 when I include 'tokenize, ssplit, pos' in the annotators. The code that works for now is:

from pynlp import StanfordCoreNLP

nlp = StanfordCoreNLP(annotators='ner', options={"ner.useSUTime": False})
# The code below throws CoreNLPServerError: Status code: [500]
# nlp = StanfordCoreNLP(annotators='tokenize, ssplit, pos, ner', options={"ner.useSUTime": False})

document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")

for sentence in document:
    for token in sentence:
        print(token, token.ner)

This looks like a problem on the CoreNLP server side. Thanks!
