This issue was moved to a discussion.

Possible ORG misidentification #13437

Closed
grabastart opened this issue Apr 13, 2024 · 1 comment

Comments

@grabastart

How to reproduce the behaviour

import spacy
print(spacy.__version__)

model = 'en_core_web_sm'
nlp = spacy.load(model)

query = "Identify top 4 open source small language model that can run on a personal computer."

# Use spaCy for Named Entity Recognition
doc = nlp(query)
for ent in doc.ents:
    # original code
    if ent.label_ in ['PERSON', 'ORG']:
        # display PERSON or ORG found
        print(ent.label_, ent.text)

query = "Identify top 4 open source small language model that can run on a personal computer." # FOUND ORG

query = "Identify top 4 open source small language model" # FOUND ORG

comment: they should not be identified as "ORG"

query = "Identify small language model" # NOT found

comment: expected behavior

query = "Identify top 4 small language model" # FOUND ORG Identify

query = "identify top 4 small language model" # FOUND ORG Identify

query = "list top 4 small language models" # FOUND ORG List

query = "Can you list the four best small language models?" # FOUND ORG

comment: they should not be identified as "ORG"

query = "identify top 4 software" # NOT FOUND

query = "What is open source?" # NOT FOUND

comment: expected behavior but the English expressions are not idiomatic

query = "Identify top 4 open source tool" # FOUND ORG

comment: they should not be identified as "ORG"

query = "What is open source?" # NOT FOUND

query = "Identify top 4 open source" # FOUND ORG

comment: it should not be identified as "ORG"

query = "identify top 4 software tool" # NOT FOUND

query = "Identify top 4 software tool" # FOUND ORG

comment: the second query should not be identified as "ORG"

query = "Who is Matthew Honnibal?" # FOUND PERSON

query = "Who is Andrew Ng?" # FOUND PERSON

comment: expected behavior

query = "Identify top 4 open source small language model that can run on a personal computer." # ORG Identify

query = "identify top 4 open source small language model that can run on a personal computer." # NOT FOUND

comment: neither query should be identified as "ORG"
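
To make the comparison above easier to rerun, here is a minimal sketch that batches the queries and collects only the PERSON and ORG entities. The helper names `collect_entities` and `run_with_spacy` are hypothetical (they are not part of spaCy); the sketch assumes spaCy and the `en_core_web_sm` model are installed, and the results will depend on the model version.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_entities(
    nlp: Callable,
    queries: Iterable[str],
    labels: Tuple[str, ...] = ("PERSON", "ORG"),
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each query to the (label, text) pairs whose label is in `labels`."""
    results = {}
    for query in queries:
        doc = nlp(query)  # any callable returning an object with `.ents` works
        results[query] = [
            (ent.label_, ent.text) for ent in doc.ents if ent.label_ in labels
        ]
    return results


def run_with_spacy() -> None:
    """Hypothetical driver: load the small English model and print the findings."""
    import spacy  # imported lazily so the helper above has no hard dependency

    nlp = spacy.load("en_core_web_sm")
    queries = [
        "Identify top 4 software tool",
        "identify top 4 software tool",
        "Who is Andrew Ng?",
    ]
    for query, ents in collect_entities(nlp, queries).items():
        print(repr(query), "->", ents if ents else "no PERSON/ORG found")
```

Calling `run_with_spacy()` prints one line per query, which makes the capitalized and lowercase variants easy to compare side by side.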

Your Environment

  • Operating System: Windows 10
  • Python Version Used: 3.10.7
  • spaCy Version Used: 3.7.2
  • Environment Information: 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]

Thanks.

@danieldk
Contributor

Converting this to a discussion, since it is not a bug in spaCy.

@explosion explosion locked and limited conversation to collaborators Apr 15, 2024
@danieldk danieldk converted this issue into discussion #13438 Apr 15, 2024
