ValueError: [E103] #22

Open
Huzmorgoth opened this issue Nov 10, 2019 · 21 comments

Comments

@Huzmorgoth

I get the error below while training, even though I used the same code.

ValueError: [E103] Trying to set conflicting doc.ents: '(6861, 6870, 'Companies worked at')' and '(6305, 7258, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
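E103 is raised when two annotated spans claim the same token. With the `(start, end, label)` character offsets used in this thread (end exclusive), two spans conflict when each starts before the other ends. A minimal sketch for spotting the offending pairs in one example's entity list; `find_overlaps` is a hypothetical helper, not a spaCy API:

```python
from itertools import combinations


def find_overlaps(entities):
    """Return every pair of (start, end, label) spans whose character ranges overlap.

    Offsets are half-open: `end` is exclusive, as in spaCy's training format.
    """
    conflicts = []
    for a, b in combinations(entities, 2):
        # Two half-open ranges overlap iff each one starts before the other ends.
        if a[0] < b[1] and b[0] < a[1]:
            conflicts.append((a, b))
    return conflicts


# The two spans from the error above: the company name lies inside the Skills span.
entities = [(6305, 7258, 'Skills'), (6861, 6870, 'Companies worked at')]
print(find_overlaps(entities))
# [((6305, 7258, 'Skills'), (6861, 6870, 'Companies worked at'))]
```

Running this over every `(text, {'entities': ...})` pair in the training data shows whether the conflicts are nested spans, near-duplicates, or off-by-one boundaries, which call for different fixes.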

@Abhimanyu100

Abhimanyu100 commented Nov 11, 2019

@Huzmorgoth paste this code:

```python
import re


# Trim leading and trailing whitespace from every entity span
def trim_entity_spans(data: list) -> list:
    invalid_span_tokens = re.compile(r'\s')
    cleaned_data = []
    for text, annotations in data:
        entities = annotations['entities']
        valid_entities = []
        for start, end, label in entities:
            valid_start = start
            valid_end = end
            # Move the start forward past any whitespace
            while valid_start < valid_end and invalid_span_tokens.match(
                    text[valid_start]):
                valid_start += 1
            # Move the end backward past any whitespace
            while valid_end > valid_start and invalid_span_tokens.match(
                    text[valid_end - 1]):
                valid_end -= 1
            valid_entities.append([valid_start, valid_end, label])
        cleaned_data.append([text, {'entities': valid_entities}])

    return cleaned_data
```

@Huzmorgoth
Author

@Abhimanyu100
Hi, I tried it, but it's not working; the same issue still occurs.


Statring iteration 0
Traceback (most recent call last):

File "", line 1, in
.
.
.
_format_docs_and_golds
gold = GoldParse(doc, **gold)

File "gold.pyx", line 715, in spacy.gold.GoldParse.__init__

File "gold.pyx", line 925, in spacy.gold.biluo_tags_from_offsets

ValueError: [E103] Trying to set conflicting doc.ents: '(3385, 3391, 'Companies worked at')' and '(3345, 3896, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
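`trim_entity_spans` only strips leading and trailing whitespace from each span, so it helps when spans overlap by a stray space but cannot separate a span nested inside another, which is the situation in the error above. A small illustration, using a hypothetical `trim_whitespace` helper that does to one span what the snippet does, on made-up text:

```python
import re

# Same whitespace test that trim_entity_spans uses.
WHITESPACE = re.compile(r'\s')


def trim_whitespace(text, start, end):
    """Shrink [start, end) until it neither starts nor ends on whitespace."""
    while start < end and WHITESPACE.match(text[start]):
        start += 1
    while end > start and WHITESPACE.match(text[end - 1]):
        end -= 1
    return start, end


def overlaps(a, b):
    """Half-open ranges overlap iff each one starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


# Illustrative text: the whole line is tagged Skills and a company name sits inside it.
text = "Skills: Python at Acme Corp "
skills = trim_whitespace(text, 0, len(text))
company = trim_whitespace(text, 17, 27)  # " Acme Corp", with a leading space
print(skills, company, overlaps(skills, company))  # (0, 27) (18, 27) True
```

Both spans are trimmed, yet they still overlap, so nested annotations need a rule that drops or reshapes one of the spans.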

@Nisit007

I also have this error:

ValueError: [E103] Trying to set conflicting doc.ents: '(370, 392, 'Designation')' and '(370, 391, 'Designation')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
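Here both spans carry the same label and differ by one character at the end, which usually means the same entity was annotated twice. One option, sketched below with a hypothetical `merge_same_label` helper, is to merge overlapping spans that share a label into one span covering both; spans with different labels are left untouched for a separate rule:

```python
def merge_same_label(entities):
    """Merge overlapping (start, end, label) spans that share a label into their union."""
    merged = []
    # Sorting by label, then start, puts overlapping same-label spans next to each other.
    for start, end, label in sorted(entities, key=lambda e: (e[2], e[0], e[1])):
        if merged and merged[-1][2] == label and start < merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    # Return the spans in document order.
    return sorted(merged)


print(merge_same_label([(370, 392, 'Designation'), (370, 391, 'Designation')]))
# [(370, 392, 'Designation')]
```

Taking the union rather than the shorter span assumes the longer annotation is the intended one; if your annotation tool tends to include trailing punctuation or spaces, trimming those first may be preferable.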

@Abhimanyu100

Abhimanyu100 commented Nov 19, 2019

[Edit] Which spaCy version are you using? I was able to resolve this issue.

@Huzmorgoth
Author

Python 3

@Abhimanyu100

Sorry, I was asking about the spaCy version.

@Huzmorgoth
Author

Oh damn, it's 2.2.2

@Abhimanyu100

Use spaCy version 2.1.4; I was able to get results with that version. Let me know if it works for you.

@sayalraza

I am using spaCy 2.2.3. Older versions of spaCy had a bug that corrupted the model after it was loaded from disk, so I had to update, and after updating I ran into this issue. Sadly, I couldn't find a workaround and had to manually remove all the conflicting entities. I have testdata.json and traindata.json files with cleaned data that don't raise this error, but I can't attach files in JSON format here.

@sayalraza sayalraza mentioned this issue Dec 10, 2019
@vverman

vverman commented Mar 2, 2020

> I have both testdata.json and traindata.json with cleaned data which will not raise this error. But I am not able to attach json format here.

Hey, could you push it to your own Git repository and share the file?

@Srijha09

I got the same error as well.
ValueError: [E103] Trying to set conflicting doc.ents: '(6861, 6870, 'Companies worked at')' and '(6305, 7258, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
It would be very helpful if someone could help out.

@JasonLing95

> I have both testdata.json and traindata.json with cleaned data which will not raise this error. But I am not able to attach json format here.

Hi, could you share testdata.json and traindata.json? Thank you.

@B-Yassine

I am encountering the same problem: ValueError: [E103] Trying to set conflicting doc.ents: '(1155, 1199, 'Email Address')' and '(1143, 1240, 'Links')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Did you guys figure out a way to resolve it?
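Keeping the longest span is not always what you want: in this example it would keep `Links` and drop the more specific `Email Address`. An alternative sketch resolves conflicts by a per-label priority instead. The `PRIORITY` ranking below is a made-up example to adjust to your own labels, and `resolve_by_priority` is a hypothetical helper:

```python
# Hypothetical ranking: a lower number wins a conflict. Unlisted labels rank last.
PRIORITY = {'Email Address': 0, 'Companies worked at': 1, 'Skills': 2, 'Links': 3}


def resolve_by_priority(entities, priority=PRIORITY):
    """Keep non-overlapping (start, end, label) spans, preferring higher-priority labels."""
    ranked = sorted(
        entities,
        # Ties on priority fall back to the longer span, then the earlier one.
        key=lambda e: (priority.get(e[2], len(priority)), -(e[1] - e[0]), e[0]),
    )
    kept = []
    for start, end, label in ranked:
        # Accept a span only if it is disjoint from everything kept so far.
        if all(end <= k_start or k_end <= start for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)  # back to document order


print(resolve_by_priority([(1143, 1240, 'Links'), (1155, 1199, 'Email Address')]))
# [(1155, 1199, 'Email Address')]
```

The trade-off is that the losing span is dropped entirely, so the characters of `Links` outside the email address end up unlabeled.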

@aditya-malte

@sayalraza Can you share the cleaned dataset you mentioned?

@udara-kw

> I have both testdata.json and traindata.json with cleaned data which will not raise this error. But I am not able to attach json format here.

@sayalraza Hey, can you please share the clean dataset. Thanks in advance!

@harshgeek4coder

Try installing this version:

pip install spacy==2.0.18

@siddharth271101

> Try installing this version: pip install spacy==2.0.18

@harshgeek4coder were you able to solve it?

@gamingflexer

gamingflexer commented Feb 6, 2022

spaCy v3 gives a different error, so try:
pip install spacy==2.2.4
(preinstalled on Colab as of Feb 2022)

@Seemz246

[E103] Trying to set conflicting doc.ents: '(402, 818, 'Skills')' and '(817, 1118, 'worked at')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
I'm also getting the same error while training. Could anyone please help me get the code running? I'm not very familiar with machine learning.
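In this case the two spans share a single character (offset 817), which looks like an off-by-one end offset rather than a genuinely nested entity. When the overlap is that small, one option is to clip the earlier span so it ends where the later one starts instead of dropping either. The sketch below does that for overlaps of at most `max_overlap` characters; both the helper `clip_boundary_overlaps` and the threshold are hypothetical. A clipped boundary can land in the middle of a token, so check the result against the tokenization:

```python
def clip_boundary_overlaps(entities, max_overlap=1):
    """Clip the earlier span of any neighbouring pair that overlaps by at most `max_overlap` chars.

    Only neighbouring spans in document order are compared; larger overlaps are
    left untouched for another rule, so the output is not guaranteed overlap-free.
    """
    result = []
    for start, end, label in sorted(entities):  # document order: by start, then end
        if result:
            p_start, p_end, p_label = result[-1]
            overlap = p_end - start
            # Small tail/head overlap: end the previous span where this one begins.
            if 0 < overlap <= max_overlap and p_start < start:
                result[-1] = (p_start, start, p_label)
        result.append((start, end, label))
    return result


print(clip_boundary_overlaps([(402, 818, 'Skills'), (817, 1118, 'worked at')]))
# [(402, 817, 'Skills'), (817, 1118, 'worked at')]
```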

@Seemz246

I'm using spaCy 2.3.5 with Python 3.9.10.

@BillelBenoudjit

I found this code, which fixes the overlapping-entity issue:

```python
def clean_entities(training_data):
    """Drop every entity that overlaps a longer entity in the same example.

    Entities are (start, end, label) character offsets with an exclusive end.
    The input data is not modified.
    """
    clean_data = []
    for text, annotation in training_data:
        entities = annotation.get('entities', [])
        kept = []
        for i, (e_start, e_end, e_label) in enumerate(entities):
            e_len = e_end - e_start
            drop = False
            for j, (oe_start, oe_end, _) in enumerate(entities):
                # Skip self
                if i == j:
                    continue
                oe_len = oe_end - oe_start
                # Half-open spans overlap iff each starts before the other ends
                overlaps = e_start < oe_end and oe_start < e_end
                # Keep the longer entity; on equal length keep the one listed first
                loses = e_len < oe_len or (e_len == oe_len and j < i)
                if overlaps and loses:
                    drop = True
                    break
            if not drop:
                kept.append((e_start, e_end, e_label))
        clean_data.append((text, {'entities': kept}))

    return clean_data
```
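`clean_entities` drops an entity whenever it overlaps any longer entity, even if that longer entity is itself dropped, so it can discard more annotations than necessary. A greedy alternative, similar in spirit to `spacy.util.filter_spans` (which does this on `Span` objects: longest span first, earlier span on ties), works directly on character offsets before a `Doc` exists. `filter_offsets` is a hypothetical name for this sketch:

```python
def filter_offsets(entities):
    """Greedily keep the longest non-overlapping (start, end, label) spans.

    Longer spans win; on equal length the earlier span wins.
    """
    # Longest first; among equal lengths, smallest start first.
    ranked = sorted(entities, key=lambda e: (-(e[1] - e[0]), e[0]))
    kept = []
    for start, end, label in ranked:
        # Half-open ranges [start, end) are disjoint iff one ends before the other starts.
        if all(end <= k_start or k_end <= start for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)  # back to document order


example = [(0, 10, 'A'), (8, 15, 'B'), (14, 20, 'C')]
print(filter_offsets(example))  # [(0, 10, 'A'), (14, 20, 'C')]
```

On the same input, `clean_entities` keeps only `A`: it drops `C` because `C` overlaps the longer `B`, even though `B` is dropped too. The greedy version only rejects a span because of spans it actually kept.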
