JSON generated by NER Annotator doesn't seem to work with Spacy convertor #44

Open
tecoholic opened this issue Jun 10, 2022 · 6 comments
Labels: bug, good first issue, help wanted

Comments

@tecoholic
Owner

See comment at #43 (reply in thread)

@tecoholic added the bug label Jun 10, 2022
@tecoholic self-assigned this Jun 10, 2022
@tecoholic added the help wanted and good first issue labels Jun 10, 2022
@MikhailKlemin

MikhailKlemin commented Jun 17, 2022

Hello there!

A simple function to convert the JSON generated by ner-annotator directly to a DocBin would be this one:

import json

import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

# Blank English pipeline: only the tokenizer is needed to build Doc objects.
nlp = spacy.blank("en")


def load_data(file):
    """Return the "annotations" list from a ner-annotator JSON export."""
    with open(file, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data["annotations"]


train_data = load_data("./data/annotation_1.json")
valid_data = load_data("./data/annotation_3.json")


def create_training(annotations):
    """Convert (text, {"entities": [...]}) pairs into a DocBin."""
    db = DocBin()
    for text, annot in tqdm(annotations):
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in annot["entities"]:
            # "contract" keeps only the tokens that lie fully inside the
            # character range; char_span returns None if no token fits.
            span = doc.char_span(start, end, label=label,
                                 alignment_mode="contract")
            if span is None:
                print("Skipping entity")
            else:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    return db


# Separate names for the DocBins so the loaded annotations are not overwritten.
train_db = create_training(train_data)
train_db.to_disk("./data/train2.spacy")
valid_db = create_training(valid_data)
valid_db.to_disk("./data/valid2.spacy")
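
If the conversion fails or silently skips entities, it may help to check the export before building the DocBin. Below is a minimal sketch, assuming the same layout the script above unpacks: an "annotations" list of [text, {"entities": [[start, end, label], ...]}] pairs. The helper name check_annotations is hypothetical and is not part of ner-annotator or spaCy. It uses only the standard library and reports malformed items, out-of-range offsets, and overlapping entities (as far as I know, spaCy refuses overlapping spans when they are assigned to doc.ents).

```python
def check_annotations(annotations):
    """Return a list of human-readable problems found in the annotations.

    Assumes each item looks like [text, {"entities": [[start, end, label], ...]}].
    """
    problems = []
    for i, item in enumerate(annotations):
        # Anything that is not a (text, dict) pair would break the unpacking
        # done by the conversion loop.
        if (not isinstance(item, (list, tuple)) or len(item) != 2
                or not isinstance(item[0], str)
                or not isinstance(item[1], dict)):
            problems.append(f"item {i}: not a [text, annotation] pair")
            continue
        text, annot = item
        spans = []
        for ent in annot.get("entities", []):
            if not isinstance(ent, (list, tuple)) or len(ent) != 3:
                problems.append(f"item {i}: malformed entity {ent!r}")
                continue
            start, end, label = ent
            if not (isinstance(start, int) and isinstance(end, int)
                    and 0 <= start < end <= len(text)):
                problems.append(
                    f"item {i}: offsets ({start}, {end}) out of range")
                continue
            spans.append((start, end, label))
        # Sorted by offsets, overlapping spans always include a neighbouring
        # pair that overlaps, so comparing neighbours is enough to flag them.
        spans.sort(key=lambda s: (s[0], s[1]))
        for (s1, e1, l1), (s2, e2, l2) in zip(spans, spans[1:]):
            if s2 < e1:
                problems.append(
                    f"item {i}: {l1} ({s1}, {e1}) overlaps {l2} ({s2}, {e2})")
    return problems


# Made-up sample data for illustration only.
sample = [
    ["Apple is in Cupertino", {"entities": [[0, 5, "ORG"], [12, 21, "LOC"]]}],
    ["Apple Inc", {"entities": [[0, 9, "ORG"], [0, 5, "ORG"]]}],
    None,  # a malformed entry
]
for problem in check_annotations(sample):
    print(problem)
```

In practice you would pass it the list returned by load_data and fix or drop the reported items before calling create_training.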

PS: Good job on the app, I love it.

@tecoholic
Owner Author

@MikhailKlemin Thank you for coming up with the solution.

@ankitladva11

I am facing issues saving it to disk as a .spacy file. What should I do?
Thanks in advance.

@ankitladva11

I am facing issues saving it to disk as a .spacy file. What should I do? Thanks in advance.

Resolved!!

@alvi-khan
Collaborator

Hey @ankitladva11! Glad to know your problem was resolved. When you have the time, could you please leave a comment describing your issue and how you managed to resolve it? It would be useful to future users who might face the same issue. TIA!

@DaanDeSmedt

DaanDeSmedt commented Feb 28, 2024

@dreji18 also has a nicely documented approach to getting Spacy to work with the NER Annotator export.

Annotate your data for NER Training 📣

5 participants