Tokenizing named entities as a single token #3259

Named entities are Span objects, so you can iterate over doc.ents and merge each one into a single token. spaCy also ships with a handy component, merge_entities, that you can plug into your pipeline to take care of this automatically:

import spacy
from spacy.pipeline import merge_entities

nlp = spacy.load("en_core_web_sm")  # or any other model
nlp.add_pipe(merge_entities)  # spaCy v2 API; in spaCy v3, use nlp.add_pipe("merge_entities")
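
For the manual approach mentioned above (iterating over doc.ents and merging each span), a minimal sketch using the retokenizer could look like the following. It assumes spaCy v2.1 or later. The helper name merge_entity_spans, the example sentence, and the hand-set entity spans are illustrative assumptions. A blank English pipeline is used so the snippet runs without downloading a model, whereas with a trained model doc.ents would be filled in by the entity recognizer.

```python
import spacy
from spacy.tokens import Span


def merge_entity_spans(doc):
    """Merge each span in doc.ents into a single token (modifies doc in place).

    Hypothetical helper for illustration; not part of spaCy itself.
    """
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent)
    return doc


# Blank pipeline: tokenizer only, no trained entity recognizer.
nlp = spacy.blank("en")
doc = nlp("Barack Obama visited New York City")

# Assumption: entities are set by hand here purely to demonstrate the merge.
doc.ents = [Span(doc, 0, 2, label="PERSON"), Span(doc, 3, 6, label="GPE")]

merge_entity_spans(doc)
tokens = [token.text for token in doc]
print(tokens)
```

The merges are applied together when the retokenize block exits, so the entity spans do not need to be re-indexed between merges.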

Answer selected by ines
Labels: feat / ner (Feature: Named Entity Recognizer), feat / tokenizer (Feature: Tokenizer)

This discussion was converted from issue #3259 on December 10, 2020 13:42.