
What is the difference between the norm and lower attributes of a token? #13283

Hi!

In many cases, token.norm and token.lower will be the same. However, some languages define tokenizer exceptions that assign the norm attribute explicitly, so it can hold more information than just the lowercased form of the token.

Example:

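    import spacy

    text = "Albert Einstein wasn't a German-born theoretical physicist."
    nlp = spacy.blank("en")  # blank English pipeline: tokenizer only
    doc = nlp(text)
    for token in doc:
        # lower_ is the lowercased text, norm_ is the normalized form
        print(token.lower_, token.norm_)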

Output:

    albert albert
    einstein einstein
    was was
    n't not
    a a
    german german
    - -
    born born
    theoretical theoretical
    physicist physicist
    . .

Here you can see that the token n't is normalized to not, while its lowercased form stays n't.
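To make the relationship a bit more concrete, here is a rough pure-Python sketch of the idea. It is not spaCy's actual implementation: `NORM_EXCEPTIONS`, `toy_lower` and `toy_norm` are hypothetical names made up for illustration, and the table only contains the one exception seen in the output above. The assumption is simply that the norm falls back to the lowercased text unless an exception supplies something else.

```python
# Hypothetical exception table (illustration only, not spaCy's real data).
# It maps a lowercased token text to its normalized form.
NORM_EXCEPTIONS = {"n't": "not"}


def toy_lower(text: str) -> str:
    """Lowercase the token text, analogous to what token.lower_ shows."""
    return text.lower()


def toy_norm(text: str) -> str:
    """Return the exception's norm if one exists, else the lowercased text."""
    lowered = text.lower()
    return NORM_EXCEPTIONS.get(lowered, lowered)


# A few tokens from the example sentence, already split as in the output above.
tokens = ["Albert", "was", "n't", "German"]
pairs = [(toy_lower(t), toy_norm(t)) for t in tokens]
for lower, norm in pairs:
    print(lower, norm)
```

Only the n't row ends up with two different values; every other token has no entry in the table, so its norm is just its lowercased text.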

Answer selected by svlandeg
Labels: feat / doc (Feature: Doc, Span and Token objects)