Running NER on tokenized data only: KeyError: "Parameter 'E' for model 'hashembed' has not been allocated yet." #10507
-
Hello, I am trying to run only Named Entity Recognition, and my dataset already contains tokenized text, which I need to use because I need this exact tokenization for the indices. I do not need to train the pipeline on my dataset, I only need to apply it for the results. I was trying to apply only the NER pipeline on a custom
But then I received the following error:
After researching this "Parameter 'E' for model 'hashembed' has not been allocated yet.", I realized that I have to run
But I already receive this error when trying to apply the
What am I missing here? I can find no other information about this error. Thank you in advance!
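For reference, spaCy lets you build a `Doc` directly from pre-tokenized words and then run a loaded pipeline's components over it, which preserves the original token indices. A minimal sketch, using a blank English pipeline as a stand-in for a trained model (the example words are made up; with a trained model such as the `en_ner_bc5cdr_md` discussed below, this loop is what would populate `doc.ents`):

```python
import spacy
from spacy.tokens import Doc

# A blank pipeline stands in here; with a trained model you would use
# nlp = spacy.load("en_ner_bc5cdr_md") instead (model name from this thread).
nlp = spacy.blank("en")

# Build the Doc from the pre-tokenized words so token indices are preserved
words = ["Patients", "were", "given", "cisplatin", "."]
doc = Doc(nlp.vocab, words=words)

# Apply each pipeline component to the pre-built Doc; with a trained
# model this is what fills in doc.ents
for _name, proc in nlp.pipeline:
    doc = proc(doc)

print([t.text for t in doc])
```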
Replies: 2 comments 4 replies
-
Hello. This error pops up when I am trying to train a `ner` component while keeping the `tok2vec` component frozen. Can I use spacy-alignment? If yes, how do I go about it? Here is my base_config.cfg:
Here is the error:
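For context, freezing a `tok2vec` component while training only `ner` is typically expressed in the training config roughly like this. This is a hedged, generic sketch, not the poster's actual base_config.cfg; the sourced model name is a placeholder:

```ini
# Hypothetical fragment, not the poster's config: freezing a sourced
# tok2vec while training only ner
[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]

[components.tok2vec]
source = "en_core_web_md"

[components.ner]
factory = "ner"

[training]
# keep the sourced tok2vec's weights fixed during training
frozen_components = ["tok2vec"]
# if ner listens to tok2vec, the frozen tok2vec must still set
# annotations during training
annotating_components = ["tok2vec"]
```

The key point is that a frozen component is skipped during updates, so if `ner` depends on its output (via a listener), the frozen `tok2vec` also needs to be listed under `annotating_components`.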
This is kind of the same issue as #10508, just with slightly different (admittedly confusing) error messages.

I'm not sure exactly what you're trying to do, but you probably want to keep the `ner` component from `en_ner_bc5cdr_md` rather than excluding it and trying to replace it with a new uninitialized/untrained one.

The `Doc` construction is fine, but it's unlikely that `en_ner_bc5cdr_md` has been trained on BPE/wordpiece-y tokens like `##na`, so you might not see good results. Is there a particular reason that you're using this tokenization?
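The "has not been allocated yet" message in the thread title is what an uninitialized, untrained component produces when you try to apply it. A small sketch reproducing that failure mode with a fresh `ner` on a blank pipeline (the exact exception text varies by spaCy version, and the label is hypothetical):

```python
import spacy

# A freshly added ner has no allocated weights until it is initialized
# or trained, so applying it fails
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")

try:
    nlp("Patients were given cisplatin.")
except Exception as err:
    print("uninitialized ner raised:", type(err).__name__)

# Initializing allocates the model's parameters (entities are only
# useful after training); keeping the already-trained component from
# en_ner_bc5cdr_md avoids this problem entirely
ner.add_label("CHEMICAL")  # hypothetical label, added before initialize
nlp.initialize()
doc = nlp("Patients were given cisplatin.")
print("processed", len(doc), "tokens")
```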