
Spacy-LLM fails with storage not allocated on MPS device #13096

Closed
rkatriel opened this issue Oct 31, 2023 · 18 comments
Labels
bug (Bugs and behaviour differing from documentation), feat/llm (Feature: LLMs (incl. spacy-llm))

Comments

@rkatriel

Hi,

The code example listed below fails with the following error:

RuntimeError: Placeholder storage has not been allocated on MPS device!

I'm running it on a MacBook Air with Apple Silicon (M2, 2022) under macOS Monterey (Version 12.6). Additional details below.

The full traceback is listed below the code.

Note: This is a continuation of Issue #12987 (Unknown function registry: 'llm_backends').

How to reproduce the behaviour

Here is the code, based on the example provided in Matthew Honnibal's blog "Against LLM maximalism" (https://explosion.ai/blog/against-llm-maximalism):

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {
            "@llm_models": "spacy.OpenLLaMA.v1",
            "name": "open_llama_3b"
        },
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)

Here is the traceback:

File "/Users/ron.katriel/PycharmProjects/Transformer/test-spacy-llm.py", line 19, in
doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1054, in call
error_handler(name, proc, [doc], e)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/util.py", line 1704, in raise_error
raise e
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1049, in call
doc = proc(doc, **component_cfg.get(name, {})) # type: ignore[call-arg]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 156, in call
docs = self._process_docs([doc])
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 210, in _process_docs
responses_iters = tee(self._model(prompts_iters[0]), n_iters)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 55, in call
return [
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 57, in
self._model.generate(input_ids=tii, **self._config_run)[
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1606, in generate
return self.greedy_search(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2454, in greedy_search
outputs = self(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
outputs = self.model(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 875, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!

Your Environment

  • Platform: macOS-12.6-arm64-arm-64bit
  • Python Version: 3.11.4
  • spaCy Version: 3.6.1
@rmitsch
Contributor

rmitsch commented Nov 2, 2023

Hi @rkatriel, which version of transformers are you using?

@rmitsch rmitsch added bug (Bugs and behaviour differing from documentation), models (Issues related to the statistical models), feat/llm (Feature: LLMs (incl. spacy-llm)) and removed models (Issues related to the statistical models) labels Nov 2, 2023
@rkatriel
Author

rkatriel commented Nov 2, 2023

Hi @rmitsch, it is 4.34.1. I upgraded to the latest (4.35.0) but it made no difference.

This issue is mentioned in the following discussion:

https://discuss.pytorch.org/t/torch-embedding-fails-with-runtimeerror-placeholder-storage-has-not-been-allocated-on-mps-device/152124

One suggestion is to check out the following:

https://pytorch.org/docs/master/notes/mps.html

Perhaps incorporating this into Spacy-LLM will resolve the error.

@rmitsch
Contributor

rmitsch commented Nov 3, 2023

If you run this snippet from the linked PyTorch docs site, what's your output?

if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")

@rkatriel
Author

rkatriel commented Nov 3, 2023

Nothing is printed, because the first condition is not met. Here are the values of the function calls:

>>> torch.backends.mps.is_available()
True
>>> torch.backends.mps.is_built()
True

Both are False when the 'not' is added in the conditions.

@rmitsch
Contributor

rmitsch commented Nov 6, 2023

Two approaches here:

  • Try downgrading transformers and see if that makes a difference.
  • Try instantiating a Hugging Face model without spacy-llm and moving it to your MPS device (a rough sketch follows below). If that works, we know it's an issue with spacy-llm (and we just can't reproduce it).
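
For reference, a minimal version of that standalone check might look roughly like the sketch below. It loads an OpenLLaMA checkpoint with transformers directly and moves the model to MPS; it also moves the tokenized inputs to the same device, since PyTorch requires model and inputs to be on the same device. The checkpoint name and prompt here are only placeholders for illustration.

# Rough standalone check (no spacy-llm involved): load an OpenLLaMA checkpoint
# with transformers, move it to the MPS device, and run one generation.
# The checkpoint name and prompt are placeholders.
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

mps_device = torch.device("mps")

tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_3b_v2")
model = LlamaForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b_v2", torch_dtype=torch.float16
)
model.to(mps_device)  # move the model weights to MPS

# The tokenized inputs must live on the same device as the model.
inputs = tokenizer("Q: What is the largest animal?\nA:", return_tensors="pt").to(mps_device)
output = model.generate(input_ids=inputs.input_ids, max_new_tokens=16)
print(tokenizer.decode(output[0]))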

@rkatriel
Author

rkatriel commented Nov 6, 2023

Thanks for the suggestions. Downgrading transformers doesn't help. I got the same error with transformers-4.28.0, while transformers-4.27.0 causes the following error:

ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.

For your second suggestion, see the script below. The Hugging Face model is documented here:

https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT

The code works well when the model is not moved to my MPS device, that is, when the following line is commented out:

#model.to(mps_device)

But when it is uncommented, I get the same error as with Spacy-LLM:

RuntimeError: Placeholder storage has not been allocated on MPS device!

The error occurs inside torch (I can share the traceback if helpful).

Ron

# Check that MPS is available
import torch

if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")

else:
    mps_device = torch.device("mps")

    # Create a Tensor directly on the mps device
    x = torch.ones(5, device=mps_device) # alternative is torch.ones(5, device="mps")

    # Any operation happens on the GPU
    y = x * 2

    from transformers import AutoTokenizer, pipeline,  AutoModelForTokenClassification
    tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
    model = AutoModelForTokenClassification.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

    # Move your model to mps just like any other device
    model.to(mps_device)
    
    nerpipeline = pipeline('ner', model=model, tokenizer=tokenizer)
    text = "Eastern Cooperative Oncology Group performance status ≤ 2"
    result = nerpipeline(text)
    print(result)


@rmitsch
Contributor

rmitsch commented Nov 7, 2023

Try passing device to your pipeline() call?

I should have been more specific w.r.t. point 2. I meant to instantiate the model you're using in spacy-llm, OpenLLaMa, directly with transformers. You can copy-paste the snippet here and modify it to use your MPS device. Let me know if that works.

@rkatriel
Author

rkatriel commented Nov 7, 2023

In reverse order, I couldn't figure out how to modify the code snippet you referenced to use my MPS device. Can you provide more explicit instructions? Specifically, which function accepts the device?

However, I was able to pass the MPS device in my original code as follows and it seemed to work (output produced and no errors):

import time

import torch
from transformers import AutoTokenizer, pipeline, AutoModelForTokenClassification

mps_device = torch.device("mps")
use_mps_device = True

model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
start_time = time.time()
if use_mps_device:
    recognizer = pipeline("ner", model=model, tokenizer=tokenizer, device=mps_device)
else:
    recognizer = pipeline("ner", model=model, tokenizer=tokenizer)
print('runtime = %f seconds' % (time.time() - start_time))
result = recognizer('My name is Sarah and I live in London')
print(result)

However, the strange thing is that the code is significantly faster when not using the MPS device (0.000060 seconds) than when using it (0.256201 seconds). This doesn't make any sense to me...

@rmitsch
Contributor

rmitsch commented Nov 8, 2023

However, the strange thing is that the code is significantly faster when not using the MPS device (0.000060 seconds) than when using it (0.256201 seconds). This doesn't make any sense to me...

Offloading data/model to a device comes with an overhead. It should be faster in the long run (i.e. for many calls).
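
To see that amortization, the timing needs to cover many inference calls rather than the pipeline() construction alone. A rough sketch along those lines, reusing the model and tokenizer names from your snippet above (the number of repetitions is arbitrary):

# Rough timing sketch: time many NER calls on CPU vs. on the MPS device,
# instead of timing pipeline construction. Model/tokenizer names are taken
# from the earlier snippet; the repetition count is arbitrary.
import time
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
texts = ["My name is Sarah and I live in London"] * 100

for device in (None, torch.device("mps")):
    recognizer = pipeline("ner", model=model, tokenizer=tokenizer, device=device)
    start_time = time.time()
    for text in texts:
        recognizer(text)
    print(device, "-> %.3f seconds for %d calls" % (time.time() - start_time, len(texts)))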

In reverse order, I couldn't figure out how to modify the code snippet you referenced to use my MPS device. Can you provide more explicit instructions?

(1) Copy-paste the linked snippet. (2) Declare mps_device = torch.device("mps") and then call model.to(mps_device) after the model has been instantiated.

@rkatriel
Author

rkatriel commented Nov 8, 2023

That doesn't work. I get the same error

RuntimeError: Placeholder storage has not been allocated on MPS device!

Here is the code I'm running

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto'
)

# Move your model to mps just like any other device
mps_device = torch.device("mps")
model.to(mps_device)

prompt = 'Q: What is the largest animal?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32
)
print(tokenizer.decode(generation_output[0]))

@rmitsch
Contributor

rmitsch commented Nov 9, 2023

Ok, this helps narrow down the issue - it doesn't seem to be in spacy-llm per se, but in transformers or torch. I recommend that you either

  • look into how to solve this issue with this snippet on your machine first (then using spacy-llm with Llama 2 should work fine), or
  • run this on a machine with a GPU as a workaround.

We'll also look into this, but seeing that this is an upstream issue, there's probably not much we can do to fix this from our side.

@rkatriel
Author

rkatriel commented Nov 9, 2023

Thanks, Raphael. I solved the mystery by following advice found on the pytorch discussion group at

https://discuss.pytorch.org/t/torch-embedding-fails-with-runtimeerror-placeholder-storage-has-not-been-allocated-on-mps-device/152124/2

Specifically, in addition to moving the model to the MPS device, one must also move the input to the MPS device:

inputs = tokenizer(prompt, return_tensors="pt").to("mps")

Here is the updated code snippet

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto'
)

mps_device = torch.device("mps")

# Move your model to mps just like any other device
model.to(mps_device)

prompt = 'Q: What is the largest animal?\nA:'

# You also need to move your input to mps device!
inputs = tokenizer(prompt, return_tensors="pt").to("mps")
input_ids = inputs.input_ids

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32
)
print(tokenizer.decode(generation_output[0]))

The code now runs cleanly and produces the following output

tensor([[    1,  1029, 29537,  1200,   325,   268,  5242,  6848, 29584,    13,
         29530, 29537]], device='mps:0')
<s>Q: What is the largest animal?
A: The largest animal is the blue whale.
Q: What is the smallest animal?
A: The smallest animal is the dwarf chameleon

Note that prior to this I reinstalled torch using the following command to take advantage of the Apple Silicon (M2 chip/GPU):

pip3 install -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

It may be part of the solution but I'm not sure (it wasn't enough by itself to make the error go away).

Question: How does one apply this to using spacy-llm with Llama 2 on a Mac with Apple silicon? Torch is internal to spacy-llm, that is, it's called under the covers by the nlp pipeline.

@rmitsch
Contributor

rmitsch commented Nov 13, 2023

Thanks for the update! spacy-llm v0.6.3 is out and should fix some transformers-related device binding issues. I recommend updating the library and giving it another try. I'd appreciate it if you reported back on whether that helps 🙂
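
For Llama 2 specifically, the pipeline config from the original report would only need the model entry swapped; a rough sketch is below. The registry name spacy.Llama2.v1 and the model name Llama-2-7b-hf follow the spacy-llm docs for its built-in Llama 2 wrapper, so please double-check them against the installed version (the Llama 2 weights are also gated on Hugging Face and require accepting Meta's license).

# Rough sketch of the original pipeline, pointed at spacy-llm's built-in
# Llama 2 wrapper instead of OpenLLaMA. Registry/model names should be
# verified against the installed spacy-llm version.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY",
        },
        "model": {
            "@llm_models": "spacy.Llama2.v1",
            "name": "Llama-2-7b-hf",
        },
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)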

@rkatriel
Author

rkatriel commented Nov 13, 2023

Thanks, that did the trick! The only thing left is the following warning, which I have been getting all along:

UserWarning: Couldn't find a CUDA GPU, so the setting 'device_map:auto' will be used, which may result in the LLM being loaded (partly) on the CPU or even the hard disk, which may be slow. Install cuda to be able to load and run the LLM on the GPU instead.

However, no entities were found by the code example listed above, which is unexpected.

@rmitsch
Copy link
Contributor

rmitsch commented Nov 14, 2023

However, no entities were found by the code example listed above, which is unexpected.

Feel free to open another issue for this, we'll look into it.

@rkatriel
Author

Done. See issue #13132 ("Spacy-LLM code sample produces no output").

@adrianeboyd adrianeboyd added the resolved (The issue was addressed / answered) label Nov 17, 2023
github-actions bot commented

This issue has been automatically closed because it was answered and there was no follow-up discussion.

@github-actions github-actions bot removed the resolved (The issue was addressed / answered) label Nov 25, 2023
github-actions bot commented

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 26, 2023