
Spacy-LLM fails with storage not allocated on MPS device #13096

Closed
rkatriel opened this issue Oct 31, 2023 · 18 comments
Labels
bug (Bugs and behaviour differing from documentation), feat/llm (Feature: LLMs (incl. spacy-llm))

Comments

@rkatriel

Hi,

The code example listed below fails with the following error:

RuntimeError: Placeholder storage has not been allocated on MPS device!

I'm running it on a MacBook Air with Apple Silicon (M2, 2022) under macOS Monterey (Version 12.6). Additional details below.

The full traceback is listed below the code.

Note: This is a continuation of Issue #12987 (Unknown function registry: 'llm_backends').

How to reproduce the behaviour

Here is the code, based on the example provided in Matthew Honnibal's blog "Against LLM maximalism" (https://explosion.ai/blog/against-llm-maximalism):

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {
            "@llm_models": "spacy.OpenLLaMA.v1",
            "name": "open_llama_3b"
        },
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)

Here is the traceback:

File "/Users/ron.katriel/PycharmProjects/Transformer/test-spacy-llm.py", line 19, in
doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1054, in call
error_handler(name, proc, [doc], e)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/util.py", line 1704, in raise_error
raise e
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1049, in call
doc = proc(doc, **component_cfg.get(name, {})) # type: ignore[call-arg]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 156, in call
docs = self._process_docs([doc])
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 210, in _process_docs
responses_iters = tee(self._model(prompts_iters[0]), n_iters)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 55, in call
return [
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 57, in
self._model.generate(input_ids=tii, **self._config_run)[
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1606, in generate
return self.greedy_search(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2454, in greedy_search
outputs = self(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
outputs = self.model(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 875, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!

Your Environment

  • Platform: macOS-12.6-arm64-arm-64bit
  • Python Version: 3.11.4
  • spaCy Version: 3.6.1
@rmitsch
Contributor

rmitsch commented Nov 2, 2023

Hi @rkatriel, which version of transformers are you using?

@rmitsch rmitsch added bug (Bugs and behaviour differing from documentation), models (Issues related to the statistical models), feat/llm (Feature: LLMs (incl. spacy-llm)) and removed models (Issues related to the statistical models) labels Nov 2, 2023
@rkatriel
Author

rkatriel commented Nov 2, 2023

Hi @rmitsch, it is 4.34.1. I upgraded to the latest (4.35.0) but it made no difference.

This issue is mentioned in the following discussion:

https://discuss.pytorch.org/t/torch-embedding-fails-with-runtimeerror-placeholder-storage-has-not-been-allocated-on-mps-device/152124

One suggestion is to check out the following:

https://pytorch.org/docs/master/notes/mps.html

Perhaps incorporating this into Spacy-LLM will resolve the error.

@rmitsch
Contributor

rmitsch commented Nov 3, 2023

If you run this snippet from the linked PyTorch docs site, what's your output?

if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")

@rkatriel
Author

rkatriel commented Nov 3, 2023

Nothing is printed, because the first condition is not met. Here are the values of the function calls:

>>> torch.backends.mps.is_available()
True
>>> torch.backends.mps.is_built()
True

Both are False when the 'not' is added in the conditions.

@rmitsch
Contributor

rmitsch commented Nov 6, 2023

Two approaches here:

  • Try downgrading transformers and see if that makes a difference.
  • Try instantiating a Hugging Face model without spacy-llm and moving it to your MPS device (a rough sketch follows below). If that works, we know it's an issue with spacy-llm (and we just can't reproduce it).
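
For reference, a minimal version of that standalone check might look roughly like the sketch below. It loads an OpenLLaMA checkpoint with transformers directly and moves the model to MPS; it also moves the tokenized inputs to the same device, since PyTorch requires model and inputs to be on the same device. The checkpoint name and prompt here are only placeholders for illustration.

# Rough standalone check (no spacy-llm involved): load an OpenLLaMA checkpoint
# with transformers, move it to the MPS device, and run one generation.
# The checkpoint name and prompt are placeholders.
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

mps_device = torch.device("mps")

tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_3b_v2")
model = LlamaForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b_v2", torch_dtype=torch.float16
)
model.to(mps_device)  # move the model weights to MPS

# The tokenized inputs must live on the same device as the model.
inputs = tokenizer("Q: What is the largest animal?\nA:", return_tensors="pt").to(mps_device)
output = model.generate(input_ids=inputs.input_ids, max_new_tokens=16)
print(tokenizer.decode(output[0]))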

@rkatriel
Author

rkatriel commented Nov 6, 2023

Thanks for the suggestions. Downgrading transformers doesn't help. I got the same error with transformers-4.28.0, while transformers-4.27.0 causes the following error:

ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.

For your second suggestion, see the script below. The Hugging Face model is documented here:

https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT

The code works well when the model is not moved to my MPS device, that is, when the following line is commented out:

#model.to(mps_device)

But when it is uncommented, I get the same error as with Spacy-LLM:

RuntimeError: Placeholder storage has not been allocated on MPS device!

The error occurs inside torch (I can share the traceback if helpful).

Ron

# Check that MPS is available
import torch

if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")

else:
    mps_device = torch.device("mps")

    # Create a Tensor directly on the mps device
    x = torch.ones(5, device=mps_device) # alternative is torch.ones(5, device="mps")

    # Any operation happens on the GPU
    y = x * 2

    from transformers import AutoTokenizer, pipeline,  AutoModelForTokenClassification
    tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
    model = AutoModelForTokenClassification.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

    # Move your model to mps just like any other device
    model.to(mps_device)
    
    nerpipeline = pipeline('ner', model=model, tokenizer=tokenizer)
    text = "Eastern Cooperative Oncology Group performance status ≤ 2"
    result = nerpipeline(text)
    print(result)


@rmitsch
Contributor

rmitsch commented Nov 7, 2023

Try passing device to your pipeline() call?

I should have been more specific w.r.t. point 2. I meant to instantiate the model you're using in spacy-llm, OpenLLaMa, directly with transformers. You can copy-paste the snippet here and modify it to use your MPS device. Let me know if that works.

@rkatriel
Author

rkatriel commented Nov 7, 2023

In reverse order, I couldn't figure out how to modify the code snippet you referenced to use my MPS device. Can you provide more explicit instructions? Specifically, which function accepts the device?

However, I was able to pass the MPS device in my original code as follows and it seemed to work (output produced and no errors):

import time

import torch
from transformers import AutoTokenizer, pipeline, AutoModelForTokenClassification

mps_device = torch.device("mps")
use_mps_device = True

model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
start_time = time.time()
if use_mps_device:
    recognizer = pipeline("ner", model=model, tokenizer=tokenizer, device=mps_device)
else:
    recognizer = pipeline("ner", model=model, tokenizer=tokenizer)
print('runtime = %f seconds' % (time.time() - start_time))
result = recognizer('My name is Sarah and I live in London')
print(result)

However, the strange thing is that the code is significantly faster when not using the MPS device (0.000060 seconds) than when using it (0.256201 seconds). This doesn't make any sense to me...

@rmitsch
Contributor

rmitsch commented Nov 8, 2023

However, the strange thing is that the code is significantly faster when not using the MPS device (0.000060 seconds) than when using it (0.256201 seconds). This doesn't make any sense to me...

Offloading data/model to a device comes with an overhead. It should be faster in the long run (i.e. for many calls).
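
To see that amortization, the timing needs to cover many inference calls rather than the pipeline() construction alone. A rough sketch along those lines, reusing the model and tokenizer names from your snippet above (the number of repetitions is arbitrary):

# Rough timing sketch: time many NER calls on CPU vs. on the MPS device,
# instead of timing pipeline construction. Model/tokenizer names are taken
# from the earlier snippet; the repetition count is arbitrary.
import time
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
texts = ["My name is Sarah and I live in London"] * 100

for device in (None, torch.device("mps")):
    recognizer = pipeline("ner", model=model, tokenizer=tokenizer, device=device)
    start_time = time.time()
    for text in texts:
        recognizer(text)
    print(device, "-> %.3f seconds for %d calls" % (time.time() - start_time, len(texts)))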

In reverse order, I couldn't figure out how to modify the code snippet you referenced to use my MPS device. Can you provide more explicit instructions?

(1) Copy-paste the linked snippet. (2) Declare mps_device = torch.device("mps") and then call model.to(mps_device) after the model has been instantiated.

@rkatriel
Author

rkatriel commented Nov 8, 2023

That doesn't work. I get the same error

RuntimeError: Placeholder storage has not been allocated on MPS device!

Here is the code I'm running

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto'
)

# Move your model to mps just like any other device
mps_device = torch.device("mps")
model.to(mps_device)

prompt = 'Q: What is the largest animal?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32
)
print(tokenizer.decode(generation_output[0]))

@rmitsch
Contributor

rmitsch commented Nov 9, 2023

Ok, this helps narrow down the issue - it doesn't seem to be in spacy-llm per se, but in transformers or torch. I recommend that you either

  • look into how to solve this issue with this snippet on your machine first (then using spacy-llm with Llama 2 should work fine), or
  • run this on a machine with a GPU as a workaround.

We'll also look into this, but seeing that this is an upstream issue, there's probably not much we can do to fix this from our side.

@rkatriel
Author

rkatriel commented Nov 9, 2023

Thanks, Raphael. I solved the mystery by following advice found on the pytorch discussion group at

https://discuss.pytorch.org/t/torch-embedding-fails-with-runtimeerror-placeholder-storage-has-not-been-allocated-on-mps-device/152124/2

Specifically, in addition to moving the model to the MPS device, one must also move the input to the MPS device:

inputs = tokenizer(prompt, return_tensors="pt").to("mps")

Here is the updated code snippet

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto'
)

mps_device = torch.device("mps")

# Move your model to mps just like any other device
model.to(mps_device)

prompt = 'Q: What is the largest animal?\nA:'

# You also need to move your input to mps device!
inputs = tokenizer(prompt, return_tensors="pt").to("mps")
input_ids = inputs.input_ids

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32
)
print(tokenizer.decode(generation_output[0]))

The code now runs cleanly and produces the following output

tensor([[    1,  1029, 29537,  1200,   325,   268,  5242,  6848, 29584,    13,
         29530, 29537]], device='mps:0')
<s>Q: What is the largest animal?
A: The largest animal is the blue whale.
Q: What is the smallest animal?
A: The smallest animal is the dwarf chameleon

Note that prior to this I reinstalled torch using the following command to take advantage of the Apple Silicon (M2 chip/GPU):

pip3 install -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

It may be part of the solution but I'm not sure (it wasn't enough by itself to make the error go away).

Question: How does one apply this to using spacy-llm with Llama 2 on a Mac with Apple silicon? Torch is internal to spacy-llm, that is, it's called under the covers by the nlp pipeline.

@rmitsch
Contributor

rmitsch commented Nov 13, 2023

Thanks for the update! spacy-llm v0.6.3 is out and should fix some transformers-related device binding issues. I recommend updating the library and giving it another try. I'd appreciate it if you reported back on whether that helps 🙂
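
For Llama 2 specifically, the pipeline config from the original report would only need the model entry swapped; a rough sketch is below. The registry name spacy.Llama2.v1 and the model name Llama-2-7b-hf follow the spacy-llm docs for its built-in Llama 2 wrapper, so please double-check them against the installed version (the Llama 2 weights are also gated on Hugging Face and require accepting Meta's license).

# Rough sketch of the original pipeline, pointed at spacy-llm's built-in
# Llama 2 wrapper instead of OpenLLaMA. Registry/model names should be
# verified against the installed spacy-llm version.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY",
        },
        "model": {
            "@llm_models": "spacy.Llama2.v1",
            "name": "Llama-2-7b-hf",
        },
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)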

@rkatriel
Author

rkatriel commented Nov 13, 2023

Thanks, that did the trick! The only thing left is the following warning, which I have been getting all along:

UserWarning: Couldn't find a CUDA GPU, so the setting 'device_map:auto' will be used, which may result in the LLM being loaded (partly) on the CPU or even the hard disk, which may be slow. Install cuda to be able to load and run the LLM on the GPU instead.

However, no entities were found by the code example listed above, which is unexpected.

@rmitsch
Copy link
Contributor

rmitsch commented Nov 14, 2023

However, no entities were found by the code example listed above, which is unexpected.

Feel free to open another issue for this, we'll look into it.

@rkatriel
Author

Done. See issue #13132 ("Spacy-LLM code sample produces no output").

@adrianeboyd adrianeboyd added the resolved (The issue was addressed / answered) label Nov 17, 2023
github-actions bot commented

This issue has been automatically closed because it was answered and there was no follow-up discussion.

@github-actions github-actions bot removed the resolved (The issue was addressed / answered) label Nov 25, 2023
github-actions bot commented

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 26, 2023