
ImportError: Using bitsandbytes 8-bit quantization requires Accelerate: pip install accelerate and the latest version of bitsandbytes: pip install -i https://pypi.org/simple/ bitsandbytes #13569

Open
AnandUgale opened this issue May 18, 2024 · 3 comments
Labels: bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

Comments

@AnandUgale

Bug Description

ImportError: Using bitsandbytes 8-bit quantization requires Accelerate: pip install accelerate and the latest version of bitsandbytes: pip install -i https://pypi.org/simple/ bitsandbytes

Environment

Packages installed with CUDA 11.8:

torch - 2.3.0+cu118
llama-index - 0.10.37
llama-index-llms-huggingface - 0.2.0
transformers - 4.39.0
accelerate - 0.27.0
bitsandbytes - 0.43.1

Version

0.10.37

Steps to Reproduce

```python
import torch
from llama_index.llms.huggingface import HuggingFaceLLM

# Optional quantization to 4bit
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# hf_token and stopping_ids are defined earlier in the original script
llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,  # comment this line and uncomment below to use 4bit
        # "quantization_config": quantization_config
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
    },
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
```

Relevant Logs/Tracebacks

```
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
```
@AnandUgale added the bug and triage labels on May 18, 2024

dosubot bot commented May 18, 2024

Hey @AnandUgale, great to run into you again on here! 🚀 It looks like you've stumbled upon an intriguing challenge. I'm diving into the details now and will circle back with a more comprehensive response soon. Stay tuned!

@logan-markewich
Collaborator

This isn't really a llama-index issue, it's a huggingface issue 😅 If you are in a notebook, you might have to restart your notebook after installing.
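A minimal sketch of that reinstall, assuming a Jupyter notebook (restart the kernel after the cell finishes so the new packages are picked up):

```python
# Run in a notebook cell; the IPython %pip magic installs into the active kernel's environment.
%pip install -U accelerate bitsandbytes
# Then restart the kernel (Kernel -> Restart) before re-running the HuggingFaceLLM code.
```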


dosubot bot commented May 18, 2024

To address the ImportError you're encountering with bitsandbytes and 8-bit quantization, here are targeted steps to resolve the issue:

  1. Reinstall bitsandbytes for CUDA 11.8: Ensure bitsandbytes is correctly installed for your CUDA version by running:

    pip install bitsandbytes-cuda118

    This command installs the version compiled specifically for CUDA 11.8, which might resolve compatibility issues.

  2. Check Environment Variables: Verify that CUDA_HOME and LD_LIBRARY_PATH are correctly set to reflect your CUDA 11.8 installation. Misconfigured environment variables can lead to import errors.

  3. Isolate the ImportError: Determine exactly where the ImportError occurs. Is it during the import of a specific module from bitsandbytes or another package? Pinpointing the location can help identify the cause. A small diagnostic sketch covering this and the previous point follows the list.

  4. Consult Documentation and Issues: Look into the GitHub repositories and documentation for bitsandbytes, llama_index, and HuggingFace's transformers for similar issues. Sometimes, specific version conflicts or bugs are documented.

  5. Environment Clean-Up: If necessary, create a new virtual environment and reinstall your dependencies. This can help eliminate conflicts or issues from previous installations.
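As referenced in points 2 and 3, here is a rough diagnostic sketch. It only prints the relevant environment variables and tries each import in turn; adjust the package list to match your setup:

```python
import importlib
import os

# Point 2: confirm the CUDA-related environment variables are set as expected.
for var in ("CUDA_HOME", "LD_LIBRARY_PATH"):
    print(f"{var} = {os.environ.get(var, '<not set>')}")

# Point 3: import the relevant packages one at a time to see which one fails.
for pkg in ("accelerate", "bitsandbytes", "transformers"):
    try:
        module = importlib.import_module(pkg)
        print(f"{pkg} {getattr(module, '__version__', '<unknown version>')} imported OK")
    except ImportError as exc:
        print(f"{pkg} failed to import: {exc}")
```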

For integrating bitsandbytes with HuggingFace's transformers, ensure your BitsAndBytesConfig and model initialization align with the provided example. This includes setting the quantization_config in model_kwargs when creating the HuggingFaceLLM instance. If these steps don't resolve the issue, please provide more details about the error and the code causing it for further assistance.
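For completeness, a sketch of the 4-bit variant of the snippet from the issue, with the BitsAndBytesConfig passed through model_kwargs (hf_token and stopping_ids are assumed to be defined elsewhere, as in the original report):

```python
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    # Pass the BitsAndBytesConfig through model_kwargs instead of torch_dtype.
    model_kwargs={"token": hf_token, "quantization_config": quantization_config},
    generate_kwargs={"do_sample": True, "temperature": 0.6, "top_p": 0.9},
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,  # defined elsewhere, as in the original report
)
```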

