
None of the examples on the README page work #1117

Open

olegmikul opened this issue Jan 6, 2024 · 5 comments

@olegmikul
Same errors on 3 different Linux distros.

I have installed from source:
pushd intel-extension-for-transformers/
pip install -r requirements.txt
python setup.py install

Then I started trying the examples from the README (obviously, my first steps after install):

  1. Chatbot - a lot of missing dependencies; I figured out the package names from the errors and installed them one by one:
    pip install uvicorn
    pip install yacs
    pip install fastapi
    pip install shortuuid
    pip install python-multipart
    pip install python-dotenv

And finally I got the following error (see the sketch after this list):
from intel_extension_for_transformers.neural_chat import build_chatbot
PydanticImportError: BaseSettings has been moved to the pydantic-settings package. See https://docs.pydantic.dev/2.5/migration/#basesettings-has-moved-to-pydantic-settings for more details.

  2. INT4 Inference (CPU only)

from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neural-chat-7b-v3-1" # Hugging Face model_id or local model
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)

ModuleNotFoundError: No module named 'intel_extension_for_transformers.llm.runtime.graph.mistral_cpp'

  3. INT8 Inference (CPU only) - same error
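
For reference, a minimal sketch of what the PydanticImportError above points at: with Pydantic 2.x installed, BaseSettings lives in the separate pydantic-settings package rather than in pydantic itself (the DemoSettings class below is hypothetical, for illustration only):

    # Old Pydantic 1.x import, which now raises PydanticImportError under Pydantic 2.x:
    #   from pydantic import BaseSettings
    # New location, available after `pip install pydantic-settings`:
    from pydantic_settings import BaseSettings

    class DemoSettings(BaseSettings):  # hypothetical settings class, for illustration only
        host: str = "0.0.0.0"
        port: int = 8000

    print(DemoSettings().port)  # prints 8000 unless overridden by environment variables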
@lvliang-intel
Collaborator

Hi @olegmikul,
To resolve the Chatbot issue, you'll need to install the additional requirements file located at intel_extension_for_transformers/neural_chat/requirements_cpu.txt before running the chatbot.
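
A minimal sketch of that step, assuming the command is run from the root of the cloned repository:

    # install the extra CPU chatbot dependencies referenced above
    pip install -r intel_extension_for_transformers/neural_chat/requirements_cpu.txt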

For the INT4 Inference issue, please execute pip install intel-extension-for-transformers or perform a source code installation using pip install -e . within the intel_extension_for_transformers directory.
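
And a sketch of the two installation options mentioned for the INT4 issue:

    # Option 1: install the released package from PyPI
    pip install intel-extension-for-transformers

    # Option 2: editable install from source (run inside the cloned intel_extension_for_transformers directory)
    pip install -e .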

@olegmikul
Author

Hi @lvliang-intel,

Thanks, that partially helps:

I. Chatbot

  1. On my Linux (Arch Linux) system with a GPU and CUDA, the chatbot works (I needed to install both requirements.txt and requirements_cpu.txt to make it work).
  2. On another Linux system (same Arch Linux OS) without a GPU/CUDA, the chatbot doesn't work:
    ...
    In [4]: chatbot = build_chatbot()
    2024-01-09 23:09:10 [ERROR] neuralchat error: System has run out of storage
  3. On my laptop (Ultra 7 155H, Meteor Lake, Linux, Ubuntu & Arch Linux) it doesn't work (and yes, I've installed intel-extension-for-transformers both ways):
    In [4]: chatbot = build_chatbot()
    Loading model Intel/neural-chat-7b-v3-1
    model.safetensors.index.json: 100%|████████| 25.1k/25.1k [00:00<00:00, 77.8MB/s]
    model-00001-of-00002.safetensors: 100%|█████| 9.94G/9.94G [01:33<00:00, 106MB/s]
    model-00002-of-00002.safetensors: 100%|████| 4.54G/4.54G [00:55<00:00, 81.5MB/s]
    Downloading shards: 100%|█████████████████████████| 2/2 [02:29<00:00, 74.80s/it]
    Loading checkpoint shards: 100%|██████████████████| 2/2 [00:03<00:00, 1.91s/it]
    generation_config.json: 100%|███████████████████| 111/111 [00:00<00:00, 753kB/s]
    2024-01-09 20:04:11 [ERROR] neuralchat error: Generic error
    ...

II. INT* inference - same error everywhere:
...
FileNotFoundError: [Errno 2] No such file or directory: 'Intel/neural-chat-7b-v3-1'

AssertionError Traceback (most recent call last)
Cell In[12], line 1
----> 1 model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

File ~/py3p10_itrex/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py:173, in _BaseQBitsAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
170 from intel_extension_for_transformers.llm.runtime.graph import Model
172 model = Model()
--> 173 model.init(
174 pretrained_model_name_or_path,
175 weight_dtype=quantization_config.weight_dtype,
176 alg=quantization_config.scheme,
177 group_size=quantization_config.group_size,
178 scale_dtype=quantization_config.scale_dtype,
179 compute_dtype=quantization_config.compute_dtype,
180 use_ggml=quantization_config.use_ggml,
181 use_quant=quantization_config.use_quant,
182 use_gptq=quantization_config.use_gptq,
183 )
184 return model
185 else:

File ~/py3p10_itrex/lib/python3.10/site-packages/intel_extension_for_transformers/llm/runtime/graph/__init__.py:118, in Model.init(self, model_name, use_quant, use_gptq, **quant_kwargs)
116 if not os.path.exists(fp32_bin):
117 convert_model(model_name, fp32_bin, "f32")
--> 118 assert os.path.exists(fp32_bin), "Fail to convert pytorch model"
120 if not use_quant:
121 print("FP32 model will be used.")

AssertionError: Fail to convert pytorch model
...

@Tuanshu

Tuanshu commented Jan 12, 2024

I have just tried the "INT4 Inference (CPU only)" example.
It seems that:

if it is the first run (no runtime_outs/ne_mistral_q_nf4_jblas_cfp32_g32.bin generated yet), the model name ("Intel/neural-chat-7b-v3-1") won't work; I need to pass the model path instead (something like .cache/huggingface/hub/models--Intel--neural-chat-7b-v3-1/snapshots/6dbd30b1d5720fde2beb0122084286d887d24b40).

In later runs, the model_name works OK.

I wonder if this is the intended behavior.

@a32543254
Contributor

a32543254 commented Jan 12, 2024

Yes, for Intel/neural-chat-7b-v3-1 you need to download the model to disk first and then pass the local path to us.
Only the LLaMA / Mistral / NeuralChat models need this process; other models should be fine with just the HF model id.

We will support these models without requiring a local path soon.
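
A minimal sketch of the workaround described above, assuming the huggingface_hub package is installed: snapshot_download pulls the repo into the local Hugging Face cache and returns the snapshot directory, which can then be passed in place of the model id for the first run:

    from huggingface_hub import snapshot_download
    from transformers import AutoTokenizer, TextStreamer
    from intel_extension_for_transformers.transformers import AutoModelForCausalLM

    # download the model once and get its local snapshot directory
    local_path = snapshot_download(repo_id="Intel/neural-chat-7b-v3-1")

    tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
    inputs = tokenizer("Once upon a time, there existed a little girl,", return_tensors="pt").input_ids
    streamer = TextStreamer(tokenizer)

    # pass the local path instead of the HF model id so the first-run conversion can find the files
    model = AutoModelForCausalLM.from_pretrained(local_path, load_in_4bit=True)
    outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)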

@olegmikul
Author

Hi, @Tuanshu,

Thanks, it works! I read a poem about a little girl that can see :)

@a32543254 , @lvliang-intel

It would be extremely useful to put these details in the README to avoid questions from newcomers like me.

The chatbot issues remain, though...
