
None of the examples on the README page work #1117

Open

olegmikul opened this issue Jan 6, 2024 · 5 comments

@olegmikul
Same errors on 3 different Linux distros.

I have installed from source:
pushd intel-extension-for-transformers/
pip install -r requirements.txt
python setup.py install

Then I started trying the examples from the README (obviously, my first steps after install):

  1. Chatbot - a lot of missing dependencies; I figured out the package names from the errors and installed them one by one:
    pip install uvicorn
    pip install yacs
    pip install fastapi
    pip install shortuuid
    pip install python-multipart
    pip install python-dotenv

And finally I got the following error (see the sketch after this list):
from intel_extension_for_transformers.neural_chat import build_chatbot
PydanticImportError: BaseSettings has been moved to the pydantic-settings package. See https://docs.pydantic.dev/2.5/migration/#basesettings-has-moved-to-pydantic-settings for more details.

  2. INT4 Inference (CPU only)

from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neural-chat-7b-v3-1" # Hugging Face model_id or local model
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)

ModuleNotFoundError: No module named 'intel_extension_for_transformers.llm.runtime.graph.mistral_cpp'

  3. INT8 Inference (CPU only) - same error
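
For reference, a minimal sketch of what the PydanticImportError above points at: with Pydantic 2.x installed, BaseSettings lives in the separate pydantic-settings package rather than in pydantic itself (the DemoSettings class below is hypothetical, for illustration only):

    # Old Pydantic 1.x import, which now raises PydanticImportError under Pydantic 2.x:
    #   from pydantic import BaseSettings
    # New location, available after `pip install pydantic-settings`:
    from pydantic_settings import BaseSettings

    class DemoSettings(BaseSettings):  # hypothetical settings class, for illustration only
        host: str = "0.0.0.0"
        port: int = 8000

    print(DemoSettings().port)  # prints 8000 unless overridden by environment variables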
@lvliang-intel
Collaborator

Hi @olegmikul,
To resolve the Chatbot issue, you'll need to install the additional requirements file located at intel_extension_for_transformers/neural_chat/requirements_cpu.txt before running the chatbot.
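
A minimal sketch of that step, assuming the command is run from the root of the cloned repository:

    # install the extra CPU chatbot dependencies referenced above
    pip install -r intel_extension_for_transformers/neural_chat/requirements_cpu.txt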

For the INT4 Inference issue, please execute pip install intel-extension-for-transformers or perform a source code installation using pip install -e . within the intel_extension_for_transformers directory.
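
And a sketch of the two installation options mentioned for the INT4 issue:

    # Option 1: install the released package from PyPI
    pip install intel-extension-for-transformers

    # Option 2: editable install from source (run inside the cloned intel_extension_for_transformers directory)
    pip install -e .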

@olegmikul
Author

Hi @lvliang-intel,

Thanks, that partially helps:

I. Chatbot

  1. On my Linux (Arch Linux) system with a GPU and CUDA, the chatbot works (I needed to install both requirements.txt and requirements_cpu.txt to make it work).
  2. On another Linux system (same Arch Linux OS) without a GPU/CUDA, the chatbot doesn't work:
    ...
    In [4]: chatbot = build_chatbot()
    2024-01-09 23:09:10 [ERROR] neuralchat error: System has run out of storage
  3. On my laptop (Ultra 7 155H, Meteor Lake, Linux, Ubuntu & Arch Linux) it doesn't work (and yes, I've installed intel-extension-for-transformers both ways):
    In [4]: chatbot = build_chatbot()
    Loading model Intel/neural-chat-7b-v3-1
    model.safetensors.index.json: 100%|████████| 25.1k/25.1k [00:00<00:00, 77.8MB/s]
    model-00001-of-00002.safetensors: 100%|█████| 9.94G/9.94G [01:33<00:00, 106MB/s]
    model-00002-of-00002.safetensors: 100%|████| 4.54G/4.54G [00:55<00:00, 81.5MB/s]
    Downloading shards: 100%|█████████████████████████| 2/2 [02:29<00:00, 74.80s/it]
    Loading checkpoint shards: 100%|██████████████████| 2/2 [00:03<00:00, 1.91s/it]
    generation_config.json: 100%|███████████████████| 111/111 [00:00<00:00, 753kB/s]
    2024-01-09 20:04:11 [ERROR] neuralchat error: Generic error
    ...

II. INT* inference - same error everywhere:
...
FileNotFoundError: [Errno 2] No such file or directory: 'Intel/neural-chat-7b-v3-1'

AssertionError Traceback (most recent call last)
Cell In[12], line 1
----> 1 model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

File ~/py3p10_itrex/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py:173, in _BaseQBitsAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
170 from intel_extension_for_transformers.llm.runtime.graph import Model
172 model = Model()
--> 173 model.init(
174 pretrained_model_name_or_path,
175 weight_dtype=quantization_config.weight_dtype,
176 alg=quantization_config.scheme,
177 group_size=quantization_config.group_size,
178 scale_dtype=quantization_config.scale_dtype,
179 compute_dtype=quantization_config.compute_dtype,
180 use_ggml=quantization_config.use_ggml,
181 use_quant=quantization_config.use_quant,
182 use_gptq=quantization_config.use_gptq,
183 )
184 return model
185 else:

File ~/py3p10_itrex/lib/python3.10/site-packages/intel_extension_for_transformers/llm/runtime/graph/__init__.py:118, in Model.init(self, model_name, use_quant, use_gptq, **quant_kwargs)
116 if not os.path.exists(fp32_bin):
117 convert_model(model_name, fp32_bin, "f32")
--> 118 assert os.path.exists(fp32_bin), "Fail to convert pytorch model"
120 if not use_quant:
121 print("FP32 model will be used.")

AssertionError: Fail to convert pytorch model
...

@Tuanshu

Tuanshu commented Jan 12, 2024

I have just tried the "INT4 Inference (CPU only)" example.
It seems that:

if it is the first run (no runtime_outs/ne_mistral_q_nf4_jblas_cfp32_g32.bin generated yet), the model name ("Intel/neural-chat-7b-v3-1") won't work; I need to pass the model path instead (something like .cache/huggingface/hub/models--Intel--neural-chat-7b-v3-1/snapshots/6dbd30b1d5720fde2beb0122084286d887d24b40).

In later runs, the model_name works OK.

I wonder if this is the intended behavior.

@a32543254
Contributor

a32543254 commented Jan 12, 2024

Yes, for Intel/neural-chat-7b-v3-1 you need to download the model to disk first and then pass the local path to us.
Only the LLaMA / Mistral / NeuralChat models need this process; other models should be fine with just the HF model id.

We will support these models without requiring a local path soon.
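
A minimal sketch of the workaround described above, assuming the huggingface_hub package is installed: snapshot_download pulls the repo into the local Hugging Face cache and returns the snapshot directory, which can then be passed in place of the model id for the first run:

    from huggingface_hub import snapshot_download
    from transformers import AutoTokenizer, TextStreamer
    from intel_extension_for_transformers.transformers import AutoModelForCausalLM

    # download the model once and get its local snapshot directory
    local_path = snapshot_download(repo_id="Intel/neural-chat-7b-v3-1")

    tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
    inputs = tokenizer("Once upon a time, there existed a little girl,", return_tensors="pt").input_ids
    streamer = TextStreamer(tokenizer)

    # pass the local path instead of the HF model id so the first-run conversion can find the files
    model = AutoModelForCausalLM.from_pretrained(local_path, load_in_4bit=True)
    outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)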

@olegmikul
Author

Hi, @Tuanshu,

Thanks, it works! I read a poem about a little girl that can see :)

@a32543254 , @lvliang-intel

It would be extremely useful to put these details in the README to avoid questions from newcomers like me.

The chatbot issues remain, though...
