
Model for llama-3-8B, EP: cpu, precision: int4 generated using onnxruntime-genai/src/python/py/models/builder.py has issues #462

Open · jmopuri opened this issue May 15, 2024 · 4 comments

jmopuri commented May 15, 2024

I used the llama-3-8B model from Hugging Face to generate an ONNX model with builder.py in onnxruntime-genai. When I try to use that model and run onnxruntime-genai\examples\python, I get this error:

onnxruntime_genai.onnxruntime_genai.OrtException: Load model from .\llama-3-8B_cpu_int4\model.onnx failed:Invalid model. Node input '/model/layers.0/attn/k_proj/repeat_kv/Transpose_2/output_0' is not a graph input, initializer, or output of a previous node.
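For what it's worth, the standalone ONNX checker validates the same graph property (every node input must be a graph input, an initializer, or the output of a previous node), so it may surface the same problem outside of GenAI. A minimal check, assuming the onnx Python package is installed and using the model path from this report (passing a path rather than a loaded model is required for models over 2 GB):

# Optional validation step; the path is the builder's output directory from the report above:
$ python3 -c "import onnx; onnx.checker.check_model('./llama-3-8B_cpu_int4/model.onnx')"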

kunal-vaishnavi (Contributor) commented May 15, 2024

This is the same error as the one reported in this issue. Can you try the steps specified there? Alternatively, you can upgrade to the latest RC version (0.2.0rc7) once it is released and try again.
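For reference, pinning the pre-release wheel would look something like this once it is published (the exact version string is taken from the comment above):

# Pinning an exact version lets pip resolve a pre-release without the --pre flag:
$ pip install onnxruntime-genai==0.2.0rc7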

jmopuri (Author) commented May 15, 2024

In the earlier case (issue #459), I had generated the Gemma model using onnxruntime_genai.models.builder. Because that was giving an error, you suggested using onnxruntime-genai/src/python/py/models/builder.py instead, which fixed that issue. I used the same approach here, but this time for Llama, and it is giving the error reported above.

kunal-vaishnavi (Contributor) commented May 17, 2024

The fix is the same as in the linked issue. If you replace google/gemma-2b-it with meta-llama/Meta-Llama-3-8B and regenerate the ONNX model using the "from source" version, the error should go away.

Example:

# Your original command (i.e. "from wheel" version):
$ python3 -m onnxruntime_genai.models.builder -m meta-llama/Meta-Llama-3-8B -o ./llama3_8b -p int4 -e cpu

# New command to try (i.e. "from source" version):
$ git clone https://github.com/microsoft/onnxruntime-genai
$ cd onnxruntime-genai/src/python/py/models/
$ python3 builder.py -m meta-llama/Meta-Llama-3-8B -o ./llama3_8b -p int4 -e cpu

With the version of ONNX Runtime GenAI that you have installed, this error will keep happening when using the "from wheel" command to generate any INT4 CPU model where num_attention_heads != num_key_value_heads. The error has been fixed by this PR. You can upgrade your ONNX Runtime GenAI version and try the "from wheel" command again.
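As a quick sanity check, you can confirm that a model falls into this category by comparing the two head counts in its Hugging Face config. A minimal sketch, assuming the transformers package is installed and you have access to the gated meta-llama repo:

# Llama 3 8B uses grouped-query attention, so the two values differ (32 vs. 8):
$ python3 -c "from transformers import AutoConfig; c = AutoConfig.from_pretrained('meta-llama/Meta-Llama-3-8B'); print(c.num_attention_heads, c.num_key_value_heads)"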

natke (Contributor) commented May 21, 2024

@jmopuri Can you try with onnxruntime-genai version 0.2.0?
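For example, the full retry would look something like this, reusing the original "from wheel" command from above:

# Upgrade the wheel to the released version, then regenerate the model:
$ pip install --upgrade onnxruntime-genai
$ python3 -m onnxruntime_genai.models.builder -m meta-llama/Meta-Llama-3-8B -o ./llama3_8b -p int4 -e cpu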
