
Model for llama-3-8B, EP: cpu, precision: int4 generated using onnxruntime-genai/src/python/py/models/builder.py has issues #462

Open · jmopuri opened this issue May 15, 2024 · 4 comments

jmopuri commented May 15, 2024

I used the llama-3-8B model from Hugging Face to generate an ONNX model with builder.py in onnxruntime-genai. When I try to use that model and run onnxruntime-genai\examples\python, I get this error:

onnxruntime_genai.onnxruntime_genai.OrtException: Load model from .\llama-3-8B_cpu_int4\model.onnx failed:Invalid model. Node input '/model/layers.0/attn/k_proj/repeat_kv/Transpose_2/output_0' is not a graph input, initializer, or output of a previous node.
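For what it's worth, the standalone ONNX checker validates the same graph property (every node input must be a graph input, an initializer, or the output of a previous node), so it may surface the same problem outside of GenAI. A minimal check, assuming the onnx Python package is installed and using the model path from this report (passing a path rather than a loaded model is required for models over 2 GB):

# Optional validation step; the path is the builder's output directory from the report above:
$ python3 -c "import onnx; onnx.checker.check_model('./llama-3-8B_cpu_int4/model.onnx')"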

kunal-vaishnavi (Contributor) commented May 15, 2024

This is the same error as the one reported in this issue. Can you try the steps specified there? Alternatively, you can upgrade to the latest RC version (0.2.0rc7) once it is released and try again.
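For reference, pinning the pre-release wheel would look something like this once it is published (the exact version string is taken from the comment above):

# Pinning an exact version lets pip resolve a pre-release without the --pre flag:
$ pip install onnxruntime-genai==0.2.0rc7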

jmopuri (Author) commented May 15, 2024

In the earlier case (issue #459), I had generated the Gemma model using onnxruntime_genai.models.builder. Because that was giving an error, you suggested using onnxruntime-genai/src/python/py/models/builder.py instead, which fixed that issue. I used the same approach here, but this time for Llama, and it is giving the error reported above.

kunal-vaishnavi (Contributor) commented May 17, 2024

The fix is the same as in the linked issue. If you replace google/gemma-2b-it with meta-llama/Meta-Llama-3-8B and regenerate the ONNX model using the "from source" version, the error should go away.

Example:

# Your original command (i.e. "from wheel" version):
$ python3 -m onnxruntime_genai.models.builder -m meta-llama/Meta-Llama-3-8B -o ./llama3_8b -p int4 -e cpu

# New command to try (i.e. "from source" version):
$ git clone https://github.com/microsoft/onnxruntime-genai
$ cd onnxruntime-genai/src/python/py/models/
$ python3 builder.py -m meta-llama/Meta-Llama-3-8B -o ./llama3_8b -p int4 -e cpu

With the version of ONNX Runtime GenAI that you have installed, this error will keep happening when using the "from wheel" command to generate any INT4 CPU model where num_attention_heads != num_key_value_heads. The error has been fixed by this PR. You can upgrade your ONNX Runtime GenAI version and try the "from wheel" command again.
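As a quick sanity check, you can confirm that a model falls into this category by comparing the two head counts in its Hugging Face config. A minimal sketch, assuming the transformers package is installed and you have access to the gated meta-llama repo:

# Llama 3 8B uses grouped-query attention, so the two values differ (32 vs. 8):
$ python3 -c "from transformers import AutoConfig; c = AutoConfig.from_pretrained('meta-llama/Meta-Llama-3-8B'); print(c.num_attention_heads, c.num_key_value_heads)"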

natke (Contributor) commented May 21, 2024

@jmopuri Can you try with onnxruntime-genai version 0.2.0?
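For example, the full retry would look something like this, reusing the original "from wheel" command from above:

# Upgrade the wheel to the released version, then regenerate the model:
$ pip install --upgrade onnxruntime-genai
$ python3 -m onnxruntime_genai.models.builder -m meta-llama/Meta-Llama-3-8B -o ./llama3_8b -p int4 -e cpu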
