
Cannot use the DirectML packages to run on CPU in Windows App #430

Closed
natke opened this issue May 10, 2024 · Discussed in #425 · 6 comments

natke (Contributor) commented May 10, 2024

When I try to run the Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4 model with the DirectML package, I get this error in generator.ComputeLogits():
Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'Non-zero status code returned while running Expand node. Name:'/model/attn_mask_reformat/input_ids_subgraph/Expand' Status Message: invalid expand shape'
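
For reference, here is a minimal sketch of the generation loop in which the error surfaces, assuming the 0.2.0-era Microsoft.ML.OnnxRuntimeGenAI C# API; the prompt and max_length value are placeholders:

using Microsoft.ML.OnnxRuntimeGenAI;

// Load the CPU model variant named above and build a tokenizer for it.
using var model = new Model(@"Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4");
using var tokenizer = new Tokenizer(model);

var sequences = tokenizer.Encode("<|user|>Hello<|end|><|assistant|>");

using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 256);
generatorParams.SetInputSequences(sequences);

// Token-by-token generation; the exception above is thrown by ComputeLogits().
using var generator = new Generator(model, generatorParams);
while (!generator.IsDone())
{
    generator.ComputeLogits();
    generator.GenerateNextToken();
}

Console.WriteLine(tokenizer.Decode(generator.GetSequence(0)));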

Discussed in #425

Originally posted by AshD May 9, 2024
Background: Fusion Quill is a Windows AI word processor and chat app on the Microsoft Store. It currently uses llama.cpp to support multiple AI models and switches between CUDA, ROCm, and CPU llama.cpp DLLs depending on the end user's PC capabilities.

How do I switch between the DirectML and CPU GenAI packages at runtime? If the user has a GPU, I want to use the Microsoft.ML.OnnxRuntimeGenAI.DirectML package with the corresponding DirectML model, and if the user does not have a GPU, I want to use the Microsoft.ML.OnnxRuntimeGenAI package with the CPU version of the model.

Thanks,
Ash

AshD commented May 14, 2024

Any update on this? Thanks.

natke (Contributor, Author) commented May 14, 2024

Hi @AshD, I just verified that I could run the CPU version of Phi-3 with the DirectML NuGet package. This was with 0.2.0-rc7, which is hot off the press. Do you want to confirm that this works for you too?

https://www.nuget.org/packages/Microsoft.ML.OnnxRuntimeGenAI.DirectML/0.2.0-rc7

AshD commented May 14, 2024

Hi @natke, I am getting the same error after upgrading to rc7, using the Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4 model.

Are there settings that need to be set? Thanks.

natke (Contributor, Author) commented May 14, 2024

This is really weird. I just downloaded that model and ran it with the rc7 NuGet package, and it works.

Can you list your package dependencies here please?

AshD commented May 15, 2024

I tried different options, and it looks like this is the issue:
generatorParams.SetSearchOption("past_present_share_buffer", false);

It has to be false for CPU and true for DML. With that, it works for both the CPU and DML models :-)
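
A sketch of how this can be wired up at runtime, assuming the app supplies its own GPU detection (DetectGpu() is a hypothetical helper, and the model paths are placeholders):

// Choose the model folder and buffer-sharing mode per device at runtime.
// DetectGpu() is a hypothetical helper; the app provides its own detection logic.
bool useDirectML = DetectGpu();

string modelPath = useDirectML
    ? @"models\Phi-3-mini-128k-instruct-onnx\directml\directml-int4-awq-block-128"
    : @"models\Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4";

using var model = new Model(modelPath);
using var generatorParams = new GeneratorParams(model);

// Per this thread: must be true for the DML model, false for the CPU model.
generatorParams.SetSearchOption("past_present_share_buffer", useDirectML);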

We can close the issue. Is there an API to check if there is a DirectML device present?

natke (Contributor, Author) commented May 21, 2024

Hi @AshD, closing this one. I opened a new issue for the device-detection API question: #488

natke closed this as completed May 21, 2024