Greetings!
I am trying to use LLamaSharp.Backend.OpenCL version 0.11.2 under Windows 11, but after loading any GGUF model, inference fails with the following assertion:
GGML_ASSERT: D:\a\LLamaSharp\LLamaSharp\llama.cpp:14093: hparams.n_embd_head_v % ggml_blck_size(type_v) == 0
I tried several models (mistral-7b-instruct-v0.2.Q6_K.gguf, tiny-llama, phi-2, gemma-it), both with and without GPU offloading, but the error remains the same. The CPU backend works fine.
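In case it helps, the repro is just the standard high-level API; a minimal sketch along these lines (model path and parameter values are placeholders, assuming LLamaSharp's 0.11 ModelParams/InteractiveExecutor API):

using System;
using LLama;
using LLama.Common;

// Placeholder path; any of the models listed above reproduces the assertion.
var modelPath = @"C:\models\mistral-7b-instruct-v0.2.Q6_K.gguf";

var parameters = new ModelParams(modelPath)
{
    ContextSize = 2048,
    GpuLayerCount = 0   // same failure with 0 (no offload) and with e.g. 33
};

// Loading succeeds; the GGML_ASSERT only fires once inference starts.
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

await foreach (var token in executor.InferAsync("Hello", new InferenceParams { MaxTokens = 32 }))
{
    Console.Write(token);
}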
As far as I could see, there is an issue in llama.cpp (#5928) that at least sounds similar, but I can't tell if it is the same error.
The full log (mistral, with GPU offloading):
Yes, I tested with llama.cpp release b2303 (commit 3ab8b3a) and the latest (binary) release b2589 (commit 1ff4d9f). Both work as expected with all GGUF models.
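For completeness, the native test was a plain run of the prebuilt main.exe from the release zips, along these lines (prompt, token count, and layer count are placeholders; -ngl sets the number of layers offloaded to the GPU):

main.exe -m mistral-7b-instruct-v0.2.Q6_K.gguf -p "Hello" -n 32 -ngl 33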
However, I have noticed that in OpenCL mode the RAM consumption is considerably higher, and inference runs at less than half the speed of the CPU backend. 😞 So I think I'll stick with CPU-only mode.
The UHD Graphics 730 is not as efficient at computation as dedicated GPUs such as Nvidia RTX and AMD RX cards. However, if everything works with llama.cpp but not with LLamaSharp, it can be confirmed as a bug. Could you please give a link to the model you used so we can reproduce it?
I had hoped that the iGPU would be at roughly the same performance level as the CPU, but more efficient (which on my PC actually means: much quieter 🤣). But the CUDA backend is definitely an order of magnitude faster, I agree.
Here are the links to the models I used for the test with native llama.cpp (unfortunately, mistral-7b Q6 works on the CPU but runs out of memory with OpenCL):