
When loading the model I get the following error: #17

Open
JeisonJimenezA opened this issue Oct 11, 2023 · 9 comments

@JeisonJimenezA

llm_load_tensors: ggml ctx size = 0.16 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 9363.40 MB
llm_load_tensors: offloading 6 repeating layers to GPU
llm_load_tensors: offloaded 6/43 layers to GPU
llm_load_tensors: VRAM used: 1637.37 MB
.................................................................................GGML_ASSERT: D:\a\llama-cpp-python-cuBLAS-wheels\llama-cpp-python-cuBLAS-wheels\vendor\llama.cpp\ggml-cuda.cu:5925: false
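
For context, a load along these lines reproduces the setup shown in the log above. This is a minimal sketch using llama-cpp-python's Llama class; the file name and context size are assumptions rather than values taken from the report, and n_gpu_layers=6 simply mirrors the "offloading 6 repeating layers" line.

from llama_cpp import Llama

# Minimal sketch: load a local GGUF file with partial GPU offload.
# "sqlcoder.Q4_K_M.gguf" is a hypothetical filename used for illustration.
llm = Llama(
    model_path="sqlcoder.Q4_K_M.gguf",
    n_gpu_layers=6,   # offload 6 layers to the GPU, as in the log above
    n_ctx=2048,       # context window size (assumed)
)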

@jllllll (Owner) commented Oct 11, 2023

What model are you trying to load? This error is indicative of an incompatible model.

@JeisonJimenezA (Author)

I'm loading this model ----> TheBloke/sqlcoder-GGUF

@jllllll (Owner) commented Oct 11, 2023

What version of llama-cpp-python are you using?

@JeisonJimenezA (Author)

llama_cpp_python 0.2.11+cu117

@jllllll (Owner) commented Oct 12, 2023

Yeah, I just finished downloading it and got the same error. There may be something wrong with the model.
There's not much I can do on my end, as I only build wheels for llama-cpp-python. As far as I can tell, the issue is with llama.cpp itself.

@jllllll (Owner) commented Oct 12, 2023

It could simply be that StarCoder models aren't supported with CUDA; I'm not sure.
I do know that only some model architectures are supported by the cuBLAS implementation.

@jllllll (Owner) commented Oct 12, 2023

That does seem to be the case: ggerganov/llama.cpp#3187 (comment)
I guess CUDA support for StarCoder just hasn't been added yet.

@JeisonJimenezA (Author)

Thank you for your help. Where can I see which models are supported with CUDA?

@jllllll (Owner) commented Oct 12, 2023

The only thing I can find so far is this check in the source code:
https://github.com/ggerganov/llama.cpp/blob/b8fe4b5cc9cb237ca98e5bc51b5d189e3c446d13/llama.cpp#L5840-L5844

The REFACT and MPT entries are newly added architecture support that isn't present yet in the current version of llama-cpp-python.
That leaves current llama-cpp-python cuBLAS support at these models (a quick check along those lines is sketched below the list):

LLAMA
BAICHUAN
FALCON
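
As a rough illustration of what the linked source amounts to, the snippet below checks a GGUF file's reported architecture against that whitelist. The list reflects the llama-cpp-python 0.2.11 era discussed in this thread and is an assumption about that version, not a statement about current llama.cpp; the function name is hypothetical.

# Hedged sketch: architectures the cuBLAS build will offload, per the
# whitelist quoted above (llama-cpp-python 0.2.11 era; an assumption).
CUBLAS_OFFLOAD_ARCHS = {"llama", "baichuan", "falcon"}

def supports_cublas_offload(architecture: str) -> bool:
    # 'architecture' is the GGUF "general.architecture" metadata value,
    # e.g. "llama", or "starcoder" for the sqlcoder GGUF in this thread.
    return architecture.lower() in CUBLAS_OFFLOAD_ARCHS

print(supports_cublas_offload("starcoder"))  # False -> keep n_gpu_layers=0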
