ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 2483060736 Segmentation fault (core dumped) #3392
jheinrichs79 started this conversation in General
Replies: 2 comments 3 replies
-
What's the free VRAM when you run this command? The error shows that you don't have enough VRAM to load the model.
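To check free VRAM on an NVIDIA card before launching the command, `nvidia-smi` can query it directly:

```shell
# Report free and total GPU memory in CSV form; run this just before
# launching wasmedge to see how much VRAM is actually available.
nvidia-smi --query-gpu=memory.free,memory.total --format=csv
```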
-
It is running out of memory on the GPU. Perhaps you can reduce the context size, or offload fewer layers to the GPU.
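A minimal sketch of that suggestion, reusing the command from the question below. The flag names (`--ctx-size`, `--batch-size`, `--n-gpu-layers`) are assumptions inferred from the run info that llama-simple.wasm prints; check the example's `--help` output for the exact spelling in your LlamaEdge version:

```shell
# Same invocation as in the question, but with a smaller context and batch
# size and fewer layers offloaded to the GPU, all of which shrink the CUDA
# buffer that failed to allocate. Flag names are assumptions; verify with
# the example's --help output.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-q5_k_m.gguf \
  llama-simple.wasm \
  --ctx-size 1024 \
  --batch-size 512 \
  --n-gpu-layers 20 \
  --prompt "### Human: What is Mercury? ### Assistant: "
```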
-
I tried this command on Ubuntu Linux and it errors out. I am not sure how to troubleshoot this.
Command:
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-q5_k_m.gguf llama-simple.wasm --prompt "### Human: What is Mercury? ### Assistant: "
Run Info:
[INFO] prompt context size: 4096
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 4096
[INFO] Log enable: false
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5, VMM: yes
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 2368.03 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 2483060736
Segmentation fault (core dumped)
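As a sanity check, the two numbers in the log describe the same allocation: 2483060736 bytes is exactly the 2368.03 MiB that cudaMalloc failed to reserve.

```python
# The CUDA buffer size reported by ggml_gallocr_reserve_n, in bytes.
buf_bytes = 2_483_060_736

# Convert to MiB (1 MiB = 1024 * 1024 bytes), matching the
# ggml_backend_cuda_buffer_type_alloc_buffer log line.
buf_mib = buf_bytes / (1024 * 1024)
print(f"{buf_mib:.2f} MiB")  # → 2368.03 MiB
```

So the RTX 2070 (8 GiB VRAM) needs roughly 2.4 GiB free for this buffer on top of the model weights and other allocations already resident on the device.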