ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 2483060736 Segmentation fault (core dumped) #3392
jheinrichs79 started this conversation in General
Replies: 2 comments 3 replies
-
What's the free VRAM when you run this command? The error shows that you don't have enough VRAM to load the model.
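To check free VRAM on an NVIDIA card before launching the command, `nvidia-smi` can query it directly:

```shell
# Report free and total GPU memory in CSV form; run this just before
# launching wasmedge to see how much VRAM is actually available.
nvidia-smi --query-gpu=memory.free,memory.total --format=csv
```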
-
It is running out of memory on the GPU. Perhaps you can reduce the context size, or offload fewer layers to the GPU.
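A minimal sketch of that suggestion, reusing the command from the question below. The flag names (`--ctx-size`, `--batch-size`, `--n-gpu-layers`) are assumptions inferred from the run info that llama-simple.wasm prints; check the example's `--help` output for the exact spelling in your LlamaEdge version:

```shell
# Same invocation as in the question, but with a smaller context and batch
# size and fewer layers offloaded to the GPU, all of which shrink the CUDA
# buffer that failed to allocate. Flag names are assumptions; verify with
# the example's --help output.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-q5_k_m.gguf \
  llama-simple.wasm \
  --ctx-size 1024 \
  --batch-size 512 \
  --n-gpu-layers 20 \
  --prompt "### Human: What is Mercury? ### Assistant: "
```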
-
I tried this command on Ubuntu Linux and it errors out. I am not sure how to troubleshoot this.
Command:
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-q5_k_m.gguf llama-simple.wasm --prompt "### Human: What is Mercury? ### Assistant: "
Run Info:
[INFO] prompt context size: 4096
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 4096
[INFO] Log enable: false
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5, VMM: yes
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 2368.03 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 2483060736
Segmentation fault (core dumped)
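As a sanity check, the two numbers in the log describe the same allocation: 2483060736 bytes is exactly the 2368.03 MiB that cudaMalloc failed to reserve.

```python
# The CUDA buffer size reported by ggml_gallocr_reserve_n, in bytes.
buf_bytes = 2_483_060_736

# Convert to MiB (1 MiB = 1024 * 1024 bytes), matching the
# ggml_backend_cuda_buffer_type_alloc_buffer log line.
buf_mib = buf_bytes / (1024 * 1024)
print(f"{buf_mib:.2f} MiB")  # → 2368.03 MiB
```

So the RTX 2070 (8 GiB VRAM) needs roughly 2.4 GiB free for this buffer on top of the model weights and other allocations already resident on the device.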