Debian 12 x LLamaSharp 0.11.2 Crashed Silently #668

kuan2019 · 2024-04-15T13:41:46Z

Hi,
I was running a console application with LLamaSharp 0.11.2 on Debian 12, and it crashed silently, without throwing any exception, while it was loading the model file.

using var model = LLamaWeights.LoadFromFile(parameters);

How can I fix this issue? My environment is as follows:

OS: Debian GNU/Linux 12 (bookworm)
CPU: Intel x64
Memory: 264GB
GLIBC Version: Debian GLIBC 2.36-9+deb12u4
dotnet 7.0.408
LLamaSharp 0.11.2 & LLamaSharp.Backend.Cpu 0.11.2

Best regards,

SignalRT · 2024-04-15T20:28:09Z

Could you share the link to the model that you are trying to load to make a test?

kuan2019 · 2024-04-16T00:21:12Z

I ran it with the model "llama-2-7b-chat.Q4_K_M.gguf" on the server, where it fails, but the same model works fine on my M1 MacBook Pro (macOS Sonoma 14.4). I have no idea why it was terminated silently.
Correction:
The process was running and stopped at this line:
var parameters = new ModelParams(modelPath)

kuan2019 · 2024-04-16T06:33:42Z

After investigating with llama.cpp directly, I found where the crash comes from: llama.cpp itself ends with a segmentation fault (core dump). What should my next step be?

#./main -ngl 32 -m /user/models/llama-2-7b-chat.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "YOUR PROMPT..."
warning: not compiled with GPU offload support, --n-gpu-layers option will be ignored
warning: see main README.md for information on enabling GPU BLAS support
Log start
main: build = 2679 (7593639c)
main: built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
main: seed = 1713247730
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /root/models/llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.context_length u32 = 4096
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 15
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 18: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 4096
llm_load_print_meta: n_embd_v_gqa = 4096
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 6.74 B
llm_load_print_meta: model size = 3.80 GiB (4.84 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: CPU buffer size = 3891.24 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 2048.00 MiB
llama_new_context_with_model: KV self size = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: CPU compute buffer size = 296.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
Segmentation fault
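
A note on why the .NET application appeared to crash "silently": a segmentation fault in native code kills the whole process with a signal, so no managed exception is ever raised. The shell still records it in the exit status, where a value above 128 means "terminated by signal (status - 128)". A minimal sketch for decoding that status follows; the function name `describe_exit` and the application path in the usage comment are illustrative assumptions, not part of LLamaSharp or llama.cpp.

```shell
#!/bin/sh
# Decode a shell exit status.
# Values above 128 mean the process was terminated by signal (status - 128).
describe_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix (e.g. SEGV).
        name=$(kill -l "$sig" 2>/dev/null) || name=unknown
        echo "killed by signal $sig ($name)"
    else
        echo "exited with status $status"
    fi
}

# Usage (hypothetical application path): run the program, then decode its status.
#   dotnet ./MyApp.dll; describe_exit $?
```

On Linux x86-64 a segmentation fault is signal 11, so a status of 139 after the run above would point to the same native crash that `./main` reports here.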

SignalRT · 2024-04-16T12:29:53Z

If the problem also happens with the llama.cpp examples (main), you should open the issue in the llama.cpp repository.

kuan2019 · 2024-04-24T06:52:31Z

@martindevans After I updated to the newest llama.cpp and recompiled both projects, I replaced the two files LLamaSharp.dll and libllama.so in my dotnet project on Debian 12, and it works now. Amazing!
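
For anyone repeating the file swap described above, here is a minimal sketch of replacing a native library while keeping a backup of the original, so the change can be undone. The function name `replace_native_lib` and every path in the usage comment are hypothetical; where libllama.so actually lives depends on how the project is built and published.

```shell
#!/bin/sh
# Copy a rebuilt native library over the one an application ships with,
# keeping a .bak copy of the original so the swap can be reverted.
replace_native_lib() {
    new_lib=$1      # path to the freshly built library (e.g. libllama.so)
    target_dir=$2   # directory the application loads the library from
    name=$(basename "$new_lib")
    [ -f "$new_lib" ] || { echo "missing: $new_lib" >&2; return 1; }
    [ -d "$target_dir" ] || { echo "missing: $target_dir" >&2; return 1; }
    # Back up the shipped library once; never overwrite an existing backup.
    if [ -f "$target_dir/$name" ] && [ ! -f "$target_dir/$name.bak" ]; then
        cp -p "$target_dir/$name" "$target_dir/$name.bak"
    fi
    cp -p "$new_lib" "$target_dir/$name"
}

# Usage (hypothetical paths):
#   replace_native_lib ./llama.cpp/build/libllama.so ./MyApp/bin/Release/net7.0
```

Because the native library and LLamaSharp.dll were rebuilt together here, it is probably safest to replace them as a pair, as was done above, rather than mixing a new libllama.so with an older managed assembly.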

AsakusaRinne · 2024-04-24T08:53:55Z

@kuan2019 The binary in the master branch was updated last week. Could you please try once more with the current master branch?

martindevans added the "Upstream" label (Tracking an issue in llama.cpp) on Apr 16, 2024
