WARNING: gguf quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO: Initializing the Aphrodite Engine (v0.5.1) with the following config:
INFO: Model = '/home/tesh/models/kunoichi-7b.Q4_K_M.gguf'
INFO: DataType = torch.float16
INFO: Model Load Format = auto
INFO: Number of GPUs = 1
INFO: Disable Custom All-Reduce = False
INFO: Quantization Format = gguf
INFO: Context Length = 8192
INFO: Enforce Eager Mode = False
INFO: KV Cache Data Type = auto
INFO: KV Cache Params Path = None
INFO: Device = cuda
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Converting GGUF tensors to PyTorch... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 291/291 0:00:00
INFO: Model weights loaded. Memory usage: 4.12 GiB x 1 = 4.12 GiB
client_loop: send disconnect: Connection reset
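A note on the memory figures in the log above: the 4.12 GiB reported is weights only, and the engine also reserves KV-cache memory for the 8192-token context. A back-of-envelope sketch, assuming a Mistral-7B-style architecture (Kunoichi-7B is a Mistral merge: 32 layers, 8 KV heads, head dim 128, fp16 cache; these figures are assumptions, not taken from the log):

```python
def kv_cache_bytes(context_len, n_layers=32, n_kv_heads=8,
                   head_dim=128, dtype_bytes=2):
    """Rough KV-cache size; 2x for separate key and value tensors per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * context_len

gib = kv_cache_bytes(8192) / 2**30
print(f"KV cache for 8192 tokens: {gib:.2f} GiB")  # prints 1.00 GiB
```

So total GPU demand is closer to ~5 GiB plus activation/workspace overhead, before whatever fraction of VRAM the engine pre-allocates for the cache pool.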
Good question as to why it crashes, but it kills all SSH connections and terminates the program.
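An SSH session dying together with the program ("client_loop: send disconnect: Connection reset") is a classic symptom of the host exhausting RAM and the Linux OOM killer reaping processes. A minimal diagnostic sketch, assuming a Linux host (the helper name is illustrative); checking `dmesg` for "Out of memory" entries after the crash would confirm an OOM kill:

```python
def meminfo_gib(field="MemAvailable"):
    """Read a field from /proc/meminfo (Linux-only) and return it in GiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                kib = int(line.split()[1])  # /proc/meminfo values are in KiB
                return kib / 2**20
    return None

print(f"Available RAM: {meminfo_gib():.2f} GiB")
```

Watching this value while the GGUF tensors are converted to PyTorch would show whether host RAM, rather than VRAM, is the resource being exhausted.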
DuckY-Y changed the title from "[Crash]:" to "[Crash]: Program gets terminated" on Apr 11, 2024.
Anything you want to discuss about Aphrodite.
Command:
Output: