LLamaSharp 0.11.2 Exception #660

Closed
kuan2019 opened this issue Apr 11, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@kuan2019

kuan2019 commented Apr 11, 2024

Hi, when I run LLamaSharp 0.11.2 with .NET 7 I suddenly get the exception below. How can I fix it?

Unhandled exception. LLama.Exceptions.LLamaDecodeError: llama_decode failed: 'NoKvSlot'
   at LLama.InteractiveExecutor.InferInternal(IInferenceParams inferenceParams, InferStateArgs args)
   at LLama.StatefulExecutorBase.InferAsync(String text, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.StatefulExecutorBase.InferAsync(String text, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at LLama.ChatSession.ChatAsyncInternal(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.ChatSession.ChatAsyncInternal(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.ChatSession.ChatAsyncInternal(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at LLama.ChatSession.ChatAsync(Message message, Boolean applyInputTransformPipeline, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.ChatSession.ChatAsync(Message message, Boolean applyInputTransformPipeline, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.ChatSession.ChatAsync(Message message, Boolean applyInputTransformPipeline, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Program.<Main>$(String[] args) in /Users/eric.hu/Projects/Net7.Llamasharp.Console/Net7.Llamasharp.Console/Program.cs:line 36
   at Program.<Main>$(String[] args) in /Users/eric.hu/Projects/Net7.Llamasharp.Console/Net7.Llamasharp.Console/Program.cs:line 36
   at Program.<Main>(String[] args)
@martindevans
Collaborator

The fundamental error, llama_decode failed: 'NoKvSlot', means that the inference system has essentially run out of room in the KV cache and can't store any more tokens. You'll need a bigger KV cache, or to work with fewer tokens.
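
(For anyone hitting the same thing, here is a minimal sketch of the "bigger KV cache" route in LLamaSharp, assuming the usual ModelParams / LLamaWeights loading path; the model path and the value 4096 are placeholders, not anything from this report.)

    using LLama;
    using LLama.Common;

    // Minimal sketch: raise ContextSize so llama.cpp allocates a larger KV cache,
    // leaving more room for prompt + generated tokens before NoKvSlot can occur.
    var parameters = new ModelParams("path/to/model.gguf")   // placeholder path
    {
        ContextSize = 4096,
    };

    using var weights = LLamaWeights.LoadFromFile(parameters);
    using var context = weights.CreateContext(parameters);
    var executor = new InteractiveExecutor(context);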

@kuan2019
Author

@martindevans Got it! thanks.

@Hg0613

Hg0613 commented Apr 12, 2024

I'm currently having the exact same problem after upgrading from version 0.10 to 0.11.2. I did not change the model parameters. I tried reducing the number of tokens as advised, but the error remained.

@martindevans
Collaborator

martindevans commented Apr 12, 2024

That's odd. The NoKvSlot error is pretty much passed straight through from llama.cpp; there's not a lot going on on the C# side that could be a problem there.

Can you tell us some more details - what are your loading parameters? What model are you using? How much text are you evaluating etc?

Edit: Also, is this dotnet7.0 only? Or are you using a different version?

@martindevans added the bug (Something isn't working) label Apr 12, 2024
@Hg0613

Hg0613 commented Apr 13, 2024

Model: open-chat-3.5-0106

Model Params:
ContextSize = 1024, Seed = 1337, Threads = (uint)Math.Max(Environment.ProcessorCount / 2, 1), UseMemorymap = true, UseMemoryLock = true, BatchSize = 512, Encoding = Encoding.UTF8, EmbeddingMode = true, GpuLayerCount = 28,

Inference Params:
MaxTokens = 512, AntiPrompts = new List<string> { "User:" }, Temperature = 0.7f, RepeatPenalty = 1.0f, TopK = 50, TopP = 0.95f,

With these parameters everything worked fine in previous versions. I'm using .NET 8.0 :)

@martindevans
Collaborator

Those settings look fine :/

As far as I'm aware the only way this should raise NoKvSlot is if you try to use more than 1024 tokens, or if the cache is very fragmented (unlikely, unless you're doing weird things directly with the kv cache).

I assume you're getting this well short of 1024 tokens?
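
(If it helps to check, something along these lines should show how close the prompt alone gets to that limit. This is a sketch from memory of the LLamaSharp API, assuming LLamaContext exposes Tokenize and ContextSize; treat it as illustrative rather than exact.)

    // Rough sanity check, assuming `context` is your LLama.LLamaContext
    // and `prompt` is the full text sent before generation starts.
    var promptTokens = context.Tokenize(prompt);
    Console.WriteLine($"Prompt uses {promptTokens.Length} of {context.ContextSize} context tokens");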

@Hg0613

Hg0613 commented Apr 14, 2024

I tried increasing the ContextSize parameter to 2048, and now there is no error during long-term communication. But now, of course, it takes longer to start than usual, hehe.

@Hg0613

Hg0613 commented Apr 14, 2024

I tried to change MaxTokens, but the problem was not solved.

@martindevans
Collaborator

MaxTokens is a limit on how many tokens to generate.

It might help if you're careful with your numbers.

For example:

  1. With a context size of 1024 you prompt with 512 tokens and then set MaxTokens to 512.
  2. That could generate, say, 200 tokens.
  3. Now you set MaxTokens to 312 (1024 - 512 - 200), and so on.

If expanding the context size fixes it, it sounds to me like you're simply using up all your available token space.
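
(Purely illustrative arithmetic for that budgeting; CountTokens here is a hypothetical helper, not a LLamaSharp API.)

    const int contextSize = 1024;

    // Hypothetical helper that counts the tokens in the prompt so far.
    int used = CountTokens(promptSoFar);               // e.g. 512 prompt tokens
    inferenceParams.MaxTokens = contextSize - used;    // 512: cap generation at what still fits

    // ...the model generates, say, 200 tokens...
    used += 200;
    inferenceParams.MaxTokens = contextSize - used;    // 312: smaller cap for the next turn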

@AsakusaRinne
Collaborator

This issue is expected to be fixed in the current master branch. Could you please try again with the master branch?

@Hg0613

Hg0613 commented May 13, 2024

In version 0.12.0 the problem resolved itself :|

@AsakusaRinne
Collaborator

Thank you for your feedback, closing this issue as completed now. Please feel free to comment here if the problem reappears.
