Is your enhancement related to a problem? Please describe
When running against TheBloke/Mistral-7B-Instruct-v0.2-GGUF, I was receiving errors like:
Error code: 400 - {'error': {'message': "This model's maximum context length is 2048 tokens. However, you requested 2981 tokens (2981 in the messages, None in the completion). Please reduce the length of the messages or completion.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
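For reference, a minimal sketch of how this error surfaces from an OpenAI-compatible client pointed at the local inference endpoint; the base URL, API key, and model id below are placeholders, not the actual Podman AI Lab values:

```python
# Hedged reproduction sketch: send a prompt longer than the server's
# 2048-token context window and observe the 400 response.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

long_prompt = "lorem ipsum " * 1500  # well past a 2048-token context window

client.chat.completions.create(
    model="mistral-7b-instruct-v0.2",  # placeholder model id
    messages=[{"role": "user", "content": long_prompt}],
)
# -> openai.BadRequestError: Error code: 400 ... 'code': 'context_length_exceeded'
```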
Describe the solution you'd like
llama_cpp.server has an --n_ctx option to adjust the context size: https://llama-cpp-python.readthedocs.io/en/latest/server/#server-options
By running a custom image with this option added, I was able to run my queries without hitting this error.
It would probably be straightforward to pass it through as an environment variable, as is already done for HOST, PORT, etc.
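As an illustration only, a sketch of what that could look like in the image's entrypoint, assuming the container launches llama_cpp.server directly; the N_CTX variable name and the defaults are assumptions, not the image's current behavior:

```python
# Hypothetical entrypoint sketch: forward an N_CTX environment variable to
# llama_cpp.server's --n_ctx flag, mirroring how HOST and PORT are handled.
# N_CTX is an assumed variable name, not something the image defines today.
import os
import subprocess
import sys

cmd = [
    sys.executable, "-m", "llama_cpp.server",
    "--host", os.environ.get("HOST", "0.0.0.0"),
    "--port", os.environ.get("PORT", "8000"),
]

n_ctx = os.environ.get("N_CTX")
if n_ctx:
    cmd += ["--n_ctx", n_ctx]

subprocess.run(cmd, check=True)
```

Starting the container with something like -e N_CTX=4096 would then lift the 2048-token default without requiring a custom image.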
Describe alternatives you've considered
No response
Additional context
I was trying to run the https://github.com/konveyor-ecosystem/kai/ demo against Podman AI Lab when I encountered these errors.