Best practices to control the size of the ChatHistory to avoid exceeding a model's maximum context length #6155

Open · markwallace-microsoft opened this issue May 8, 2024 · 4 comments

@markwallace-microsoft (Member)

Here's an example of the type of error a developer can run into:

```
Unable to generate bot response. Details: Error: 500: Internal Server Error
Microsoft.SemanticKernel.KernelException: StreamResponseToClientAsync failed.
 ---> Microsoft.SemanticKernel.HttpOperationException: This model's maximum context length is 8192 tokens. However, you requested 8677 tokens (7023 in the messages, 630 in the functions, and 1024 in the completion). Please reduce the length of the messages, functions, or completion.
Status: 400 (model_error)
ErrorCode: context_length_exceeded
Content: {
  "error": {
    "message": "This model's maximum context length is 8192 tokens. However, you requested 8677 tokens (7023 in the messages, 630 in the functions, and 1024 in the completion). Please reduce the length of the messages, functions, or completion.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}
Headers:
Access-Control-Allow-Origin: REDACTED
apim-request-id: REDACTED
x-ratelimit-remaining-requests: REDACTED
...
```

Some options to mitigate this:

  1. Examples which show how to trim the chat history dynamically, e.g. by setting a maximum number of messages (a rough sketch follows this list).
  2. Examples which show how to summarise context information before it is inserted into a prompt.
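For example, a minimal sketch of option 1. The `TrimByMessageCount` helper is hypothetical, not an existing Semantic Kernel API; it assumes the standard `ChatHistory` type from Microsoft.SemanticKernel.ChatCompletion:

```csharp
using System;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helper (not an existing Semantic Kernel API): keeps an
// optional leading system message plus the most recent `maxMessages` turns.
static ChatHistory TrimByMessageCount(ChatHistory history, int maxMessages)
{
    var trimmed = new ChatHistory();

    // Preserve the system prompt if the conversation starts with one.
    int start = 0;
    if (history.Count > 0 && history[0].Role == AuthorRole.System)
    {
        trimmed.Add(history[0]);
        start = 1;
    }

    // Copy only the most recent messages.
    for (int i = Math.Max(start, history.Count - maxMessages); i < history.Count; i++)
    {
        trimmed.Add(history[i]);
    }

    return trimmed;
}
```
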
@stephentoub (Member)

Beyond samples, I think we should have some built-in support for this, e.g. an interface that can be queried for to reduce the size of the chat history, with some implementations of it readily available: ones that trim to a max number of tokens or messages, ones that summarize and replace the previous history with just the most salient points, ones that remove less important messages and keep only the important ones, etc.

This (and possibly other features) might drive the need for taking a dependency on a tokenizer; we'll want to think that through, in conjunction with the abstraction for a tokenizer in Microsoft.ML.Tokenizers. cc: @tarekgh (Tarek, and @ericstj, we should think about whether the Tokenizer abstraction should be moved to an abstractions library... today in order to get the abstraction you also need to pay to get all the implementations.)
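A minimal sketch of what the "summarize and replace" strategy could look like, assuming the existing `IChatCompletionService` and its `GetChatMessageContentAsync` extension; the `SummarizingReducer` type itself is hypothetical, not a shipped API:

```csharp
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical: compress older turns into a summary, keep the recent tail.
public sealed class SummarizingReducer
{
    private readonly IChatCompletionService _chat;

    public SummarizingReducer(IChatCompletionService chat) => _chat = chat;

    public async Task<ChatHistory> ReduceAsync(ChatHistory history, int keepLast)
    {
        if (history.Count <= keepLast)
        {
            return history;
        }

        // Ask the model to compress everything except the recent tail.
        var toSummarize = new ChatHistory();
        toSummarize.AddSystemMessage(
            "Summarize the following conversation, keeping only the most salient points.");
        for (int i = 0; i < history.Count - keepLast; i++)
        {
            toSummarize.Add(history[i]);
        }

        ChatMessageContent summary = await _chat.GetChatMessageContentAsync(toSummarize);

        // Rebuild the history: summary first, then the recent messages verbatim.
        var reduced = new ChatHistory();
        reduced.AddSystemMessage(summary.Content ?? string.Empty);
        for (int i = history.Count - keepLast; i < history.Count; i++)
        {
            reduced.Add(history[i]);
        }

        return reduced;
    }
}
```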

@tarekgh (Member) commented May 8, 2024

> today in order to get the abstraction you also need to pay to get all the implementations

In what situation would the abstraction be necessary without requiring one of the specific tokenizers? The scenario mentioned doesn't seem to clarify this for me.

@stephentoub (Member) commented May 10, 2024

> > today in order to get the abstraction you also need to pay to get all the implementations
>
> In what situation would the abstraction be necessary without requiring one of the specific tokenizers? The scenario mentioned doesn't seem to clarify this for me.

I'll turn the question around and ask: what's the reason for having the Tokenizer abstraction at all if every use would require a specific tokenizer? :)

Imagine for this issue there were an `IChatHistoryReducer` with a method like `ChatHistory Reduce(ChatHistory history, int tokenLimit, Tokenizer tokenizer)`, where implementations of `IChatHistoryReducer` would need to produce a new `ChatHistory` containing no more than `tokenLimit` tokens. They'd need to be able to count tokens according to whatever tokenization algorithm was desired, and thus would need to accept a tokenizer.
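A trimming implementation of that hypothetical interface might look roughly like this. The interface is only the shape proposed above, not a shipped API; the sketch assumes `Tokenizer.CountTokens(string)` from Microsoft.ML.Tokenizers and, for simplicity, ignores per-message and function-definition token overhead:

```csharp
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// The interface as proposed above; not an existing API.
public interface IChatHistoryReducer
{
    ChatHistory Reduce(ChatHistory history, int tokenLimit, Tokenizer tokenizer);
}

// Trimming strategy: walk backwards from the newest message and keep as
// many whole messages as fit within the token budget.
public sealed class TrimmingChatHistoryReducer : IChatHistoryReducer
{
    public ChatHistory Reduce(ChatHistory history, int tokenLimit, Tokenizer tokenizer)
    {
        var kept = new List<ChatMessageContent>();
        int used = 0;

        for (int i = history.Count - 1; i >= 0; i--)
        {
            // Counts message text only; a real budget would also account
            // for role/markup overhead and function definitions.
            int cost = tokenizer.CountTokens(history[i].Content ?? string.Empty);
            if (used + cost > tokenLimit)
            {
                break;
            }
            used += cost;
            kept.Add(history[i]);
        }

        kept.Reverse(); // restore chronological order

        var reduced = new ChatHistory();
        foreach (ChatMessageContent message in kept)
        {
            reduced.Add(message);
        }
        return reduced;
    }
}
```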

It's a similar need for something like `TextChunker`. Today it has methods that take a delegate to do token counting, but every current use of that delegate just points at a token-counting method. It'd be nice if overloads on `TextChunker` could just take a `Tokenizer` directly, for example.
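For example, today's delegate-based shape versus the suggested overload (a sketch; it assumes the `TiktokenTokenizer.CreateForModel` factory from recent Microsoft.ML.Tokenizers releases, and `TextChunker` from Microsoft.SemanticKernel.Text, which is marked experimental):

```csharp
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;
using Microsoft.SemanticKernel.Text;

// Today: adapt a tokenizer to TextChunker's token-counting delegate.
// (TextChunker is experimental; may require suppressing SKEXP0050.)
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4");

List<string> lines = TextChunker.SplitPlainTextLines(
    "Some long text to split into token-bounded lines...",
    maxTokensPerLine: 128,
    tokenCounter: text => tokenizer.CountTokens(text));

// The suggestion: an overload that takes the Tokenizer directly, e.g.
// TextChunker.SplitPlainTextLines(text, 128, tokenizer);  // hypothetical
```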

@tarekgh (Member) commented May 10, 2024

Thanks for the thoughts @stephentoub.

I have some experience with Encoding in the framework. In .NET Core, we attempted to separate most of the concrete encodings (those with significant data) into their own libraries. We retained only the abstraction and a few concrete encodings that we believed would be commonly used, such as UTF-8. However, we found that many users wanted access to the other encodings, leading us to include these concrete encodings by default. I'm asking to gain insight into whether we might encounter similar situations with the Tokenizers, or if we anticipate that many libraries will rely on the abstraction without requiring real concrete implementations.
