Best practices to control the size of the ChatHistory to avoid exceeding a model's maximum context length #6155

Open · markwallace-microsoft opened this issue May 8, 2024 · 4 comments

@markwallace-microsoft (Member)

Here's an example of the type of error a developer can run into:

```
Unable to generate bot response. Details: Error: 500: Internal Server Error
Microsoft.SemanticKernel.KernelException: StreamResponseToClientAsync failed.
 ---> Microsoft.SemanticKernel.HttpOperationException: This model's maximum context length is 8192 tokens. However, you requested 8677 tokens (7023 in the messages, 630 in the functions, and 1024 in the completion). Please reduce the length of the messages, functions, or completion.
Status: 400 (model_error)
ErrorCode: context_length_exceeded
Content: {
  "error": {
    "message": "This model's maximum context length is 8192 tokens. However, you requested 8677 tokens (7023 in the messages, 630 in the functions, and 1024 in the completion). Please reduce the length of the messages, functions, or completion.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}
Headers:
Access-Control-Allow-Origin: REDACTED
apim-request-id: REDACTED
x-ratelimit-remaining-requests: REDACTED
...
```

Some options to mitigate this:

  1. Examples which show how to trim the chat history dynamically, e.g. by setting a maximum number of messages (a rough sketch follows this list).
  2. Examples which show how to summarise context information before it is inserted into a prompt.
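For example, a minimal sketch of option 1. The `TrimByMessageCount` helper is hypothetical, not an existing Semantic Kernel API; it assumes the standard `ChatHistory` type from Microsoft.SemanticKernel.ChatCompletion:

```csharp
using System;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helper (not an existing Semantic Kernel API): keeps an
// optional leading system message plus the most recent `maxMessages` turns.
static ChatHistory TrimByMessageCount(ChatHistory history, int maxMessages)
{
    var trimmed = new ChatHistory();

    // Preserve the system prompt if the conversation starts with one.
    int start = 0;
    if (history.Count > 0 && history[0].Role == AuthorRole.System)
    {
        trimmed.Add(history[0]);
        start = 1;
    }

    // Copy only the most recent messages.
    for (int i = Math.Max(start, history.Count - maxMessages); i < history.Count; i++)
    {
        trimmed.Add(history[i]);
    }

    return trimmed;
}
```
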
@stephentoub (Member)

Beyond samples, I think we should have some built-in support for this, e.g. an interface that can be queried for to reduce the size of the chat history, with some implementations of it readily available: ones that trim to a max number of tokens or messages, ones that summarize and replace the previous history with just the most salient points, ones that remove less important messages and keep only the important ones, etc.

This (and possibly other features) might drive the need for taking a dependency on a tokenizer; we'll want to think that through, in conjunction with the abstraction for a tokenizer in Microsoft.ML.Tokenizers. cc: @tarekgh (Tarek, and @ericstj, we should think about whether the Tokenizer abstraction should be moved to an abstractions library... today in order to get the abstraction you also need to pay to get all the implementations.)
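A minimal sketch of what the "summarize and replace" strategy could look like, assuming the existing `IChatCompletionService` and its `GetChatMessageContentAsync` extension; the `SummarizingReducer` type itself is hypothetical, not a shipped API:

```csharp
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical: compress older turns into a summary, keep the recent tail.
public sealed class SummarizingReducer
{
    private readonly IChatCompletionService _chat;

    public SummarizingReducer(IChatCompletionService chat) => _chat = chat;

    public async Task<ChatHistory> ReduceAsync(ChatHistory history, int keepLast)
    {
        if (history.Count <= keepLast)
        {
            return history;
        }

        // Ask the model to compress everything except the recent tail.
        var toSummarize = new ChatHistory();
        toSummarize.AddSystemMessage(
            "Summarize the following conversation, keeping only the most salient points.");
        for (int i = 0; i < history.Count - keepLast; i++)
        {
            toSummarize.Add(history[i]);
        }

        ChatMessageContent summary = await _chat.GetChatMessageContentAsync(toSummarize);

        // Rebuild the history: summary first, then the recent messages verbatim.
        var reduced = new ChatHistory();
        reduced.AddSystemMessage(summary.Content ?? string.Empty);
        for (int i = history.Count - keepLast; i < history.Count; i++)
        {
            reduced.Add(history[i]);
        }

        return reduced;
    }
}
```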

@tarekgh (Member) commented May 8, 2024

> today in order to get the abstraction you also need to pay to get all the implementations

In what situation would the abstraction be necessary without requiring one of the specific tokenizers? The scenario mentioned doesn't seem to clarify this for me.

@stephentoub (Member) commented May 10, 2024

> > today in order to get the abstraction you also need to pay to get all the implementations
>
> In what situation would the abstraction be necessary without requiring one of the specific tokenizers? The scenario mentioned doesn't seem to clarify this for me.

I'll turn the question around and ask: what's the reason for having the Tokenizer abstraction at all if every use would require a specific tokenizer? :)

Imagine for this issue there were an `IChatHistoryReducer` with a method like `ChatHistory Reduce(ChatHistory history, int tokenLimit, Tokenizer tokenizer)`, where implementations of `IChatHistoryReducer` would need to produce a new `ChatHistory` containing no more than `tokenLimit` tokens. They'd need to be able to count tokens according to whatever tokenization algorithm was desired, and thus would need to accept a tokenizer.
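A trimming implementation of that hypothetical interface might look roughly like this. The interface is only the shape proposed above, not a shipped API; the sketch assumes `Tokenizer.CountTokens(string)` from Microsoft.ML.Tokenizers and, for simplicity, ignores per-message and function-definition token overhead:

```csharp
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// The interface as proposed above; not an existing API.
public interface IChatHistoryReducer
{
    ChatHistory Reduce(ChatHistory history, int tokenLimit, Tokenizer tokenizer);
}

// Trimming strategy: walk backwards from the newest message and keep as
// many whole messages as fit within the token budget.
public sealed class TrimmingChatHistoryReducer : IChatHistoryReducer
{
    public ChatHistory Reduce(ChatHistory history, int tokenLimit, Tokenizer tokenizer)
    {
        var kept = new List<ChatMessageContent>();
        int used = 0;

        for (int i = history.Count - 1; i >= 0; i--)
        {
            // Counts message text only; a real budget would also account
            // for role/markup overhead and function definitions.
            int cost = tokenizer.CountTokens(history[i].Content ?? string.Empty);
            if (used + cost > tokenLimit)
            {
                break;
            }
            used += cost;
            kept.Add(history[i]);
        }

        kept.Reverse(); // restore chronological order

        var reduced = new ChatHistory();
        foreach (ChatMessageContent message in kept)
        {
            reduced.Add(message);
        }
        return reduced;
    }
}
```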

It's a similar need for something like `TextChunker`. Today it has methods that take a delegate to do token counting, but every current use of that delegate just points at a token-counting method. It'd be nice if overloads on `TextChunker` could just take a `Tokenizer` directly, for example.
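For example, today's delegate-based shape versus the suggested overload (a sketch; it assumes the `TiktokenTokenizer.CreateForModel` factory from recent Microsoft.ML.Tokenizers releases, and `TextChunker` from Microsoft.SemanticKernel.Text, which is marked experimental):

```csharp
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;
using Microsoft.SemanticKernel.Text;

// Today: adapt a tokenizer to TextChunker's token-counting delegate.
// (TextChunker is experimental; may require suppressing SKEXP0050.)
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4");

List<string> lines = TextChunker.SplitPlainTextLines(
    "Some long text to split into token-bounded lines...",
    maxTokensPerLine: 128,
    tokenCounter: text => tokenizer.CountTokens(text));

// The suggestion: an overload that takes the Tokenizer directly, e.g.
// TextChunker.SplitPlainTextLines(text, 128, tokenizer);  // hypothetical
```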

@tarekgh (Member) commented May 10, 2024

Thanks for the thoughts @stephentoub.

I have some experience with Encoding in the framework. In .NET Core, we attempted to separate most of the concrete encodings (those with significant data) into their own libraries. We retained only the abstraction and a few concrete encodings that we believed would be commonly used, such as UTF-8. However, we found that many users wanted access to the other encodings, leading us to include these concrete encodings by default. I'm asking to gain insight into whether we might encounter similar situations with the Tokenizers, or if we anticipate that many libraries will rely on the abstraction without requiring real concrete implementations.
