
Don't provide output token limitations / remove the default 2048 max tokens / allow to move it up to 100k #478

Open
syberkitten opened this issue Mar 31, 2024 · 4 comments

Comments


syberkitten commented Mar 31, 2024

Why

  1. There are models, Gemini for example, where I cannot increase the output tokens beyond 2048 even though the model allows much more. This alone makes Big-AGI hardly usable with such a model when I need longer output.
  2. In my experience it's better not to cap output unless you need to, so capping should only happen when the user requests it. I believe this is how most chat UIs behave, whether Anthropic, OpenAI, or others: there is no default cap on output tokens. At any rate, I always forget to raise the limit, and the maximums don't correlate across models, so the default sits at 2048 because it "covers" most models. That makes sense, but it then becomes a limitation.
  3. I rarely use output caps (max_output); mostly I want a model to work with the maximum it supports, and I believe most users would too.
  4. Also, why not allow dragging it all the way up? If you can't know the correct limits for each model, at least don't block its usage.

Description

By default, don't cap models at 2048 output tokens. And if a cap is needed, at least allow moving it further (currently I cannot move Gemini past 2048, which is ridiculous).

@enricoros
Owner

The max output tokens for Gemini models is 8192. See here: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini

When selecting Gemini models, Big-AGI lets you change the output tokens up to the full range. The values we get are directly out of the Gemini models list API, and exceeding those values will return an error.

Just tested: for Gemini 1.5 Pro I can select from 256 up to 8192, the full supported maximum.

In your case, just update the models, open the model options (wheel icon), and drag the output tokens slider to the max.
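
For reference, here is a minimal sketch (not Big-AGI code) of reading those per-model limits from the public Gemini models.list endpoint; the `inputTokenLimit` and `outputTokenLimit` field names come from the Generative Language API model resource, and `GEMINI_API_KEY` is an assumed environment variable:

```ts
// Minimal sketch (assumed setup): fetch the Gemini model catalog and read each
// model's advertised token limits. Field names follow the public
// Generative Language API "models.list" response.
interface GeminiModel {
  name: string;             // e.g. "models/gemini-1.5-pro-latest"
  inputTokenLimit: number;  // max input (context) tokens
  outputTokenLimit: number; // max output tokens, e.g. 8192
}

async function listGeminiTokenLimits(apiKey: string): Promise<GeminiModel[]> {
  const res = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models?key=${apiKey}`,
  );
  if (!res.ok) throw new Error(`models.list failed: HTTP ${res.status}`);
  const body = (await res.json()) as { models: GeminiModel[] };
  return body.models;
}

// Usage (GEMINI_API_KEY is an assumed environment variable):
listGeminiTokenLimits(process.env.GEMINI_API_KEY ?? '').then((models) =>
  models.forEach((m) =>
    console.log(`${m.name}: input<=${m.inputTokenLimit}, output<=${m.outputTokenLimit}`),
  ),
);
```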

@enricoros
Owner

I see the usefulness of this feature; however, the real issue is that some APIs do not report the models' context window or maximum output tokens.

Anthropic, OpenAI, Google, and OpenRouter either have the values in their APIs (preferred) or on their web pages.

I'll take a look at implementing this, but there are 2 issues with not selecting the output size (see the sketch after this list):

  • the LLM will limit the input size (assuming the max output)
  • the UI won't be able to show output indicators
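
To illustrate the first point with hypothetical numbers and names (not Big-AGI internals): whatever the client reserves for the output comes straight out of the input budget, so an "uncapped" output setting is not free.

```ts
// Illustrative only: how a chat client has to split a fixed context window
// between the prompt/history and the tokens reserved for the reply.
function inputTokenBudget(contextWindow: number, maxOutputTokens: number): number {
  // Everything reserved for the reply is unavailable to the input.
  return Math.max(0, contextWindow - maxOutputTokens);
}

// Hypothetical figures: a 128k-context model with different output reservations.
console.log(inputTokenBudget(128_000, 2_048));  // 125952 tokens left for the chat history
console.log(inputTokenBudget(128_000, 32_768)); // 95232 tokens left for the chat history
```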


myobfee commented May 6, 2024

This is the primary reason I'm currently exploring other LLM front-ends. A 2048 max is much too little for many models and significantly lowers their usability once you've run out of tokens and are forced to compress or restart. It's great for a single prompt and response, not so much for deeper analysis and contextual dialogue. Please allow pushing it to at least 8192 or even 32768.

@enricoros
Owner

See #531
