I want to test the following case: the initial KV cache length is 2048 and the LLM runs 2048 decode iterations, so output_tokens=2048. The KV cache should therefore grow from an initial length of 2048 to a final length of 4096 (2048 + 2048).
If I run:
the initial KV cache length is 1, not 2048.
So how can I set the initial KV cache length?
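To make the expected behavior concrete, here is a minimal sketch of the arithmetic I have in mind. The helper `kv_cache_lengths` and its parameters are hypothetical names used only for illustration; they are not part of this project's API. It assumes each decode iteration appends exactly one token to the KV cache.

```python
def kv_cache_lengths(initial_len: int, decode_steps: int) -> list[int]:
    """Hypothetical helper: KV cache length before each decode step, plus the final length.

    Assumes every decode step appends exactly one token's keys/values to the cache.
    """
    return [initial_len + step for step in range(decode_steps + 1)]


# Expected case: the cache starts at 2048 and 2048 tokens are generated.
lengths = kv_cache_lengths(initial_len=2048, decode_steps=2048)
print(lengths[0], lengths[-1])  # 2048 4096

# What I actually observe: the cache starts at 1 instead of 2048.
observed = kv_cache_lengths(initial_len=1, decode_steps=2048)
print(observed[0], observed[-1])  # 1 2049
```

In other words, I am looking for whatever option corresponds to `initial_len` in this sketch.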