
TGI: maximum total tokens handled by Llama 2. How to increase from 2048 to 4096? #1421

Closed · Answered by ansSanthoshM
ansSanthoshM asked this question in Q&A

Figured it out :) from the TGI launcher help page:
https://huggingface.co/docs/text-generation-inference/basic_tutorials/launcher

docker exec $model bash -c "text-generation-launcher --model-id /data/$model --max-total-tokens 4096 --max-input-length 3000 --num-shard $num_gpu"
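
For context, --max-total-tokens caps the prompt plus the generated tokens for a single request, while --max-input-length caps the prompt alone, so it must stay below --max-total-tokens. The same flags can be passed when starting the container directly; here is a minimal sketch assuming the official TGI image, with the model id, port, volume path, and token variable being illustrative placeholders:

docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    -e HF_TOKEN=$HF_TOKEN \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id meta-llama/Llama-2-7b-chat-hf \
    --max-input-length 3000 \
    --max-total-tokens 4096

Once the server is up, the /info endpoint reports the effective limits, so you can confirm the change took effect:

curl -s http://localhost:8080/info
# the JSON response includes fields such as "max_total_tokens": 4096 and "max_input_length": 3000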

Answer selected by OlivierDehaene