System Info
meta-llama/Meta-Llama-3-70B-Instruct on H100 GPUs with the latest TGI.
Information
Tasks
Reproduction
The GPUs sit idle while the text-generation-server processes spin in a loop of sched_yield() calls: 100% CPU usage and no GPU usage. TGI's simple health checks also stop responding.
Expected behavior
The available-permit count appears to leak. The server should not return "model is overloaded" errors while the GPUs are idle and no batch is being processed.