2024-05-06T08:14:37.322566Z ERROR text_generation_launcher: Shard 1 failed to start
2024-05-06T08:14:37.322594Z INFO text_generation_launcher: Shutting down shards
2024-05-06T08:14:37.324258Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 71, in serve
    from text_generation_server import server
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 17, in <module>
    from text_generation_server.models.vlm_causal_lm import VlmCausalLMBatch
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 14, in <module>
    from text_generation_server.models.flash_mistral import (
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 18, in <module>
    from text_generation_server.models.custom_modeling.flash_mistral_modeling import (
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 29, in <module>
    from text_generation_server.utils import paged_attention, flash_attn
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/flash_attn.py", line 24, in <module>
    raise ImportError("CUDA is not available")
ImportError: CUDA is not available
 rank=0
Error: ShardCannotStart
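The traceback shows the shard dies at import time because PyTorch cannot see a GPU, which is the condition flash_attn.py checks before raising "CUDA is not available". A minimal preflight sketch (my own helper, not part of TGI) that you could run inside the container to reproduce that check in isolation:

```python
# Hypothetical preflight helper: reports whether PyTorch can reach a GPU,
# the same condition TGI's flash_attn.py tests at import time.
def cuda_status() -> str:
    try:
        import torch
    except ImportError:
        # torch itself is missing from this environment
        return "torch not installed"
    if torch.cuda.is_available():
        # on ROCm builds of torch this is also True when HIP sees a device
        return f"cuda ok: {torch.cuda.device_count()} device(s)"
    return "CUDA is not available"

if __name__ == "__main__":
    print(cuda_status())
```

If this prints "CUDA is not available" inside the container, the problem is device visibility (runtime flags, drivers), not TGI itself.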
Expected behavior
Deploying with the tgi-2.0.1 image works without issues, but that version does not support llama3-instruct well. I'm not sure why the "CUDA is not available" error occurs when deploying tgi-2.0.2. Could it be that the local CUDA version (12.1) is too old?
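Note that the image tag above is the ROCm build, where "CUDA is not available" means the HIP backend cannot see a GPU. A hedged sanity check (flags per the TGI ROCm deployment docs; adjust paths and tag to your setup) is to confirm the ROCm device nodes are mapped into the container:

```shell
# The -rocm image needs the ROCm device nodes passed through; without
# /dev/kfd and /dev/dri, torch.cuda.is_available() is False inside it.
docker run --rm \
  --device=/dev/kfd --device=/dev/dri \
  --security-opt seccomp=unconfined \
  ghcr.io/huggingface/text-generation-inference:sha-bb2b295-rocm \
  python -c "import torch; print(torch.cuda.is_available())"
```

If this prints False, the shard will fail the same way regardless of TGI version.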
System Info
Information
Tasks
Reproduction
tgi version
ghcr.io/huggingface/text-generation-inference:sha-bb2b295-rocm
docker command
Error Message