This repository has been archived by the owner on May 28, 2024. It is now read-only.

RAY-LLM stuck at replica step #143

Open
NBTrong opened this issue Mar 24, 2024 · 1 comment

@NBTrong

NBTrong commented Mar 24, 2024

Hi,

I'm trying to run rayllm following the tutorial in the README.

But my deployment now seems to be stuck at the replica scheduling step. It looks like this:

[screenshot attached in the original issue]

The warning message:
{"levelname": "WARNING", "asctime": "2024-03-24 03:17:59,398", "component_name": "controller", "component_id": "1684", "message": "deployment_state.py:2152 - Deployment 'VLLMDeployment:amazon--LightGPT' in application 'ray-llm' 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: [{"CPU": 1.0, "accelerator_type_a10": 0.01}, {"CPU": 1.0, "accelerator_type_a10": 0.01, "GPU": 0.1}], total resources available: {}. Use ray status for more details."}
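Reading the warning, each replica requests `CPU`, `GPU`, and `accelerator_type_a10` resources, but the cluster reports `total resources available: {}`, i.e. no node that can satisfy them has joined. A minimal sketch of that comparison (resource values copied from the log above; this is a hypothetical diagnostic illustration, not rayllm code):

```python
# Per-replica resource bundles, copied from the controller warning above.
required = [
    {"CPU": 1.0, "accelerator_type_a10": 0.01},
    {"CPU": 1.0, "accelerator_type_a10": 0.01, "GPU": 0.1},
]
# "total resources available: {}" in the log: the cluster reports nothing.
available = {}

# Any resource key a replica needs that the cluster does not report at all
# can never be satisfied, so the replica stays pending indefinitely.
missing = {key for bundle in required for key in bundle if key not in available}
print(sorted(missing))  # → ['CPU', 'GPU', 'accelerator_type_a10']
```

If `ray status` likewise shows no GPU node with the `accelerator_type_a10` label, the replica will stay pending until such a node is added (or autoscaling brings one up), regardless of how long one waits.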

Can you please help me with this? Should I wait longer, or is there a configuration I have missed?

Thank you.

@nkwangleiGIT

Same issue here.
