This repository has been archived by the owner on May 28, 2024. It is now read-only.

Running ray-llm 0.5.0 on g4dn.12xlarge instance #150

Open
golemsentience opened this issue Apr 26, 2024 · 2 comments

golemsentience commented Apr 26, 2024

Has anyone had any success serving LLMs through the 0.5.0 Docker image?

I followed these steps:

```shell
cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}

docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=/tmp/data -v $cache_dir:/home/user/data anyscale/ray-llm:0.5.0 bash
```

I reconfigured the model `.yaml` to use `accelerator_type_T4`.
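For context, the scaling section of my model yaml looks roughly like this (a sketch from memory; the field names assume the ray-llm 0.5.0 model config layout rather than being copied verbatim from my file):

```yaml
# Sketch only; assumes the ray-llm 0.5.0 scaling_config schema.
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 1
  resources_per_worker:
    accelerator_type_T4: 0.01  # fractional custom resource, as in the bundled configs
```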

Then I start Ray and run the serve config:

```shell
ray start --head --dashboard-host=0.0.0.0 --num-cpus 48 --num-gpus 4 --resources='{"accelerator_type_T4": 4}'

serve run ~/serve_configs/amazon--LightGPT.yaml
```

It runs, but I get:

> Deployment 'VLLMDeployment: amazon--LightGPT' in application 'ray-llm' has 2 replicas that have taken more than 30s to initialize. This may be caused by a slow `__init__` or `reconfigure` method.

From here, nothing happens. I've let it run for up to a couple of hours, and it just hangs at this point.

Any success working around these issues?

@nkwangleiGIT

I'm using vLLM as the serving engine and running inference through Ray Serve. Here is a sample script:
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py

I just wrap it as a Ray Serve deployment like this:

```python
from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment  # (num_replicas=1, ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class VLLMPredictDeployment:
    def __init__(self, **kwargs):
        ...  # construct the vLLM engine here
```
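To actually serve it, bind the deployment and run it; a minimal sketch (the model argument is just an illustration, not from my setup):

```python
# Sketch only: kwargs passed to bind() are forwarded to __init__.
deployment = VLLMPredictDeployment.bind(model="amazon/LightGPT")
serve.run(deployment)
```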


teocns commented May 1, 2024

What does `ray status` say?
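For example, from inside the container on the head node (both are standard Ray CLI commands):

```shell
# Show cluster resources and pending resource demands
ray status

# Show the state of Serve applications and their deployments
serve status
```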
