This repository has been archived by the owner on May 28, 2024. It is now read-only.

Running ray-llm 0.5.0 on g4dn.12xlarge instance #150

Open
golemsentience opened this issue Apr 26, 2024 · 2 comments

golemsentience commented Apr 26, 2024

Has anyone had any success serving LLMs through the 0.5.0 Docker image?

I followed these steps:

```shell
cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}

docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=/tmp/data -v $cache_dir:/home/user/data anyscale/ray-llm:0.5.0 bash
```

I reconfigured the model `.yaml` to use `accelerator_type_T4`.
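For context, the scaling section of my model yaml looks roughly like this (a sketch from memory; the field names assume the ray-llm 0.5.0 model config layout rather than being copied verbatim from my file):

```yaml
# Sketch only; assumes the ray-llm 0.5.0 scaling_config schema.
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 1
  resources_per_worker:
    accelerator_type_T4: 0.01  # fractional custom resource, as in the bundled configs
```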

Then I start Ray and run the serve config:

```shell
ray start --head --dashboard-host=0.0.0.0 --num-cpus 48 --num-gpus 4 --resources='{"accelerator_type_T4": 4}'

serve run ~/serve_configs/amazon--LightGPT.yaml
```

It runs, but I get:

> Deployment 'VLLMDeployment: amazon--LightGPT' in application 'ray-llm' has 2 replicas that have taken more than 30s to initialize. This may be caused by a slow `__init__` or `reconfigure` method.

From here, nothing happens. I've let it run for up to a couple of hours, and it just hangs at this point.

Any success working around these issues?

@nkwangleiGIT

I'm using vLLM as the serving engine and running inference through Ray Serve. Here is a sample script:
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py

I just wrap it as a Ray Serve deployment like this:

```python
from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment  # (num_replicas=1, ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class VLLMPredictDeployment:
    def __init__(self, **kwargs):
        ...  # construct the vLLM engine here
```
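To actually serve it, bind the deployment and run it; a minimal sketch (the model argument is just an illustration, not from my setup):

```python
# Sketch only: kwargs passed to bind() are forwarded to __init__.
deployment = VLLMPredictDeployment.bind(model="amazon/LightGPT")
serve.run(deployment)
```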


teocns commented May 1, 2024

What does `ray status` say?
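For example, from inside the container on the head node (both are standard Ray CLI commands):

```shell
# Show cluster resources and pending resource demands
ray status

# Show the state of Serve applications and their deployments
serve status
```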
