This repository has been archived by the owner on May 28, 2024. It is now read-only.

Autoscaling support in Ray-llm #133

Open
Jeffwan opened this issue Feb 21, 2024 · 0 comments

Comments


Jeffwan commented Feb 21, 2024

Just curious: does ray-llm fully leverage Ray Serve autoscaling (https://docs.ray.io/en/latest/serve/autoscaling-guide.html)?
It seems Ray Serve only supports target_num_ongoing_requests_per_replica and max_concurrent_queries. As we know, LLM output lengths vary widely, so raw request counts are not a good load signal for LLM scenarios. How do you achieve better autoscaling support for LLMs?
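For reference, here is a minimal sketch of the two Ray Serve knobs mentioned above, as they would appear on a deployment. The deployment class, values, and request handling are illustrative placeholders, not RayLLM's actual configuration:

```python
# Sketch of Ray Serve's autoscaling knobs (per the autoscaling guide).
# Only the config field names come from Ray Serve; everything else is
# a hypothetical example deployment.
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 8,
        # Scale replicas up/down so each averages ~2 in-flight requests.
        "target_num_ongoing_requests_per_replica": 2,
    },
    # Hard cap on requests a single replica will accept at once.
    max_concurrent_queries=16,
)
class LLMDeployment:
    async def __call__(self, request):
        # Placeholder: a real LLM deployment would run model inference here,
        # and per-request cost would vary with output length.
        return "ok"


app = LLMDeployment.bind()
# serve.run(app) would deploy it on a running Ray cluster.
```

Both knobs count requests, so two replicas at the same "ongoing requests" level can carry very different token-generation loads, which is the concern raised above.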
