This repository has been archived by the owner on May 28, 2024. It is now read-only.

How to use partial GPU? #117

Open
rifkybujana opened this issue Jan 10, 2024 · 2 comments

Comments

@rifkybujana

Hi, I wonder whether it's possible to assign a fraction of a GPU to each model instead of one whole GPU per deployed model. For example, on an AWS g5.12xlarge instance with 4 GPUs, giving each model half a GPU might allow eight quantized models to be deployed instead of a maximum of four. Changing the num GPUs per worker setting resulted in an error.
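The arithmetic behind the eight-model estimate can be sketched as follows. This is only a capacity count under stated assumptions: each model reserves a fixed fraction of one GPU, no model is split across GPUs, and the quantized model actually fits in that share of GPU memory. The function name `max_models` is hypothetical and not part of any library.

```python
from fractions import Fraction


def max_models(num_gpus: int, gpu_fraction_per_model: Fraction) -> int:
    """Count models that fit when each reserves a fixed fraction of one GPU.

    Assumption: a model is never split across two GPUs, so leftover
    capacity on one GPU cannot be combined with another GPU's leftover.
    """
    if not 0 < gpu_fraction_per_model <= 1:
        raise ValueError("gpu_fraction_per_model must be in (0, 1]")
    # Whole models that fit on a single GPU.
    models_per_gpu = int(1 // gpu_fraction_per_model)
    return num_gpus * models_per_gpu


# 4 GPUs, half a GPU per model.
print(max_models(4, Fraction(1, 2)))  # 8
# 4 GPUs, one whole GPU per model.
print(max_models(4, Fraction(1)))  # 4
```

Using `Fraction` rather than a float avoids rounding surprises for shares such as one third of a GPU.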

@sihanwang41
Collaborator

Hi @rifkybujana, what if you change num_workers to 2 and keep num gpu per worker as it is?

@lizzzcai

lizzzcai commented Feb 8, 2024

Any update on this? I am running a similar test and want to know the best practice for deploying 8 models on a 4-GPU instance. What is the difference between 1 worker with 4 replicas and 4 workers with 1 replica each? Thanks.
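One rough way to compare the layouts in this question is to count the GPUs each one reserves. The sketch below assumes, without confirmation from this thread, that a deployment runs a number of independent replicas and that each replica starts a number of workers which each reserve a fixed GPU share. The function and parameter names are hypothetical and only mirror the terms used in the discussion.

```python
def total_gpus(num_replicas: int, num_workers: int, gpus_per_worker: float) -> float:
    """GPUs reserved by one deployment.

    Assumption (not confirmed in this thread): every replica starts
    `num_workers` workers and each worker reserves `gpus_per_worker`.
    """
    return num_replicas * num_workers * gpus_per_worker


# 1 worker per replica, 4 replicas.
print(total_gpus(num_replicas=4, num_workers=1, gpus_per_worker=1))  # 4
# 4 workers in a single replica.
print(total_gpus(num_replicas=1, num_workers=4, gpus_per_worker=1))  # 4
# 8 separate models, each with 1 replica and 1 worker on half a GPU.
print(8 * total_gpus(num_replicas=1, num_workers=1, gpus_per_worker=0.5))  # 4.0
```

Under these assumptions both layouts reserve the same four GPUs, so the GPU count alone does not distinguish them.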
