Strong need for multiple `models` in a single deployment #263

KastanDay · 2023-09-21T18:19:13Z

As mentioned in #179, users need multiple models. On a multi-GPU on-prem machine, I want to write a config file like this:

```
CUDA_VISIBLE_DEVICES=0     MODEL=meta-llama/Llama-2-7b-chat-hf
CUDA_VISIBLE_DEVICES=1,2,3 MODEL=meta-llama/Llama-2-13b-chat-hf
```
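A minimal sketch of how such a config could be read, assuming one model per line with whitespace-separated `KEY=VALUE` pairs. The format above doesn't specify ports, so this sketch assigns them sequentially from a hypothetical `base_port` of 8001. `ModelSpec`, `parse_config`, and `launch_env` are illustrative names, and passing `MODEL`/`PORT` through environment variables assumes the server reads its settings that way.

```python
import os
import shlex
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelSpec:
    model: str  # model ID, e.g. "meta-llama/Llama-2-7b-chat-hf"
    gpus: str   # value for CUDA_VISIBLE_DEVICES, e.g. "1,2,3"
    port: int   # hypothetical localhost port for this model's server


def parse_config(text: str, base_port: int = 8001) -> List[ModelSpec]:
    """Parse one `CUDA_VISIBLE_DEVICES=... MODEL=...` line per model.

    Blank lines and lines starting with '#' are skipped. Ports are assigned
    sequentially from base_port (an assumption, not part of the format).
    """
    specs: List[ModelSpec] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # shlex.split tolerates repeated spaces between the KEY=VALUE pairs.
        fields = dict(token.split("=", 1) for token in shlex.split(line))
        specs.append(
            ModelSpec(
                model=fields["MODEL"],
                gpus=fields["CUDA_VISIBLE_DEVICES"],
                port=base_port + len(specs),
            )
        )
    return specs


def launch_env(spec: ModelSpec) -> Dict[str, str]:
    """Environment for one server process: pins its GPUs and tells it which
    model and port to use (assuming the server reads MODEL and PORT)."""
    env = dict(os.environ)
    env.update(
        CUDA_VISIBLE_DEVICES=spec.gpus,
        MODEL=spec.model,
        PORT=str(spec.port),
    )
    return env
```

Each model would then run in its own process, e.g. `subprocess.Popen(server_cmd, env=launch_env(spec))` with a hypothetical `server_cmd`, so an out-of-memory error or crash in one model's server does not take down the others.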

Then users should be able to specify `"model": "<either_model>"` in their requests.

I can start a PR if you want this feature. Let me know if you have any suggestions on the best way to load these models and keep them mostly separate from each other.


peakji · 2023-09-22T06:36:38Z

Hi @KastanDay! I would suggest implementing an external routing service that decides which backend service/process to call based on the `model` parameter. This not only keeps model deployments isolated, but also allows load balancing of the same model replicated across multiple machines.
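The decision step of such a routing service can be small. A minimal sketch, assuming a hypothetical hard-coded table of backend URLs per model: it maps the model name to a backend and cycles through replicas of the same model in round-robin order, which is the simplest load-balancing policy. `ModelRouter` is an illustrative name.

```python
import itertools
import threading
from typing import Dict, Iterator, List


class ModelRouter:
    """Pick a backend URL for a request's "model" value.

    Each model may have several replicas (e.g. on different machines); they
    are handed out in round-robin order.
    """

    def __init__(self, backends: Dict[str, List[str]]) -> None:
        # Models with an empty replica list are left out, so they are "unknown".
        self._cycles: Dict[str, Iterator[str]] = {
            model: itertools.cycle(urls) for model, urls in backends.items() if urls
        }
        self._lock = threading.Lock()  # route() may be called from many threads

    def route(self, model: str) -> str:
        cycle = self._cycles.get(model)
        if cycle is None:
            raise ValueError(f"unknown model: {model!r}")
        with self._lock:
            return next(cycle)
```

A real deployment would also skip replicas that fail health checks; round-robin without health checks keeps sending traffic to a dead replica.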

KastanDay · 2023-09-25T17:48:21Z

Thank you!

Do you have any suggestions for an easy routing system? Something short and sweet? I'm an experienced backend programmer, but I haven't done much with load balancing or reverse proxies. Thanks again!

Edit: In particular, I want to respect the `model` parameter. How can I intercept the request, pull out the `model` parameter, and forward the request to the proper local server (where each server has its own port on localhost)?
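One way to do exactly this without Traefik or NGINX is a small proxy that reads the JSON body itself. A minimal sketch using only the Python standard library, assuming POST requests whose JSON body carries a `model` field and one server per model on a hypothetical localhost port (8001 and 8002 here). `ROUTES`, `pick_upstream`, `ProxyHandler`, and `serve` are illustrative names.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict

# Hypothetical routing table: model name -> local server hosting that model.
ROUTES: Dict[str, str] = {
    "meta-llama/Llama-2-7b-chat-hf": "http://127.0.0.1:8001",
    "meta-llama/Llama-2-13b-chat-hf": "http://127.0.0.1:8002",
}


def pick_upstream(body: bytes, routes: Dict[str, str]) -> str:
    """Return the base URL of the server for the body's "model" field."""
    try:
        data = json.loads(body or b"{}")
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        raise ValueError("request body is not valid JSON") from None
    model = data.get("model") if isinstance(data, dict) else None
    if not isinstance(model, str) or model not in routes:
        raise ValueError(f"unknown model: {model!r}")
    return routes[model]


class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            upstream = pick_upstream(body, ROUTES)
        except ValueError as exc:
            self.send_error(400, explain=str(exc))
            return
        # Forward the same path (including any query string) and body.
        request = urllib.request.Request(
            upstream + self.path,
            data=body,
            headers={"Content-Type": self.headers.get("Content-Type", "application/json")},
            method="POST",
        )
        try:
            with urllib.request.urlopen(request) as resp:
                status, payload = resp.status, resp.read()
                ctype = resp.headers.get("Content-Type", "application/json")
        except urllib.error.HTTPError as err:  # upstream answered with 4xx/5xx
            status, payload = err.code, err.read()
            ctype = err.headers.get("Content-Type", "application/json")
        except urllib.error.URLError as err:  # upstream unreachable
            self.send_error(502, explain=f"upstream error: {err.reason}")
            return
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the proxy until interrupted."""
    ThreadingHTTPServer((host, port), ProxyHandler).serve_forever()
```

Calling `serve()` starts the proxy on port 8000. This sketch buffers each upstream response in full, so if the model servers stream tokens back, clients would receive them all at once; a streaming-aware proxy would copy the response in chunks instead.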

KastanDay · 2023-09-25T18:02:52Z

Answering my own question: I suppose NGINX or Traefik would work well.

Here's what GPT-4 said; just pretend `backend` is the `model` parameter.

You can configure Traefik to route requests based on query parameters using its Query rule. Here's a basic example using Docker Compose and Traefik to route HTTP requests based on a query parameter named backend.

Docker Compose File (docker-compose.yml)

Here, we define two backend services (backend1 and backend2) and a Traefik service to act as the router.

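```yaml
version: '3.7'

services:
  traefik:
    image: traefik:v2.4
    ports:
      - "80:80"
    volumes:
      - "./traefik.yml:/etc/traefik/traefik.yml"
      # The Docker provider needs the Docker socket to read container labels.
      - "/var/run/docker.sock:/var/run/docker.sock:ro"

  backend1:
    image: nginx:alpine
    labels:
      # Required because traefik.yml sets exposedByDefault: false.
      - "traefik.enable=true"
      - "traefik.http.routers.backend1.rule=Host(`localhost`) && Query(`backend=backend1`)"

  backend2:
    image: nginx:alpine
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.backend2.rule=Host(`localhost`) && Query(`backend=backend2`)"
```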

Traefik Configuration File (traefik.yml)

This file sets up Traefik and tells it to look for configurations in Docker labels.

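```yaml
entryPoints:
  web:
    address: ":80"  # matches the "80:80" port mapping in docker-compose.yml

api:
  dashboard: true

providers:
  docker:
    exposedByDefault: false  # route only containers labelled traefik.enable=true
```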

To bring up the services:

1. Save the Docker Compose content to `docker-compose.yml`.
2. Save the Traefik configuration to `traefik.yml`.
3. Run `docker-compose up`.
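To put real model servers behind Traefik instead of the `nginx:alpine` placeholders, the per-container labels could be generated from the model list. A sketch, assuming each model's container listens on a known port (hypothetical here) and that the router name and `backend` query value are derived from the model ID by a hypothetical `router_name` helper. Model IDs contain `/` and can contain dots, and dots act as separators in Traefik label keys, so the helper reduces IDs to lowercase letters, digits, and hyphens.

```python
import re
from typing import List


def router_name(model_id: str) -> str:
    """Turn a model ID such as "meta-llama/Llama-2-7b-chat-hf" into a name made
    of lowercase letters, digits and hyphens, safe to embed in a label key and
    in a query string."""
    return re.sub(r"[^a-z0-9-]+", "-", model_id.lower()).strip("-")


def traefik_labels(model_id: str, port: int) -> List[str]:
    """Docker labels that route ?backend=<name> to this container's `port`."""
    name = router_name(model_id)
    return [
        "traefik.enable=true",
        f"traefik.http.routers.{name}.rule=Host(`localhost`) && Query(`backend={name}`)",
        f"traefik.http.services.{name}.loadbalancer.server.port={port}",
    ]
```

Each returned string goes under the matching service's `labels:` key in `docker-compose.yml`; clients then send `?backend=meta-llama-llama-2-7b-chat-hf` rather than the raw model ID.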

Usage

After running Docker Compose, you can route your requests by including the `backend` query parameter:

- For backend1: http://localhost/?backend=backend1
- For backend2: http://localhost/?backend=backend2

Traefik will route the request to the appropriate backend based on the query parameter.
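Traefik rules match on the host, path, headers, or query string, not on the JSON body, so with this setup the client has to put the routing key in the URL even if the server also expects a `model` field in the body. A minimal client-side sketch, assuming POST requests with a JSON payload; `build_routed_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request


def build_routed_request(base_url: str, backend: str, payload: dict) -> urllib.request.Request:
    """Build a POST request whose ?backend=... query string lets Traefik pick
    the backend, while the JSON payload is passed through unchanged."""
    query = urllib.parse.urlencode({"backend": backend})
    return urllib.request.Request(
        f"{base_url}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)`. The placeholder `nginx:alpine` containers only serve static pages, so a plain GET to the URLs above is the easiest way to see the routing work; the POST form matters once real model servers sit behind Traefik.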

peakji added the `enhancement` (New feature or request) label on Sep 22, 2023.
