
RayWorkerVllm Actor Dies After ~1h: The actor is dead because all references to the actor were removed. #140

Open
cshyjak opened this issue Mar 7, 2024 · 0 comments

Comments


cshyjak commented Mar 7, 2024

I am testing out RayLLM and running into an issue where the model loads and serves requests fine initially, but starts throwing errors after roughly an hour of running. This happens with multiple types of models. The example below uses the model config from this repo.
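For reference, this is roughly how I confirm that the RayWorkerVllm actors have died. This is a minimal sketch, not taken verbatim from my setup; it assumes Ray 2.x's state API is available on the cluster:

```python
# Hypothetical check: list dead actors via Ray's state API and look for
# the vLLM worker actors. Assumes Ray >= 2.x with the state API enabled.
from ray.util.state import list_actors

dead_actors = list_actors(filters=[("state", "=", "DEAD")])
for actor in dead_actors:
    if "RayWorkerVllm" in (actor.class_name or ""):
        print(actor.actor_id, actor.class_name, actor.state)
```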

RayService Configuration:

apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: laivly-ml
  namespace: sidd-platform
spec:
  serviceUnhealthySecondThreshold: 1200 # Config for the health check threshold for service. Default value is 60.
  deploymentUnhealthySecondThreshold: 1200 # Config for the health check threshold for deployments. Default value is 60.
  serveConfigV2: |
      applications:
      - name: router
        import_path: rayllm.backend:router_application
        route_prefix: /llm
        args:
          models:            
            - ./models/continuous_batching/quantization/TheBloke--Llama-2-7B-chat-AWQ.yaml
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        resources: '"{\"accelerator_type_cpu\": 2}"'
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image: anyscale/ray-llm:0.5.0
            resources:
              limits:
                cpu: 2
                memory: 8Gi
              requests:
                cpu: 2
                memory: 4Gi
            ports:
            - containerPort: 6379
              name: gcs-server
            - containerPort: 8265 # Ray dashboard
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
          nodeSelector:
            kubernetes.io/arch: amd64
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 0
      maxReplicas: 4
      groupName: a10-gpu
      rayStartParams:
        resources: '"{\"accelerator_type_cpu\": 46, \"accelerator_type_a10\": 4}"'
      template:
        spec:
          containers:
          - name: llm
            image: anyscale/ray-llm:0.5.0
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh","-c","ray stop"]
            resources:
              limits:
                cpu: "46"
                memory: "190G"
                nvidia.com/gpu: 4
              requests:
                cpu: "2"
                memory: "4G"
                nvidia.com/gpu: 4
            ports:
            - containerPort: 8000
              name: serve
          nodeSelector:
            karpenter.k8s.aws/instance-family: g5
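The requests that fail are plain chat completion calls against the router. A rough sketch of the client side is below; it assumes the router exposes an OpenAI-compatible chat completions route under the /llm prefix and that the model id mirrors the config filename (both are assumptions on my part, adjust to your deployment):

```python
# Rough reproduction sketch. The host, port, and model id below are
# placeholders assumed from the RayService config above.
import requests

BASE_URL = "http://<serve-endpoint>:8000/llm"   # placeholder Serve endpoint
MODEL_ID = "TheBloke/Llama-2-7B-chat-AWQ"       # assumed from the model config filename

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
# Succeeds for about the first hour, then starts returning errors once the
# RayWorkerVllm actors are reported dead.
print(resp.status_code, resp.text)
```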