
Cannot specify models using yaml as a list of dicts or LLMApp objects #109

Open
victorserbu2709 opened this issue Dec 27, 2023 · 0 comments

Comments

@victorserbu2709

Hello.
Specifying models directly in the Serve config file does not work, but defining args.models as a list of YAML file paths does.
Example:

(base) ray@raycluster-llm-head-rv27x:~/serve_configs$ cat meta-llama--Llama-2-7b-chat-hf-full.yaml 
http_options:
  host: 0.0.0.0
applications:
- name: ray-llm
  route_prefix: /
  import_path: rayllm.backend:router_application
  args:
    models:
    - deployment_config:
        autoscaling_config:
          min_replicas: 1
          initial_replicas: 1
          max_replicas: 8
          target_num_ongoing_requests_per_replica: 24
          metrics_interval_s: 10.0
          look_back_period_s: 30.0
          smoothing_factor: 0.5
          downscale_delay_s: 300.0
          upscale_delay_s: 15.0
        max_concurrent_queries: 64
        ray_actor_options:
          num_cpus: 2
      engine_config:
        model_id: meta-llama/Llama-2-7b-chat-hf
        hf_model_id: meta-llama/Llama-2-7b-chat-hf
        type: VLLMEngine
        engine_kwargs:
          trust_remote_code: true
          max_num_batched_tokens: 4096
          max_num_seqs: 64
          gpu_memory_utilization: 0.95
        max_total_tokens: 4096
        generation:
          prompt_format:
            system: "<<SYS>>\n{instruction}\n<</SYS>>\n\n"
            assistant: " {instruction} </s><s>"
            trailing_assistant: ""
            user: "[INST] {system}{instruction} [/INST]"
            system_in_user: true
            default_system_message: ""
          stopping_sequences: ["<unk>"]
      scaling_config:
        num_workers: 1
        num_gpus_per_worker: 1
        num_cpus_per_worker: 2
        placement_strategy: "STRICT_PACK"
        resources_per_worker:
(base) ray@raycluster-llm-head-rv27x:~/serve_configs$ serve run meta-llama--Llama-2-7b-chat-hf-full.yaml 
2023-12-27 13:44:15,621 INFO scripts.py:418 -- Running config file: 'meta-llama--Llama-2-7b-chat-hf-full.yaml'.
2023-12-27 13:44:15,626 INFO worker.py:1458 -- Connecting to existing Ray cluster at address: 10.233.86.89:6379...
2023-12-27 13:44:15,635 INFO worker.py:1633 -- Connected to Ray cluster. View the dashboard at 10.233.86.89:8265 
(ServeController pid=4989) INFO 2023-12-27 13:44:16,554 controller 4989 application_state.py:183 - Recovering target state for application 'ray-llm' from checkpoint.
(HTTPProxyActor pid=5029) INFO 2023-12-27 13:44:17,438 http_proxy 10.233.86.89 http_proxy.py:1433 - Proxy actor 3c92662700463eea872f1cfd16000000 starting on node d9722feee6b275abc23643cae31fe546b1dd6ba3187a250e67318a8c.
(HTTPProxyActor pid=5029) INFO 2023-12-27 13:44:17,445 http_proxy 10.233.86.89 http_proxy.py:1617 - Starting HTTP server on node: d9722feee6b275abc23643cae31fe546b1dd6ba3187a250e67318a8c listening on port 8000
2023-12-27 13:44:17,475 SUCC scripts.py:514 -- Submitted deploy config successfully.
(ServeController pid=4989) INFO 2023-12-27 13:44:17,472 controller 4989 application_state.py:374 - Starting build_serve_application task for application 'ray-llm'.
(HTTPProxyActor pid=5029) INFO:     Started server process [5029]
(build_serve_application pid=5062) [WARNING 2023-12-27 13:44:21,054] api.py: 382  DeprecationWarning: `route_prefix` in `@serve.deployment` has been deprecated. To specify a route prefix for an application, pass it into `serve.run` instead.
(ServeController pid=4989) WARNING 2023-12-27 13:44:21,159 controller 4989 application_state.py:663 - Deploying app 'ray-llm' failed with exception:
(ServeController pid=4989) Traceback (most recent call last):
(ServeController pid=4989)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/application_state.py", line 909, in build_serve_application
(ServeController pid=4989)     app = call_app_builder_with_args_if_necessary(import_attr(import_path), args)
(ServeController pid=4989)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/api.py", line 377, in call_app_builder_with_args_if_necessary
(ServeController pid=4989)     app = builder(args)
(ServeController pid=4989)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/server/run.py", line 114, in router_application
(ServeController pid=4989)     router_args = RouterArgs.parse_obj(args)
(ServeController pid=4989)   File "pydantic/main.py", line 526, in pydantic.main.BaseModel.parse_obj
(ServeController pid=4989)   File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
(ServeController pid=4989) pydantic.error_wrappers.ValidationError: 7 validation errors for RouterArgs
(ServeController pid=4989) models
(ServeController pid=4989)   str type expected (type=type_error.str)
(ServeController pid=4989) models
(ServeController pid=4989)   value is not a valid dict (type=type_error.dict)
(ServeController pid=4989) models -> 0
(ServeController pid=4989)   str type expected (type=type_error.str)
(ServeController pid=4989) models -> 0 -> engine_config -> engine_kwargs
(ServeController pid=4989)   extra fields not permitted (type=value_error.extra)
(ServeController pid=4989) models -> 0 -> engine_config -> generation
(ServeController pid=4989)   extra fields not permitted (type=value_error.extra)
(ServeController pid=4989) models -> 0 -> engine_config -> hf_model_id
(ServeController pid=4989)   extra fields not permitted (type=value_error.extra)
(ServeController pid=4989) models -> 0 -> engine_config -> max_total_tokens
(ServeController pid=4989)   extra fields not permitted (type=value_error.extra)
(ServeController pid=4989) 
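The validation errors ("str type expected") suggest RouterArgs only accepts each models entry as a string path to a model YAML file, not as an inline dict. For reference, a minimal sketch of the variant that does work (the file path is a placeholder; the referenced file would contain the deployment_config, engine_config, and scaling_config nested under the inline entry above):

http_options:
  host: 0.0.0.0
applications:
- name: ray-llm
  route_prefix: /
  import_path: rayllm.backend:router_application
  args:
    models:
      # placeholder path; point this at a model YAML present on the cluster nodes
      - ./models/meta-llama--Llama-2-7b-chat-hf.yaml

With this form, serve run submits the config without the RouterArgs validation errors.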