
docker deployment of vllm returns 404 Not Found #271

Open
skyliwq opened this issue May 8, 2024 · 11 comments

Comments

@skyliwq

skyliwq commented May 8, 2024

I deployed vllm with the qwen model via docker and the server starts successfully, but every call returns "POST /v1/chat/completions HTTP/1.1" 404 Not Found. I can't figure out the cause, please help.

http://127.0.0.1:7891/docs shows "No operations defined in spec!"

2024-05-09 07:00:03 INFO: Started server process [1]
2024-05-09 07:00:03 INFO: Waiting for application startup.
2024-05-09 07:00:03 INFO: Application startup complete.
2024-05-09 07:00:03 INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024-05-09 07:00:04 INFO: 172.16.1.1:57384 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found

@Tendo33
Contributor

Tendo33 commented May 9, 2024

Please post the request script you're using.

@skyliwq
Author

skyliwq commented May 9, 2024

Please post the request script you're using.

I sent the request with Postman and it returned
{
"detail": "Not Found"
}
The official tests/chat.py also returns the same error.
I deployed with the official docker-compose file; the configuration is all correct.

@Tendo33
Contributor

Tendo33 commented May 9, 2024

Please post the request script you're using.

I sent the request with Postman and it returned { "detail": "Not Found" }. The official tests/chat.py also returns the same error. I deployed with the official docker-compose file; the configuration is all correct.

Did you include /v1 in the request URL?
If that doesn't work, open /docs at the deployment address to check the FastAPI endpoints; you can try them right there in the browser, which gives the same result as Postman.
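
For reference, a minimal request script against the OpenAI-compatible route could look like the sketch below; the host, port, model name and API key are placeholders based on this thread, not verified values:

# Minimal sketch of a chat request; note the /v1 prefix (api_prefix in the settings).
import requests

BASE_URL = "http://127.0.0.1:7891/v1"  # placeholder host/port from this thread

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": "Bearer sk-xxx"},  # only needed if api_keys is configured
    json={
        "model": "qwen",  # must match the configured model_name
        "messages": [{"role": "user", "content": "hello"}],
    },
    timeout=60,
)
print(resp.status_code, resp.text)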

@skyliwq
Author

skyliwq commented May 9, 2024

The request parameters and deployment parameters are all correct; I've verified them many times.
http://127.0.0.1:7891/docs shows "No operations defined in spec!"

@skyliwq
Author

skyliwq commented May 9, 2024

With ENGINE=vllm it fails; with ENGINE=default it works fine.

@xusenlinzy
Owner

Then vllm probably wasn't installed successfully.
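
A quick way to verify inside the container is something like the sketch below; it is just a sanity check, since pip metadata alone doesn't prove the import works:

# Run inside the container: confirms that vllm really imports,
# since `pip show vllm` succeeding does not guarantee the import works
# (e.g. a broken CUDA/xformers dependency still makes the import fail).
try:
    import vllm
    print("vllm", vllm.__version__, "imported OK")
except Exception as exc:
    print("vllm import failed:", exc)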

@skyliwq
Author

skyliwq commented May 9, 2024

Then vllm probably wasn't installed successfully.
I deployed it directly with Docker; how do I reinstall it? Please advise.
root@a73600e73869:/workspace# pip show vllm
Name: vllm
Version: 0.4.0
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Home-page: https://github.com/vllm-project/vllm
Author: vLLM Team
Author-email:
License: Apache 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: cmake, fastapi, ninja, numpy, outlines, prometheus-client, psutil, py-cpuinfo, pydantic, pynvml, ray, requests, sentencepiece, tiktoken, torch, transformers, triton, uvicorn, xformers

@Tendo33
Contributor

Tendo33 commented May 9, 2024

Which Dockerfile did you use when you ran docker build for the image? Switch to the vllm one.

@skyliwq
Author

skyliwq commented May 9, 2024

vllm

That's the one I switched to.

@JadynWong

JadynWong commented May 21, 2024

Same problem here. Latest code, deployed with docker-compose using vllm; the GPU only shows usage from the embedding model, the logs show no errors, and requests return 404.

LOG

=============
== PyTorch ==
=============
NVIDIA Release 23.10 (build 71422337)
PyTorch Version 2.1.0a0+32f93b1
Container image Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2023 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...
2024-05-21 10:20:09.754 | DEBUG    | api.config:<module>:338 - SETTINGS: {
    "embedding_name": "/models/BAAI/bge-m3",
    "rerank_name": null,
    "embedding_size": -1,
    "embedding_device": "cuda:0",
    "rerank_device": "cuda:0",
    "trust_remote_code": true,
    "tokenize_mode": "slow",
    "tensor_parallel_size": 1,
    "gpu_memory_utilization": 0.9,
    "max_num_batched_tokens": -1,
    "max_num_seqs": 256,
    "quantization_method": null,
    "enforce_eager": false,
    "max_context_len_to_capture": 8192,
    "max_loras": 1,
    "max_lora_rank": 32,
    "lora_extra_vocab_size": 256,
    "lora_dtype": "auto",
    "max_cpu_loras": -1,
    "lora_modules": "",
    "vllm_disable_log_stats": true,
    "model_name": "qwen2",
    "model_path": "/models/Qwen/Qwen1.5-14B-Chat",
    "dtype": "bfloat16",
    "load_in_8bit": false,
    "load_in_4bit": false,
    "context_length": -1,
    "chat_template": "qwen2",
    "rope_scaling": null,
    "flash_attn": false,
    "use_streamer_v2": true,
    "interrupt_requests": true,
    "host": "0.0.0.0",
    "port": 8000,
    "api_prefix": "/v1",
    "engine": "vllm",
    "tasks": [
        "llm",
        "rag"
    ],
    "device_map": "auto",
    "gpus": null,
    "num_gpus": 1,
    "activate_inference": true,
    "model_names": [
        "qwen2",
        "bge-m3"
    ],
    "api_keys": [
        "xxxxxxxxxxxxxxx"
    ]
}
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     172.18.0.1:51554 - "GET /v1/models HTTP/1.1" 404 Not Found
INFO:     172.18.0.1:45768 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found

@JadynWong

JadynWong commented May 21, 2024

[Screenshot Clip_2024-05-21_19-28-39: the return None line in the code]

I added a line here to print the exception.
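
What the screenshot shows is roughly the pattern sketched below (hypothetical code, not the project's actual source): the engine creation is wrapped in a try/except that swallows the error and returns None, so the /v1 routes are presumably never registered, and adding a log line exposes the real failure:

# Hypothetical sketch of the pattern in the screenshot: the except branch
# originally just returned None, silently hiding the real vllm startup error.
import traceback

def create_vllm_engine(engine_args):
    try:
        from vllm.engine.async_llm_engine import AsyncLLMEngine
        return AsyncLLMEngine.from_engine_args(engine_args)
    except Exception:
        traceback.print_exc()  # the added line: print the actual exception
        return None            # original behaviour: fail silently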

I pulled the code just this afternoon and rebuilt the image; there were no errors during the build.

docker build -f docker/Dockerfile.vllm -t llm-api:vllm .

Possibly related issue: vllm-project/vllm#3528
