
Vllm worker does not release semaphore #3328

Open
jbding opened this issue May 13, 2024 · 0 comments · May be fixed by #3330
jbding commented May 13, 2024

When I used vllm_worker to deploy the Vicuna model with --limit-worker-concurrency set to 3, the worker stopped serving requests after running for a while. The log shows the semaphore was never released after failed requests: each failure left its value one lower, and after three failures it reached 0 (locked).

Here is the log:

2024-05-10 05:56:32 | ERROR | stderr | ERROR: Exception in ASGI application
2024-05-10 05:56:32 | ERROR | stderr | Traceback (most recent call last):
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 265, in __call__
2024-05-10 05:56:32 | ERROR | stderr | await wrap(partial(self.listen_for_disconnect, receive))
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:56:32 | ERROR | stderr | await func()
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
2024-05-10 05:56:32 | ERROR | stderr | message = await receive()
2024-05-10 05:56:32 | ERROR | stderr | ^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
2024-05-10 05:56:32 | ERROR | stderr | await self.message_event.wait()
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/asyncio/locks.py", line 213, in wait
2024-05-10 05:56:32 | ERROR | stderr | await fut
2024-05-10 05:56:32 | ERROR | stderr | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fe55b141750
2024-05-10 05:56:32 | ERROR | stderr |
2024-05-10 05:56:32 | ERROR | stderr | During handling of the above exception, another exception occurred:
2024-05-10 05:56:32 | ERROR | stderr |
2024-05-10 05:56:32 | ERROR | stderr | + Exception Group Traceback (most recent call last):
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
2024-05-10 05:56:32 | ERROR | stderr | | result = await app( # type: ignore[func-returns-value]
2024-05-10 05:56:32 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
2024-05-10 05:56:32 | ERROR | stderr | | return await self.app(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
2024-05-10 05:56:32 | ERROR | stderr | | await super().__call__(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
2024-05-10 05:56:32 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
2024-05-10 05:56:32 | ERROR | stderr | | raise exc
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
2024-05-10 05:56:32 | ERROR | stderr | | await self.app(scope, receive, _send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
2024-05-10 05:56:32 | ERROR | stderr | | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | | raise exc
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
2024-05-10 05:56:32 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
2024-05-10 05:56:32 | ERROR | stderr | | await route.handle(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
2024-05-10 05:56:32 | ERROR | stderr | | await self.app(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
2024-05-10 05:56:32 | ERROR | stderr | | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | | raise exc
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
2024-05-10 05:56:32 | ERROR | stderr | | await response(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 258, in __call__
2024-05-10 05:56:32 | ERROR | stderr | | async with anyio.create_task_group() as task_group:
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
2024-05-10 05:56:32 | ERROR | stderr | | raise BaseExceptionGroup(
2024-05-10 05:56:32 | ERROR | stderr | | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
2024-05-10 05:56:32 | ERROR | stderr | +-+---------------- 1 ----------------
2024-05-10 05:56:32 | ERROR | stderr | | Traceback (most recent call last):
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:56:32 | ERROR | stderr | | await func()
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
2024-05-10 05:56:32 | ERROR | stderr | | async for chunk in self.body_iterator:
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastchat/serve/vllm_worker.py", line 99, in generate_stream
2024-05-10 05:56:32 | ERROR | stderr | | sampling_params = SamplingParams(
2024-05-10 05:56:32 | ERROR | stderr | | ^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 118, in __init__
2024-05-10 05:56:32 | ERROR | stderr | | self._verify_args()
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 148, in _verify_args
2024-05-10 05:56:32 | ERROR | stderr | | raise ValueError(
2024-05-10 05:56:32 | ERROR | stderr | | ValueError: max_tokens must be at least 1, got -862.
2024-05-10 05:56:32 | ERROR | stderr | +------------------------------------
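For context on the negative value in the traceback above: FastChat workers typically clamp the requested generation budget to whatever is left of the model's context window, so a prompt longer than the window yields a negative max_tokens. The function name and numbers below are hypothetical reconstructions, not FastChat's actual code; with Vicuna's 4096-token window, a 4958-token prompt would give exactly -862:

```python
def remaining_token_budget(context_len: int, prompt_tokens: int) -> int:
    # Hypothetical reconstruction of the clamp that can go negative:
    # the budget left after the prompt fills the context window.
    return context_len - prompt_tokens

# Assumed numbers: a 4096-token window and a 4958-token prompt
# reproduce the -862 seen in the log.
print(remaining_token_budget(4096, 4958))  # -> -862
```

Passing such a value straight into vllm's SamplingParams(max_tokens=...) raises the "max_tokens must be at least 1" ValueError before any token is generated.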
2024-05-10 05:56:43 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=2, locked=False). call_ct: 1220. worker_id: 42c39e7a.
2024-05-10 05:56:47 | INFO | stdout | INFO: 127.0.0.1:46612 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:56:47 | INFO | stdout | INFO: 127.0.0.1:46614 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 05:56:47 async_llm_engine.py:371] Received request 18dbe2d2c72c4cfe9fea1922bd4e8b84: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=2048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:56:47 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:56:47 async_llm_engine.py:111] Finished request 18dbe2d2c72c4cfe9fea1922bd4e8b84.
INFO 05-10 05:56:47 async_llm_engine.py:134] Aborted request 18dbe2d2c72c4cfe9fea1922bd4e8b84.
2024-05-10 05:56:47 | INFO | stdout | INFO: 127.0.0.1:46616 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 05:56:48 | INFO | stdout | INFO: 127.0.0.1:46708 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:56:48 | INFO | stdout | INFO: 127.0.0.1:46710 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 05:56:48 async_llm_engine.py:371] Received request 594e5b0d350c4a5b8401814198fc447e: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=2048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:56:49 async_llm_engine.py:111] Finished request 594e5b0d350c4a5b8401814198fc447e.
INFO 05-10 05:56:49 async_llm_engine.py:134] Aborted request 594e5b0d350c4a5b8401814198fc447e.
2024-05-10 05:56:49 | INFO | stdout | INFO: 127.0.0.1:46712 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 05:56:54 | INFO | stdout | INFO: 127.0.0.1:46930 - "POST /worker_generate_stream HTTP/1.1" 200 OK
INFO 05-10 05:56:54 async_llm_engine.py:371] Received request 9b3295689edf474ba87e1ff73acf28a4: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: 你好 ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.7, top_p=1.0, top_k=-1.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=512, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:56:55 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:56:56 async_llm_engine.py:111] Finished request 9b3295689edf474ba87e1ff73acf28a4.
INFO 05-10 05:56:56 async_llm_engine.py:134] Aborted request 9b3295689edf474ba87e1ff73acf28a4.
2024-05-10 05:57:26 | INFO | stdout | INFO: 127.0.0.1:47946 - "POST /worker_generate_stream HTTP/1.1" 200 OK
INFO 05-10 05:57:26 async_llm_engine.py:371] Received request f274e61e2013461f9ff211e905272eb7: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: 你好 ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.7, top_p=1.0, top_k=-1.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=512, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:57:26 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:57:27 async_llm_engine.py:111] Finished request f274e61e2013461f9ff211e905272eb7.
INFO 05-10 05:57:27 async_llm_engine.py:134] Aborted request f274e61e2013461f9ff211e905272eb7.
2024-05-10 05:57:28 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=2, locked=False). call_ct: 1224. worker_id: 42c39e7a.
2024-05-10 05:58:10 | INFO | stdout | INFO: 127.0.0.1:49372 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:58:10 | INFO | stdout | INFO: 127.0.0.1:49374 - "POST /count_token HTTP/1.1" 200 OK
2024-05-10 05:58:10 | INFO | stdout | INFO: 127.0.0.1:49378 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-05-10 05:58:10 | ERROR | stderr | ERROR: Exception in ASGI application
2024-05-10 05:58:10 | ERROR | stderr | Traceback (most recent call last):
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 265, in __call__
2024-05-10 05:58:10 | ERROR | stderr | await wrap(partial(self.listen_for_disconnect, receive))
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:58:10 | ERROR | stderr | await func()
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
2024-05-10 05:58:10 | ERROR | stderr | message = await receive()
2024-05-10 05:58:10 | ERROR | stderr | ^^^^^^^^^^^^^^^
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
2024-05-10 05:58:10 | ERROR | stderr | await self.message_event.wait()
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/asyncio/locks.py", line 213, in wait
2024-05-10 05:58:10 | ERROR | stderr | await fut
2024-05-10 05:58:10 | ERROR | stderr | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fe560585710
2024-05-10 05:58:10 | ERROR | stderr |
2024-05-10 05:58:10 | ERROR | stderr | During handling of the above exception, another exception occurred:
2024-05-10 05:58:10 | ERROR | stderr |
2024-05-10 05:58:10 | ERROR | stderr | + Exception Group Traceback (most recent call last):
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
2024-05-10 05:58:10 | ERROR | stderr | | result = await app( # type: ignore[func-returns-value]
2024-05-10 05:58:10 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
2024-05-10 05:58:10 | ERROR | stderr | | return await self.app(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
2024-05-10 05:58:10 | ERROR | stderr | | await super().__call__(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
2024-05-10 05:58:10 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
2024-05-10 05:58:10 | ERROR | stderr | | raise exc
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
2024-05-10 05:58:10 | ERROR | stderr | | await self.app(scope, receive, _send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
2024-05-10 05:58:10 | ERROR | stderr | | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:58:10 | ERROR | stderr | | raise exc
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:58:10 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
2024-05-10 05:58:10 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
2024-05-10 05:58:10 | ERROR | stderr | | await route.handle(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
2024-05-10 05:58:10 | ERROR | stderr | | await self.app(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
2024-05-10 05:58:10 | ERROR | stderr | | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:58:10 | ERROR | stderr | | raise exc
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:58:10 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
2024-05-10 05:58:10 | ERROR | stderr | | await response(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 258, in __call__
2024-05-10 05:58:10 | ERROR | stderr | | async with anyio.create_task_group() as task_group:
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
2024-05-10 05:58:10 | ERROR | stderr | | raise BaseExceptionGroup(
2024-05-10 05:58:10 | ERROR | stderr | | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
2024-05-10 05:58:10 | ERROR | stderr | +-+---------------- 1 ----------------
2024-05-10 05:58:10 | ERROR | stderr | | Traceback (most recent call last):
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:58:10 | ERROR | stderr | | await func()
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
2024-05-10 05:58:10 | ERROR | stderr | | async for chunk in self.body_iterator:
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastchat/serve/vllm_worker.py", line 99, in generate_stream
2024-05-10 05:58:10 | ERROR | stderr | | sampling_params = SamplingParams(
2024-05-10 05:58:10 | ERROR | stderr | | ^^^^^^^^^^^^^^^
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 118, in __init__
2024-05-10 05:58:10 | ERROR | stderr | | self._verify_args()
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 148, in _verify_args
2024-05-10 05:58:10 | ERROR | stderr | | raise ValueError(
2024-05-10 05:58:10 | ERROR | stderr | | ValueError: max_tokens must be at least 1, got -763.
2024-05-10 05:58:10 | ERROR | stderr | +------------------------------------
2024-05-10 05:58:14 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1225. worker_id: 42c39e7a.
2024-05-10 05:58:59 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1225. worker_id: 42c39e7a.
2024-05-10 05:59:19 | INFO | stdout | INFO: 127.0.0.1:51574 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:59:19 | INFO | stdout | INFO: 127.0.0.1:51576 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 05:59:19 async_llm_engine.py:371] Received request 0ea08a7a94b44c0783dd435e387725d3: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=1e-08, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=4048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:59:19 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:59:20 async_llm_engine.py:111] Finished request 0ea08a7a94b44c0783dd435e387725d3.
INFO 05-10 05:59:20 async_llm_engine.py:134] Aborted request 0ea08a7a94b44c0783dd435e387725d3.
2024-05-10 05:59:20 | INFO | stdout | INFO: 127.0.0.1:51578 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 05:59:44 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1226. worker_id: 42c39e7a.
2024-05-10 06:00:29 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1226. worker_id: 42c39e7a.
2024-05-10 06:01:14 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1226. worker_id: 42c39e7a.
2024-05-10 06:01:30 | INFO | stdout | INFO: 127.0.0.1:55740 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 06:01:30 | INFO | stdout | INFO: 127.0.0.1:55742 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 06:01:30 async_llm_engine.py:371] Received request a22b98b85b564eaaad458ba20d1addaf: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=2048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 06:01:30 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 06:01:30 async_llm_engine.py:111] Finished request a22b98b85b564eaaad458ba20d1addaf.
INFO 05-10 06:01:30 async_llm_engine.py:134] Aborted request a22b98b85b564eaaad458ba20d1addaf.
2024-05-10 06:01:30 | INFO | stdout | INFO: 127.0.0.1:55744 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 06:01:46 | INFO | stdout | INFO: 127.0.0.1:56292 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 06:01:46 | INFO | stdout | INFO: 127.0.0.1:56294 - "POST /count_token HTTP/1.1" 200 OK
2024-05-10 06:01:46 | INFO | stdout | INFO: 127.0.0.1:56298 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-05-10 06:01:46 | ERROR | stderr | ERROR: Exception in ASGI application
2024-05-10 06:01:46 | ERROR | stderr | Traceback (most recent call last):
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 265, in __call__
2024-05-10 06:01:46 | ERROR | stderr | await wrap(partial(self.listen_for_disconnect, receive))
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 06:01:46 | ERROR | stderr | await func()
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
2024-05-10 06:01:46 | ERROR | stderr | message = await receive()
2024-05-10 06:01:46 | ERROR | stderr | ^^^^^^^^^^^^^^^
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
2024-05-10 06:01:46 | ERROR | stderr | await self.message_event.wait()
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/asyncio/locks.py", line 213, in wait
2024-05-10 06:01:46 | ERROR | stderr | await fut
2024-05-10 06:01:46 | ERROR | stderr | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fe560586f10
2024-05-10 06:01:46 | ERROR | stderr |
2024-05-10 06:01:46 | ERROR | stderr | During handling of the above exception, another exception occurred:
2024-05-10 06:01:46 | ERROR | stderr |
2024-05-10 06:01:46 | ERROR | stderr | + Exception Group Traceback (most recent call last):
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
2024-05-10 06:01:46 | ERROR | stderr | | result = await app( # type: ignore[func-returns-value]
2024-05-10 06:01:46 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
2024-05-10 06:01:46 | ERROR | stderr | | return await self.app(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
2024-05-10 06:01:46 | ERROR | stderr | | await super().__call__(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
2024-05-10 06:01:46 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
2024-05-10 06:01:46 | ERROR | stderr | | raise exc
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
2024-05-10 06:01:46 | ERROR | stderr | | await self.app(scope, receive, _send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
2024-05-10 06:01:46 | ERROR | stderr | | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 06:01:46 | ERROR | stderr | | raise exc
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 06:01:46 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
2024-05-10 06:01:46 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
2024-05-10 06:01:46 | ERROR | stderr | | await route.handle(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
2024-05-10 06:01:46 | ERROR | stderr | | await self.app(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
2024-05-10 06:01:46 | ERROR | stderr | | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 06:01:46 | ERROR | stderr | | raise exc
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 06:01:46 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
2024-05-10 06:01:46 | ERROR | stderr | | await response(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 258, in __call__
2024-05-10 06:01:46 | ERROR | stderr | | async with anyio.create_task_group() as task_group:
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
2024-05-10 06:01:46 | ERROR | stderr | | raise BaseExceptionGroup(
2024-05-10 06:01:46 | ERROR | stderr | | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
2024-05-10 06:01:46 | ERROR | stderr | +-+---------------- 1 ----------------
2024-05-10 06:01:46 | ERROR | stderr | | Traceback (most recent call last):
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 06:01:46 | ERROR | stderr | | await func()
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
2024-05-10 06:01:46 | ERROR | stderr | | async for chunk in self.body_iterator:
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastchat/serve/vllm_worker.py", line 99, in generate_stream
2024-05-10 06:01:46 | ERROR | stderr | | sampling_params = SamplingParams(
2024-05-10 06:01:46 | ERROR | stderr | | ^^^^^^^^^^^^^^^
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 118, in __init__
2024-05-10 06:01:46 | ERROR | stderr | | self._verify_args()
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 148, in _verify_args
2024-05-10 06:01:46 | ERROR | stderr | | raise ValueError(
2024-05-10 06:01:46 | ERROR | stderr | | ValueError: max_tokens must be at least 1, got -763.
2024-05-10 06:01:46 | ERROR | stderr | +------------------------------------
2024-05-10 06:01:59 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:02:44 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:03:29 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:04:14 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:04:59 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
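The pattern of the leak: the worker acquires the --limit-worker-concurrency semaphore per request, but when SamplingParams raises inside the streaming generator before the first chunk is produced, the release path never runs, so each failure permanently consumes one slot. A minimal sketch of an acquire/release guard with try/finally — the names (generate_stream_guarded, worker_semaphore, params) are hypothetical, not FastChat's actual API:

```python
import asyncio


async def generate_stream_guarded(worker_semaphore: asyncio.Semaphore, params: dict):
    """Sketch: guarantee the concurrency semaphore is released even when
    parameter validation raises before the first chunk is yielded."""
    await worker_semaphore.acquire()
    try:
        max_tokens = params.get("max_new_tokens", 256)
        if max_tokens < 1:
            # Mirrors vllm's validation: "max_tokens must be at least 1"
            raise ValueError(f"max_tokens must be at least 1, got {max_tokens}.")
        # ... build SamplingParams and stream engine output here ...
        yield b"chunk"
    finally:
        # Runs whether the generator finishes, raises, or is closed early.
        worker_semaphore.release()
```

The linked PR (#3330) is the proposed fix in FastChat itself; this sketch only illustrates the invariant that every acquire must be paired with a release on all exit paths, including the pre-stream error path seen in the log.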

jbding pushed a commit to jbding/FastChat that referenced this issue May 13, 2024