Issues: vllm-project/vllm
[Usage]: How to start inference serving through the LLM object · usage · #5227 · opened Jun 3, 2024 by Jiayi-Pan
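For reference, a minimal sketch of offline inference through the LLM object; the model name and prompt are illustrative placeholders, not taken from the issue:

```python
# Minimal sketch of offline inference with vLLM's LLM object.
# The model name and prompt are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], params)
for output in outputs:
    print(output.outputs[0].text)
```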
[Usage]: RuntimeError: CUDA error: uncorrectable ECC error encountered · usage · #5222 · opened Jun 3, 2024 by DJCoolDev
[Doc]: Update the vLLM Distributed Inference and Serving docs with the new MultiprocessingGPUExecutor · documentation · #5221 · opened Jun 3, 2024 by rcarrata
[Bug]: Mixtral-8x22 request cancelled by cancel scope when client sends multiple concurrent requests · bug · #5220 · opened Jun 3, 2024 by markovalexander
[Bug]: prompt_logprobs=0 raises AssertionError · bug · #5213 · opened Jun 3, 2024 by toslunar
[Installation]: Failed to build punica · installation · #5212 · opened Jun 3, 2024 by asinglestep
[Usage]: How to terminate a vLLM model and free or release GPU memory · usage · #5211 · opened Jun 3, 2024 by wellcasa
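The workaround commonly suggested for this is not an official shutdown API: drop every reference to the engine, force garbage collection, then release cached CUDA memory. A sketch, with the model name as a placeholder and details that may vary by vLLM version:

```python
import gc

import torch
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # illustrative model
# ... run inference ...

# Workaround sketch, not an official shutdown API: delete all references
# to the engine, collect garbage, then release cached CUDA memory.
del llm
gc.collect()
torch.cuda.empty_cache()
```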
[Feature]: Support for Mirostat, Dynamic Temperature, and Quadratic Sampling · feature request · #5209 · opened Jun 3, 2024 by Emmie411
[Bug]: VLLM_ATTENTION_BACKEND is set to ROCM_FLASH only in the GHA environment, overriding automatic backend selection; this breaks other kernel unit tests · bug · #5208 · opened Jun 3, 2024 by afeldman-nm
[Bug]: Detokenize delay when updating vLLM from 0.3.0 to 0.4.2 · bug · #5206 · opened Jun 3, 2024 by DriverSong
[Feature]: Option to override HuggingFace's configurations · feature request · #5205 · opened Jun 3, 2024 by DarkLight1337
[Bug]: Different token return behaviors from vLLM 0.3.0 → 0.4.3 · bug · #5204 · opened Jun 3, 2024 by cyc00518
[Feature]: When will quantized Qwen MoE models be supported, preferably via AutoGPTQ or AWQ? · feature request · #5202 · opened Jun 3, 2024 by wellcasa
[Bug]: Issues with applying LoRA in vLLM on a T4 GPU · bug · #5199 · opened Jun 2, 2024 by rikitomo
[Usage]: How to use gpu_cache_usage_perc as a custom metric in a k8s HPA? · usage · #5195 · opened Jun 2, 2024 by chakpongchung
[Usage]: How can I deploy llama3-70b on a server with 8 3090 GPUs using LoRA and CUDA graphs? · usage · #5193 · opened Jun 2, 2024 by AlphaINF
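A hedged sketch of what such a launch might look like: tensor parallelism across the 8 GPUs with LoRA enabled, and CUDA graphs left on (they are the default unless enforce_eager=True). The model name, adapter name, and adapter path are assumptions, and fitting a 70B model in 8x24 GB may still require quantization:

```python
# Sketch only: tensor parallelism across 8 GPUs with per-request LoRA.
# Model name, adapter name, ID, and path are illustrative assumptions.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    tensor_parallel_size=8,   # shard the model across the 8 GPUs
    enable_lora=True,         # accept per-request LoRA adapters
    # enforce_eager defaults to False, so CUDA graphs remain enabled
)

outputs = llm.generate(
    ["Summarize the plot of Hamlet."],
    SamplingParams(max_tokens=128),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora"),  # hypothetical adapter
)
print(outputs[0].outputs[0].text)
```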
[Bug]: vLLM api_server.py raises an error when used with prompt_token_ids · bug · #5186 · opened Jun 1, 2024 by TikZSZ
[Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported for CPU-only inference? · feature request · #5185 · opened Jun 1, 2024 by xxll88
[Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary asyncio.exceptions.CancelledError · bug · #5182 · opened Jun 1, 2024 by jlcmoore