Issues: vllm-project/vllm
[Usage]: How to start inference serving through the LLM object · usage · #5227 · opened Jun 3, 2024 by Jiayi-Pan
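For reference, a minimal sketch of offline inference through the LLM object; the model name and prompt are illustrative placeholders, not taken from the issue:

```python
# Minimal sketch of offline inference with vLLM's LLM object.
# The model name and prompt are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], params)
for output in outputs:
    print(output.outputs[0].text)
```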
[Usage]: RuntimeError: CUDA error: uncorrectable ECC error encountered · usage · #5222 · opened Jun 3, 2024 by DJCoolDev
[Doc]: Update the vLLM Distributed Inference and Serving docs with the new MultiprocessingGPUExecutor · documentation · #5221 · opened Jun 3, 2024 by rcarrata
[Bug]: Mixtral-8x22 request cancelled by cancel scope when client sends multiple concurrent requests · bug · #5220 · opened Jun 3, 2024 by markovalexander
[Bug]: prompt_logprobs=0 raises AssertionError · bug · #5213 · opened Jun 3, 2024 by toslunar
[Installation]: Failed to build punica · installation · #5212 · opened Jun 3, 2024 by asinglestep
[Usage]: How to terminate a vLLM model and free or release GPU memory · usage · #5211 · opened Jun 3, 2024 by wellcasa
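The workaround commonly suggested for this is not an official shutdown API: drop every reference to the engine, force garbage collection, then release cached CUDA memory. A sketch, with the model name as a placeholder and details that may vary by vLLM version:

```python
import gc

import torch
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # illustrative model
# ... run inference ...

# Workaround sketch, not an official shutdown API: delete all references
# to the engine, collect garbage, then release cached CUDA memory.
del llm
gc.collect()
torch.cuda.empty_cache()
```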
[Feature]: Support for Mirostat, Dynamic Temperature, and Quadratic Sampling · feature request · #5209 · opened Jun 3, 2024 by Emmie411
[Bug]: VLLM_ATTENTION_BACKEND is set to ROCM_FLASH only in the GHA environment, overriding automatic backend selection; this breaks other kernel unit tests · bug · #5208 · opened Jun 3, 2024 by afeldman-nm
[Bug]: Detokenize delay when updating vLLM from 0.3.0 to 0.4.2 · bug · #5206 · opened Jun 3, 2024 by DriverSong
[Feature]: Option to override HuggingFace's configurations · feature request · #5205 · opened Jun 3, 2024 by DarkLight1337
[Bug]: Different token return behaviors from vLLM 0.3.0 → 0.4.3 · bug · #5204 · opened Jun 3, 2024 by cyc00518
[Feature]: When will quantized Qwen MoE models be supported, preferably via AutoGPTQ or AWQ? · feature request · #5202 · opened Jun 3, 2024 by wellcasa
[Bug]: Issues with applying LoRA in vLLM on a T4 GPU · bug · #5199 · opened Jun 2, 2024 by rikitomo
[Usage]: How to use gpu_cache_usage_perc as a custom metric in a k8s HPA? · usage · #5195 · opened Jun 2, 2024 by chakpongchung
[Usage]: How can I deploy llama3-70b on a server with 8 3090 GPUs using LoRA and CUDA graphs? · usage · #5193 · opened Jun 2, 2024 by AlphaINF
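A hedged sketch of what such a launch might look like: tensor parallelism across the 8 GPUs with LoRA enabled, and CUDA graphs left on (they are the default unless enforce_eager=True). The model name, adapter name, and adapter path are assumptions, and fitting a 70B model in 8x24 GB may still require quantization:

```python
# Sketch only: tensor parallelism across 8 GPUs with per-request LoRA.
# Model name, adapter name, ID, and path are illustrative assumptions.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    tensor_parallel_size=8,   # shard the model across the 8 GPUs
    enable_lora=True,         # accept per-request LoRA adapters
    # enforce_eager defaults to False, so CUDA graphs remain enabled
)

outputs = llm.generate(
    ["Summarize the plot of Hamlet."],
    SamplingParams(max_tokens=128),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora"),  # hypothetical adapter
)
print(outputs[0].outputs[0].text)
```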
[Bug]: vLLM api_server.py raises an error when used with prompt_token_ids · bug · #5186 · opened Jun 1, 2024 by TikZSZ
[Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported for CPU-only inference? · feature request · #5185 · opened Jun 1, 2024 by xxll88
[Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary asyncio.exceptions.CancelledError · bug · #5182 · opened Jun 1, 2024 by jlcmoore