Issues: vllm-project/vllm

[Roadmap] vLLM Roadmap Q2 2024 (Open, 29 comments)
#3861 opened Apr 4, 2024 by simon-mo
v0.4.3 Release Tracker (Open, 13 comments)
#4895 opened May 18, 2024 by simon-mo
Issues list

[Bug]: Incorrect Example for the Inference with Prefix (label: bug)
#5177 opened Jun 1, 2024 by Delviet
[Usage]: Prefix caching in VLLM (label: usage)
#5176 opened Jun 1, 2024 by Abhinay2323
[Bug]: Model Launch Hangs with 16+ Ranks in vLLM (label: bug)
#5170 opened May 31, 2024 by wushidonguc
[Performance]: What can we learn from OctoAI (label: performance)
#5167 opened May 31, 2024 by hmellor
[Bug]: Unable to Use Prefix Caching in AsyncLLMEngine (label: bug)
#5162 opened May 31, 2024 by kezouke
[Usage]: extractive question answering using VLLM (label: usage)
#5126 opened May 30, 2024 by suryavan11
[New Model]: LLaVA-NeXT-Video support (label: new model)
#5124 opened May 30, 2024 by AmazDeng
[Bug]: The tail problem (label: bug)
#5123 opened May 30, 2024 by ZixinxinWang
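
Several of the issues above (#5176, #5162) ask how prefix caching is used in vLLM. As context only, the following is a minimal sketch of turning on automatic prefix caching with the offline LLM API; the model name, prompts, and sampling settings are placeholders, and flag behavior may vary across vLLM versions.

```python
from vllm import LLM, SamplingParams

# Sketch: enable automatic prefix caching so that requests sharing a common
# prompt prefix can reuse cached KV blocks. Model and prompts are placeholders.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

shared_prefix = "You are a helpful assistant. Answer the question below.\n"
prompts = [
    shared_prefix + "Q: What is vLLM?\nA:",
    shared_prefix + "Q: What is PagedAttention?\nA:",
]

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```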