Issues: vllm-project/vllm
- #4789 [Bug]: Async engine hangs with 0.4.* releases (bug) - opened May 13, 2024 by glos-nv
- #4786 [Bug]: RAM OOM error loading 480GB MoE model despite fix in PR #1395 (bug) - opened May 13, 2024 by hxer7963
- #4785 [Bug]: multi-GPU for baichuan2-13B-Chat benchmark_serving (bug) - opened May 13, 2024 by shudct
- #4784 [Bug]: deploying Phi-3-mini-128k-instruct raises AssertionError (bug) - opened May 13, 2024 by hxujal
- #4783 [Usage]: How to change the batch size when testing vLLM throughput with benchmark_throughput (usage) - opened May 13, 2024 by Ourspolaire1
- #4782 [Doc]: Doc for using tensorizer_uri with LLM is incorrect (documentation) - opened May 13, 2024 by GRcharles
- #4777 [Feature]: Support the OpenAI Batch Chat Completions file format (feature request) - opened May 13, 2024 by wuisawesome
- #4772 [Bug]: Unexpected special tokens in prompt_logprobs output for Llama3 prompt (bug) - opened May 12, 2024 by leejamesss
- #4770 [Feature]: CI: Test on NVLink-enabled machine (feature request) - opened May 12, 2024 by youkaichao
- #4766 [Feature]: Could paged_attention_v1 support the parameter 'attn_bias'? (feature request) - opened May 11, 2024 by cillinzhang
- #4763 [Feature]: Support W4A8KV4 quantization (QServe/QoQ) (feature request) - opened May 11, 2024 by bratao
- #4760 [Performance]: Why is the average generation throughput low? (performance) - opened May 11, 2024 by rvsh2
- #4756 [Bug]: CUDA error when running mistral-7b + LoRA with tensor_para=8 (bug) - opened May 11, 2024 by sfc-gh-zhwang
- #4755 Regression in support of customized "role" in OpenAI-compatible API (v0.4.2) (good first issue) - opened May 10, 2024 by simon-mo
- #4744 [Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize the GPUs (usage) - opened May 10, 2024 by danielstankw
- #4743 [RFC]: Support specifying quant_config details in the LLM or Server entrypoints (feature request, RFC) - opened May 10, 2024 by mgoin
- #4742 [Bug]: ValueError when using LoRA with CohereForCausalLM model (bug) - opened May 10, 2024 by onlyfish79
- #4741 [Bug]: SqueezeLLM with sparse does not work (bug) - opened May 10, 2024 by RyanWMHI
- #4740 [Bug]: Why do the logits differ between 0.4.1 and 0.4.2? (bug) - opened May 10, 2024 by sitabulaixizawaluduo
- #4739 [New Model]: BLIP-2 support required (new model) - opened May 10, 2024 by anisingh1
- #4736 [New Model]: fastspeech2_conformer (needs a new attention mechanism: RelPositionMultiHeadedAttention) (new model) - opened May 10, 2024 by cillinzhang
- #4731 [Bug]: The paged_attention version used is inconsistent between enforce_eager=True and enforce_eager=False (bug) - opened May 10, 2024 by liangxuegang