Issues: vllm-project/vllm
[New Model]: IBM Granite Code Models
new model (Requests for new models) · #5095 opened May 29, 2024 by Semihal

[Bug]: Can't run distributed inference with vLLM + Ray
bug (Something isn't working) · #5094 opened May 29, 2024 by linchen111

[Bug]: The implementation of DynamicNTKScalingRotaryEmbedding may have errors
bug · #5093 opened May 29, 2024 by macheng6

[Bug]: Gemma model fails with GPTQ marlin
bug · #5088 opened May 28, 2024 by arunpatala

[Installation]: Error when importing LLM from vllm
installation (Installation problems) · #5086 opened May 28, 2024 by manishkumar0709

[Feature]: Adopt Colossal Inference Features (55% speedup over vLLM)
feature request · #5085 opened May 28, 2024 by casper-hansen

[Bug]: vLLM disconnects after running for some time
bug · #5084 opened May 28, 2024 by zxcdsa45687

[Usage]: curl http://localhost:8000/generate returns {"detail":"Not Found"}; the generate endpoint cannot be used
usage (How to use vllm) · #5082 opened May 28, 2024 by fishingcatgo

[Usage]: Inference on Llama3-8b-Instruct using a LoRA adapter
usage · #5078 opened May 28, 2024 by jetlime

[Bug]: The inference speed of vLLM running command-r-plus-gptq is very slow
bug · #5076 opened May 28, 2024 by leoterry-ulrica

[Performance]: A few performance-related questions
performance (Performance-related issues) · #5072 opened May 27, 2024 by maxin9966

[Bug]: Build/install issues with pip install -e .
bug · #5071 opened May 27, 2024 by Msiavashi

[Bug]: The VRAM usage of calculating log_probs is not considered in profile run
bug · #5067 opened May 27, 2024 by Conless

[Bug]: Loading hangs indefinitely when loading model weights
bug · #5062 opened May 27, 2024 by tjrlwjd1

[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
bug · #5060 opened May 26, 2024 by heungson

[Bug]: Running vLLM on a Ray cluster, logging stuck at loading
bug · #5052 opened May 25, 2024 by maherr13

[Feature]: Add num_requests_preempted metric
feature request · #5051 opened May 25, 2024 by sathyanarays

[Bug]: vLLM errors at runtime with the latest NVIDIA driver 555.85
bug · #5035 opened May 24, 2024 by gaye746560359

[Bug]: Command-R incorrect output contains <EOS_TOKEN> and seems to do text prediction rather than conversation
bug · #5030 opened May 24, 2024 by epignatelli

[Usage]: With llama3, tokenizer.get_vocab() contains the token 'Ġor', but the vLLM server returns ' or' in the response
usage · #5028 opened May 24, 2024 by fengshansi
