Pull requests: vllm-project/vllm
#5243 [BugFix] Fix the problem that StopChecker assumes a single token produ… (opened Jun 4, 2024 by IcyFeather233)
#5242 [Kernel] Add back batch size 1536 and 3072 to MoE tuning (opened Jun 4, 2024 by WoosukKwon)
#5238 [Kernel] Re-tune Mixtral MoE configurations for FP8 on H100 (opened Jun 4, 2024 by pcmoritz)
#5233 [Bugfix] Fix broken download of models from modelscope (opened Jun 3, 2024 by liuyhwangyh)
#5230 [Core][Doc] Default to multiprocessing for single-node distributed case (opened Jun 3, 2024 by njhill)
#5226 [Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (opened Jun 3, 2024 by zifeitong)
#5223 [Misc] Add speculative decoding to the throughput benchmarking script (opened Jun 3, 2024 by abhibambhaniya)
#5207 [BugFix] Apply get_cached_tokenizer to the tokenizer setter of LLM (opened Jun 3, 2024 by DriverSong)
#5194 [Misc] Improve error message when LoRA parsing fails (opened Jun 2, 2024 by DarkLight1337)
#5188 [Core][Prefix Caching] Fix hashing logic for non-full blocks (opened Jun 2, 2024 by zhuohan123)
#5187 [Bugfix][Frontend] vLLM api_server.py raises an error when used with prompt_token_ids (opened Jun 1, 2024 by TikZSZ)
#5183 [Kernel] Switch fp8 layers to use the CUTLASS kernels (opened Jun 1, 2024 by tlrmchlsmth, draft)
#5173 [Bugfix] CUDA out of memory leads to "AsyncEngineDeadError: Background loop has errored already" (opened Jun 1, 2024 by charent)
#5164 [Bugfix] Fix KeyError: 1 when using LoRA adapters (opened May 31, 2024 by BlackBird-Coding)