Pull requests: vllm-project/vllm
#5243 [BugFix] Fix the problem that StopChecker assumes a single token produ… (opened Jun 4, 2024 by IcyFeather233)
#5242 [Kernel] Add back batch size 1536 and 3072 to MoE tuning (opened Jun 4, 2024 by WoosukKwon)
#5238 [Kernel] Re-tune Mixtral MoE configurations for FP8 on H100 (opened Jun 4, 2024 by pcmoritz)
#5233 [Bugfix] Fix broken download of models from modelscope (opened Jun 3, 2024 by liuyhwangyh)
#5230 [Core][Doc] Default to multiprocessing for single-node distributed case (opened Jun 3, 2024 by njhill)
#5226 [Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (opened Jun 3, 2024 by zifeitong)
#5223 [Misc] Add speculative decoding to the throughput benchmarking script (opened Jun 3, 2024 by abhibambhaniya)
#5207 [BugFix] Apply get_cached_tokenizer to the tokenizer setter of LLM (opened Jun 3, 2024 by DriverSong)
#5194 [Misc] Improve error message when LoRA parsing fails (opened Jun 2, 2024 by DarkLight1337)
#5188 [Core][Prefix Caching] Fix hashing logic for non-full blocks (opened Jun 2, 2024 by zhuohan123)
#5187 [Bugfix][Frontend] vLLM api_server.py raises an error when used with prompt_token_ids (opened Jun 1, 2024 by TikZSZ)
#5183 [Kernel] Switch fp8 layers to use the CUTLASS kernels (opened Jun 1, 2024 by tlrmchlsmth, draft)
#5173 [Bugfix] CUDA out of memory leads to "AsyncEngineDeadError: Background loop has errored already" (opened Jun 1, 2024 by charent)
#5164 [Bugfix] Fix KeyError: 1 when using LoRA adapters (opened May 31, 2024 by BlackBird-Coding)