Pull requests: vllm-project/vllm
[Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies (#4696, opened May 8, 2024 by KuntaiDu)
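A generic illustration of the optimization this PR's title describes (not the PR's actual code): hash a chained, immutable view of the token ids instead of deep-copying the lists before hashing.

import copy

def slow_hash(prefix_tokens, block_tokens):
    # deep-copies the token lists before hashing -- the cost the PR title says it avoids
    combined = copy.deepcopy(prefix_tokens) + copy.deepcopy(block_tokens)
    return hash(tuple(combined))

def fast_hash(prefix_hash, block_tokens):
    # chain the parent's hash with this block's tokens; no copies needed
    return hash((prefix_hash, tuple(block_tokens)))

print(fast_hash(hash(()), [1, 2, 3]))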
[Frontend] OpenAI API server: Do not add bos token by default when encoding (#4688, opened May 8, 2024 by bofenghuang)
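A plausible sketch of the issue this title suggests, using the Hugging Face tokenizer API (the model name is a placeholder): if the incoming text already carries a BOS token, e.g. from a chat template, encoding with the tokenizer's default add_special_tokens=True prepends a second one.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # placeholder model

text = "<s>[INST] Hi [/INST]"       # prompt that already includes BOS
print(tok.encode(text)[:2])         # e.g. [1, 1] -> double BOS with the default
print(tok.encode(text, add_special_tokens=False)[:2])  # BOS appears only once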
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681, opened May 8, 2024 by rkooo567)
[Frontend] Move async logic outside of constructor (#4674, opened May 8, 2024 by DarkLight1337)
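A generic illustration of the pattern this title names (not the PR's actual code): keep __init__ synchronous and perform async setup in an async factory method instead.

import asyncio

class Engine:
    def __init__(self, model: str):
        # constructor stays synchronous; no event-loop work here
        self.model = model
        self.ready = False

    @classmethod
    async def create(cls, model: str) -> "Engine":
        # async initialization happens in a factory, not in __init__
        self = cls(model)
        await asyncio.sleep(0)  # stand-in for real async setup work
        self.ready = True
        return self

async def main():
    engine = await Engine.create("facebook/opt-125m")  # placeholder model
    print(engine.ready)

asyncio.run(main())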
[ROCm][Hardware][AMD] Adding Navi21 to fallback to naive attention if Triton is not used (#4658, opened May 7, 2024 by alexeykondrat; label: rocm)
[WIP] Warning upon preemption and Swapping (#4647, opened May 7, 2024 by rkooo567; label: action-required)
[CORE] Adding support for insertion of soft-tuned prompts (#4645, opened May 7, 2024 by SwapnilDreams100)
[Frontend][OpenAI] Support for returning max_model_len on /v1/models response (#4643, opened May 7, 2024 by Avinash-Raj)
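If merged, a client could read the context window from the models endpoint. A minimal sketch, assuming a vLLM OpenAI-compatible server on its default port and assuming the field is exposed as max_model_len on each model entry (the field name and placement are inferred from the PR title):

import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    model = json.load(resp)["data"][0]
print(model.get("max_model_len"))  # e.g. 2048 for a 2k-context model (assumed field)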
[Kernel] Use Flashinfer for prefill (#4628, opened May 6, 2024 by LiuXiaoxuanPKU)
[Core] Update _earliest_arrival_time calculation of the waiting seq_groups (#4613, opened May 6, 2024 by Felix-Zhenghao)
[Bugfix] add truncate_prompt_tokens to work offline, directly from LLM class. (#4598, opened May 4, 2024 by yecohn)
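A minimal sketch of what offline usage might look like if this lands, assuming the PR wires truncate_prompt_tokens through SamplingParams the way the OpenAI server path does (model name and prompt are placeholders):

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(
    max_tokens=32,
    truncate_prompt_tokens=16,  # keep only the last 16 prompt tokens (assumed semantics)
)
outputs = llm.generate(["a very long prompt that exceeds the desired length ..."], params)
print(outputs[0].outputs[0].text)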