Pull requests: vllm-project/vllm
[Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies (#4696, opened May 8, 2024 by KuntaiDu)
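A generic illustration of the optimization this PR's title describes (not the PR's actual code): hash a chained, immutable view of the token ids instead of deep-copying the lists before hashing.

import copy

def slow_hash(prefix_tokens, block_tokens):
    # deep-copies the token lists before hashing -- the cost the PR title says it avoids
    combined = copy.deepcopy(prefix_tokens) + copy.deepcopy(block_tokens)
    return hash(tuple(combined))

def fast_hash(prefix_hash, block_tokens):
    # chain the parent's hash with this block's tokens; no copies needed
    return hash((prefix_hash, tuple(block_tokens)))

print(fast_hash(hash(()), [1, 2, 3]))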
[Frontend] OpenAI API server: Do not add bos token by default when encoding (#4688, opened May 8, 2024 by bofenghuang)
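A plausible sketch of the issue this title suggests, using the Hugging Face tokenizer API (the model name is a placeholder): if the incoming text already carries a BOS token, e.g. from a chat template, encoding with the tokenizer's default add_special_tokens=True prepends a second one.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # placeholder model

text = "<s>[INST] Hi [/INST]"       # prompt that already includes BOS
print(tok.encode(text)[:2])         # e.g. [1, 1] -> double BOS with the default
print(tok.encode(text, add_special_tokens=False)[:2])  # BOS appears only once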
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681, opened May 8, 2024 by rkooo567)
[Frontend] Move async logic outside of constructor (#4674, opened May 8, 2024 by DarkLight1337)
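A generic illustration of the pattern this title names (not the PR's actual code): keep __init__ synchronous and perform async setup in an async factory method instead.

import asyncio

class Engine:
    def __init__(self, model: str):
        # constructor stays synchronous; no event-loop work here
        self.model = model
        self.ready = False

    @classmethod
    async def create(cls, model: str) -> "Engine":
        # async initialization happens in a factory, not in __init__
        self = cls(model)
        await asyncio.sleep(0)  # stand-in for real async setup work
        self.ready = True
        return self

async def main():
    engine = await Engine.create("facebook/opt-125m")  # placeholder model
    print(engine.ready)

asyncio.run(main())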
[ROCm][Hardware][AMD] Adding Navi21 to fallback to naive attention if Triton is not used (#4658, opened May 7, 2024 by alexeykondrat; label: rocm)
[WIP] Warning upon preemption and Swapping (#4647, opened May 7, 2024 by rkooo567; label: action-required)
[CORE] Adding support for insertion of soft-tuned prompts (#4645, opened May 7, 2024 by SwapnilDreams100)
[Frontend][OpenAI] Support for returning max_model_len on /v1/models response (#4643, opened May 7, 2024 by Avinash-Raj)
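If merged, a client could read the context window from the models endpoint. A minimal sketch, assuming a vLLM OpenAI-compatible server on its default port and assuming the field is exposed as max_model_len on each model entry (the field name and placement are inferred from the PR title):

import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    model = json.load(resp)["data"][0]
print(model.get("max_model_len"))  # e.g. 2048 for a 2k-context model (assumed field)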
[Kernel] Use Flashinfer for prefill (#4628, opened May 6, 2024 by LiuXiaoxuanPKU)
[Core] Update _earliest_arrival_time calculation of the waiting seq_groups (#4613, opened May 6, 2024 by Felix-Zhenghao)
[Bugfix] add truncate_prompt_tokens to work offline, directly from LLM class. (#4598, opened May 4, 2024 by yecohn)
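A minimal sketch of what offline usage might look like if this lands, assuming the PR wires truncate_prompt_tokens through SamplingParams the way the OpenAI server path does (model name and prompt are placeholders):

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(
    max_tokens=32,
    truncate_prompt_tokens=16,  # keep only the last 16 prompt tokens (assumed semantics)
)
outputs = llm.generate(["a very long prompt that exceeds the desired length ..."], params)
print(outputs[0].outputs[0].text)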