neuralmagic / nm-vllm Public

forked from vllm-project/vllm

Pull requests: neuralmagic/nm-vllm

24 Open, 180 Closed

Pull requests list

Upstream sync 2024 04 26

#211 opened Apr 26, 2024 by robertgshaw2-neuralmagic

Add lm-eval correctness test

#210 opened Apr 25, 2024 by dbarbuzzi • 2 tasks

Torch compile fusion backend prototype

#209 opened Apr 25, 2024 by bnellnm • Draft

update workflows to use generated whls

#204 opened Apr 23, 2024 by andy-neuma

[WIP] Please do not delete - comparing changes between branches

#203 opened Apr 23, 2024 by afeldman-nm

Add test framework for server

#200 opened Apr 22, 2024 by dbarbuzzi • Draft

Upstream sync 2024 04 21

#198 opened Apr 22, 2024 by robertgshaw2-neuralmagic

Initial CompressedTensors config + Activation Quantization support for static W8A8 per tensor

#195 opened Apr 18, 2024 by dsikka

[WIP] FLAN-T5 integration

#194 opened Apr 17, 2024 by afeldman-nm

WIP: basic correctness test

#192 opened Apr 17, 2024 by derekk-nm • Draft

whl centric

#191 opened Apr 17, 2024 by andy-neuma

Prototype FP8Linear W8A8 runtime quantization

#190 opened Apr 15, 2024 by mgoin • Draft

Entrypoint for hosting local Kobold Lite chat interface

#184 opened Apr 12, 2024 by mgoin

Added Docker Compose Example

#182 opened Apr 12, 2024 by robertgshaw2-neuralmagic

vllm - quantization : DO NOT MERGE

#180 opened Apr 11, 2024 by varun-sundar-rabindranath

Pypi and updates

#177 opened Apr 9, 2024 by andy-neuma

[WIP][Core] Add Automatic Prefix Caching to BlockSpaceManagerV2

#171 opened Apr 8, 2024 by SageMoore

[WIP] Upstream encoder/decoder support based on multiple blocktables

#161 opened Apr 2, 2024 by afeldman-nm • Draft

Support for compressed-tensors

#159 opened Apr 2, 2024 by dbogunowicz

[Timings] Add the ability to log times for async and sync calls

#152 opened Mar 27, 2024 by dsikka

[WiP] Whisper Implementation

#147 opened Mar 26, 2024 by dbogunowicz

[wip] holistic trace analysis

#146 opened Mar 25, 2024 by LucasWilkinson • Draft

Prometheus deliverable 1+2

#93 opened Mar 5, 2024 by horheynm

[WIP] afeldman-nm/encoder decoder

#22 opened Feb 16, 2024 by afeldman-nm
