forked from vllm-project/vllm
Initial CompressedTensors config + Activation Quantization support for static W8A8 per tensor #195
Merged
Commits on Apr 24, 2024
- 18adcee
- 38dcd67
- 263749a
- bbe0a70
- 5a93cb7
- e822fef
- 0c271e4
- 3b02d6e
- e09160b
- dcb1e59
- 48956bc
- 35d2d96
- 1dfa7f6
- b2c39a1
- e8d1886: Compression config cutlass (#205)
  Use cutlass kernels.
  Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
- b840eae
- 6868f97
- 6c89aa9
- a0a9a75
- 14d5f25
Commits on Apr 25, 2024
- e5f391f: Compression config perf fix (#207)
  Description: Remove logging that triggers a device-to-host copy.
  Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
- ddb10d8
- 0c5f2a0
Commits on Apr 29, 2024
- 540c159
- 677f02c
- bd99627: Compression config - cleanup (#215)
  Description:
  - Rename `csrc/quantization/smoothquant/fused_kernels.cu` -> `csrc/quantization/compressed_tensors/int8_quant_kernels.cu`
  - Remove `csrc/attention/dtype_int8.cuh`
  - Remove unused quant_per_token kernel. Rename `ops.quant` to `ops.quant_per_tensor`
  - Remove unused `quant_utils.cuh`
  - Remove unused `blockReduceMax` code from reduction_utils.cuh
  Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
- cf61e07
Commits on Apr 30, 2024
- 96fea65
- 093e688
- 681fb3b
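
For readers unfamiliar with the "static W8A8 per tensor" scheme named in the PR title, the following is a minimal pure-Python sketch of the general idea: one scale per tensor, fixed ahead of time, for both weights and activations, with products accumulated as integers and rescaled once at the end. It is not the code from this PR. The function names (`quantize_per_tensor`, `w8a8_dot`) and the scale values are hypothetical and chosen only for illustration. The real kernels are CUDA/CUTLASS and may differ in rounding and saturation details.

```python
def quantize_per_tensor(values, scale):
    """Quantize floats to int8 with a single precomputed (static) scale.

    Illustrative only: uses Python's round() (round-half-to-even) and
    saturates to the int8 range [-128, 127].
    """
    quantized = []
    for v in values:
        q = round(v / scale)
        quantized.append(max(-128, min(127, q)))
    return quantized


def w8a8_dot(q_act, q_weight, act_scale, weight_scale):
    """Dot product of int8 activations and int8 weights.

    Products are accumulated as integers; the two per-tensor scales are
    applied once to the accumulated result.
    """
    acc = sum(a * w for a, w in zip(q_act, q_weight))
    return acc * act_scale * weight_scale


# Hypothetical scales; in a static scheme these come from calibration,
# not from the runtime values of the tensor.
act_scale = 0.5
weight_scale = 0.25

q_act = quantize_per_tensor([1.0, -2.0], act_scale)        # [2, -4]
q_weight = quantize_per_tensor([0.75, 0.25], weight_scale)  # [3, 1]
result = w8a8_dot(q_act, q_weight, act_scale, weight_scale)
```

Because the activation scale is static, quantizing an activation needs no reduction over the tensor at runtime, which is the main practical difference from dynamic per-token schemes.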