Initial CompressedTensors config + Activation Quantization support for static W8A8 per tensor #195

Merged
merged 30 commits into from
Apr 30, 2024

Commits on Apr 24, 2024

  1. initial commit

    dsikka committed Apr 24, 2024
    18adcee
  2. add quant/dequant functions

    dsikka committed Apr 24, 2024
    38dcd67
  3. 263749a
  4. add updated model runner

    dsikka committed Apr 24, 2024
    bbe0a70
  5. add more files

    dsikka committed Apr 24, 2024
    5a93cb7
  6. e822fef
  7. update

    dsikka committed Apr 24, 2024
    0c271e4
  8. update

    dsikka committed Apr 24, 2024
    3b02d6e
  9. fix model loading

    dsikka committed Apr 24, 2024
    e09160b
  10. for fake quant, just use torch

    dsikka committed Apr 24, 2024
    dcb1e59
  11. remove if

    dsikka committed Apr 24, 2024
    48956bc
  12. 35d2d96
  13. 1dfa7f6
  14. fix gibberish

    dsikka committed Apr 24, 2024
    b2c39a1
  15. Compression config cutlass (#205)

    Use cutlass kernels.
    
    ---------
    
    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    2 people authored and dsikka committed Apr 24, 2024
    e8d1886
  16. b840eae
  17. format

    dsikka committed Apr 24, 2024
    6868f97
  18. remove print; update todo

    dsikka committed Apr 24, 2024
    6c89aa9
  19. fix rebase

    dsikka committed Apr 24, 2024
    a0a9a75
  20. update unquant

    dsikka committed Apr 24, 2024
    14d5f25
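For readers unfamiliar with the terms in the PR title and in the "add quant/dequant functions" and "for fake quant, just use torch" commits, here is a minimal sketch of static per-tensor int8 quantization. It is not the code from this PR: the function names are hypothetical, the scheme is assumed to be symmetric (no zero point), and it uses plain Python lists rather than torch tensors or the CUTLASS kernels so that it runs anywhere. "Static" means the scale is fixed ahead of time instead of being computed from each input at runtime.

```python
from typing import List, Sequence

INT8_MIN, INT8_MAX = -128, 127


def quantize_per_tensor(values: Sequence[float], scale: float) -> List[int]:
    """Map floats to int8 codes using one fixed scale for the whole tensor.

    Hypothetical helper for illustration; symmetric quantization is assumed.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    codes = []
    for v in values:
        q = round(v / scale)  # Python rounds exact halves to the nearest even integer
        codes.append(max(INT8_MIN, min(INT8_MAX, q)))  # saturate to the int8 range
    return codes


def dequantize_per_tensor(codes: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to floats with the same per-tensor scale."""
    return [c * scale for c in codes]


def fake_quantize(values: Sequence[float], scale: float) -> List[float]:
    """Quantize then dequantize, emulating int8 rounding and clamping in float."""
    return dequantize_per_tensor(quantize_per_tensor(values, scale), scale)


activations = [0.5, -1.0, 3.0, 100.0]
scale = 0.5  # assumed to come from offline calibration

print(quantize_per_tensor(activations, scale))  # [1, -2, 6, 127]
print(fake_quantize(activations, scale))        # [0.5, -1.0, 3.0, 63.5]
```

The last value shows the cost of a static scale: anything beyond `127 * scale` saturates, so the scale has to be chosen to cover the expected activation range.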

Commits on Apr 25, 2024

  1. Compression config perf fix (#207)

    Description:
     Remove logging that triggers a device-to-host copy.
    
    ---------
    
    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    varun-sundar-rabindranath and Varun Sundar Rabindranath committed Apr 25, 2024
    e5f391f
  2. ddb10d8
  3. 0c5f2a0

Commits on Apr 29, 2024

  1. PR comments

    dsikka committed Apr 29, 2024
    540c159
  2. more comments

    dsikka committed Apr 29, 2024
    677f02c
  3. Compression config - cleanup (#215)

    Description:
    - Rename `csrc/quantization/smoothquant/fused_kernels.cu` ->
      `csrc/quantization/compressed_tensors/int8_quant_kernels.cu`
    - Remove `csrc/attention/dtype_int8.cuh`
    - Remove unused quant_per_token kernel. Rename `ops.quant` to
      `ops.quant_per_tensor`
    - Remove unused `quant_utils.cuh`
    - Remove unused `blockReduceMax` code from `reduction_utils.cuh`
    
    ---------
    
    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    varun-sundar-rabindranath and Varun Sundar Rabindranath committed Apr 29, 2024
    bd99627
  4. cleanup

    dsikka committed Apr 29, 2024
    cf61e07

Commits on Apr 30, 2024

  1. 96fea65
  2. cleanup

    dsikka committed Apr 30, 2024
    093e688
  3. 681fb3b