
[feat]: Support weight only gemm with 2bit #1568

Open · gavinchen430 wants to merge 1 commit into main from gavinchen430:gemm_w2a16

Conversation

gavinchen430

Support weight-only GEMM with 2-bit weights.

Note: This PR depends on two pull requests in the CUTLASS repo:
NVIDIA/cutlass#1512
NVIDIA/cutlass#1517
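For readers unfamiliar with the layout this PR targets: a weight-only 2-bit GEMM (w2a16) keeps activations in fp16 and stores each weight as a 2-bit code, four codes per byte. The pure-Python sketch below illustrates that packing scheme; the helper names `pack_w2`/`unpack_w2` are hypothetical and not the kernel's actual API.

```python
def pack_w2(codes):
    """Pack 2-bit codes (ints in 0..3) into bytes, four codes per byte,
    lowest code in the least-significant bits."""
    assert len(codes) % 4 == 0
    out = bytearray()
    for i in range(0, len(codes), 4):
        out.append(codes[i]
                   | (codes[i + 1] << 2)
                   | (codes[i + 2] << 4)
                   | (codes[i + 3] << 6))
    return bytes(out)

def unpack_w2(packed):
    """Inverse of pack_w2: recover the four 2-bit codes from each byte."""
    return [(b >> s) & 0x3 for b in packed for s in (0, 2, 4, 6)]
```

In a real GPU kernel the equivalent unpacking is typically done with shifts and masks in registers inside the GEMM main loop, so the fp16 weights never materialize in global memory.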

@Hongbosherlock

Hongbosherlock commented May 14, 2024

Hi @gavinchen430, Nice work! Thanks for your contribution.
How can I reproduce this PR? Just download the branch gavinchen430:gemm_w2a16 then build and run it locally? If you could provide guidance on running with TensorRT-LLM and performance data of some models like llama2, it would be greatly helpful. I think this would also assist maintainers in reviewing this PR.

@gavinchen430
Author

gavinchen430 commented May 15, 2024

> Hi @gavinchen430, nice work! Thanks for your contribution. How can I reproduce this PR? Is it enough to check out the branch gavinchen430:gemm_w2a16, then build and run it locally? If you could provide guidance on running it with TensorRT-LLM, along with performance data for models such as Llama 2, that would be very helpful. I think it would also help the maintainers review this PR.

We are currently writing examples that show how to produce quantized models with the quantization toolkit and how to deploy 2-bit quantized models with this w2a16 kernel. We will open-source these examples to this repository (https://github.com/bytedance/decoupleQ) soon.
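As a rough illustration of the numerics such examples would exercise (the function names and the default group size here are assumptions for illustration, not the toolkit's actual API): a w2a16 path stores per-group scale and zero-point metadata alongside the 2-bit codes, and dequantizes the codes back to floating point on the fly before the multiply.

```python
def quantize_2bit(w, group_size=4):
    """Asymmetric per-group 2-bit quantization:
    w[i] ≈ zero[g] + scale[g] * code[i], with code in {0, 1, 2, 3}."""
    codes, scales, zeros = [], [], []
    for g in range(0, len(w), group_size):
        grp = w[g:g + group_size]
        lo, hi = min(grp), max(grp)
        scale = (hi - lo) / 3 or 1.0  # avoid divide-by-zero for constant groups
        codes += [min(3, max(0, round((v - lo) / scale))) for v in grp]
        scales.append(scale)
        zeros.append(lo)
    return codes, scales, zeros

def dequantize_2bit(codes, scales, zeros, group_size=4):
    """Reconstruct approximate weights from codes plus per-group scale/zero."""
    return [zeros[i // group_size] + scales[i // group_size] * c
            for i, c in enumerate(codes)]
```

A weight-only GEMM then multiplies fp16 activations against these dequantized values; only the 2-bit codes and the small per-group metadata are held in memory.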

@byshiue
Collaborator

byshiue commented May 16, 2024

Hi, @gavinchen430. Thank you for the contribution. Could you provide an example in TensorRT-LLM as well? It would help us understand how to use the feature.

@byshiue byshiue self-assigned this May 16, 2024
@byshiue byshiue added the triaged Issue has been triaged by maintainers label May 16, 2024
@Fridayfairy

Fantastic, I'll try it later.
