
[feat]: Support weight only gemm with 2bit #1568

Open · gavinchen430 wants to merge 1 commit into main from gavinchen430:gemm_w2a16

Conversation

gavinchen430

Support weight-only GEMM with 2-bit weights.

Note: This PR depends on two pull requests in the CUTLASS repo:
NVIDIA/cutlass#1512
NVIDIA/cutlass#1517
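For readers unfamiliar with the layout this PR targets: a weight-only 2-bit GEMM (w2a16) keeps activations in fp16 and stores each weight as a 2-bit code, four codes per byte. The pure-Python sketch below illustrates that packing scheme; the helper names `pack_w2`/`unpack_w2` are hypothetical and not the kernel's actual API.

```python
def pack_w2(codes):
    """Pack 2-bit codes (ints in 0..3) into bytes, four codes per byte,
    lowest code in the least-significant bits."""
    assert len(codes) % 4 == 0
    out = bytearray()
    for i in range(0, len(codes), 4):
        out.append(codes[i]
                   | (codes[i + 1] << 2)
                   | (codes[i + 2] << 4)
                   | (codes[i + 3] << 6))
    return bytes(out)

def unpack_w2(packed):
    """Inverse of pack_w2: recover the four 2-bit codes from each byte."""
    return [(b >> s) & 0x3 for b in packed for s in (0, 2, 4, 6)]
```

In a real GPU kernel the equivalent unpacking is typically done with shifts and masks in registers inside the GEMM main loop, so the fp16 weights never materialize in global memory.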

@Hongbosherlock

Hongbosherlock commented May 14, 2024

Hi @gavinchen430, Nice work! Thanks for your contribution.
How can I reproduce this PR? Just download the branch gavinchen430:gemm_w2a16 then build and run it locally? If you could provide guidance on running with TensorRT-LLM and performance data of some models like llama2, it would be greatly helpful. I think this would also assist maintainers in reviewing this PR.

@gavinchen430
Author

gavinchen430 commented May 15, 2024

> Hi @gavinchen430, nice work! Thanks for your contribution. How can I reproduce this PR? Is it enough to check out the branch gavinchen430:gemm_w2a16, then build and run it locally? If you could provide guidance on running it with TensorRT-LLM, along with performance data for models such as Llama 2, that would be very helpful. I think it would also help the maintainers review this PR.

We are currently writing examples that show how to produce quantized models with the quantization toolkit and how to deploy 2-bit quantized models with this w2a16 kernel. We will open-source these examples to this repository (https://github.com/bytedance/decoupleQ) soon.
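As a rough illustration of the numerics such examples would exercise (the function names and the default group size here are assumptions for illustration, not the toolkit's actual API): a w2a16 path stores per-group scale and zero-point metadata alongside the 2-bit codes, and dequantizes the codes back to floating point on the fly before the multiply.

```python
def quantize_2bit(w, group_size=4):
    """Asymmetric per-group 2-bit quantization:
    w[i] ≈ zero[g] + scale[g] * code[i], with code in {0, 1, 2, 3}."""
    codes, scales, zeros = [], [], []
    for g in range(0, len(w), group_size):
        grp = w[g:g + group_size]
        lo, hi = min(grp), max(grp)
        scale = (hi - lo) / 3 or 1.0  # avoid divide-by-zero for constant groups
        codes += [min(3, max(0, round((v - lo) / scale))) for v in grp]
        scales.append(scale)
        zeros.append(lo)
    return codes, scales, zeros

def dequantize_2bit(codes, scales, zeros, group_size=4):
    """Reconstruct approximate weights from codes plus per-group scale/zero."""
    return [zeros[i // group_size] + scales[i // group_size] * c
            for i, c in enumerate(codes)]
```

A weight-only GEMM then multiplies fp16 activations against these dequantized values; only the 2-bit codes and the small per-group metadata are held in memory.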

@byshiue
Collaborator

byshiue commented May 16, 2024

Hi, @gavinchen430. Thank you for the contribution. Could you provide an example in TensorRT-LLM as well? It would help us understand how to use the feature.

@byshiue byshiue self-assigned this May 16, 2024
@byshiue byshiue added the triaged Issue has been triaged by maintainers label May 16, 2024
@Fridayfairy

Fantastic, I'll try it later.
