Add support for INT4/UINT4 #1712
Comments
This is on our radar, particularly in the context of large language models. As the references indicate, this is an active field of research, much of it focused on techniques to minimize accuracy loss. Are there any specific quantization approaches and usage models you have in mind? Anything validated in a production setting?
I can't legally say much, but there are already some open-source LLM quantizers:
I can't reply publicly, but as long as the perplexity after quantization is "close" to the bf16 perplexity, you should be good. Could you confirm that Sapphire Rapids CPUs (4th gen Xeon) don't have any hardware support (AMX, ...) for s4 or u4 math?
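The acceptance criterion mentioned above can be made concrete: perplexity is just the exponential of the mean per-token negative log-likelihood, so comparing a quantized model against the bf16 baseline reduces to comparing two scalars. A minimal sketch, with made-up NLL values and an illustrative tolerance (neither comes from this thread):

```python
import numpy as np

def perplexity(token_nlls: np.ndarray) -> float:
    """Per-token negative log-likelihoods -> corpus perplexity."""
    return float(np.exp(token_nlls.mean()))

# Made-up per-token NLLs for illustration only.
nll_bf16 = np.array([2.10, 1.95, 2.30, 2.05])   # baseline (bf16) model
nll_int4 = np.array([2.12, 1.99, 2.33, 2.08])   # int4-quantized model

ppl_bf16 = perplexity(nll_bf16)
ppl_int4 = perplexity(nll_int4)

# "Close" might mean e.g. under a few percent relative degradation;
# the exact threshold is application-specific.
rel_degradation = ppl_int4 / ppl_bf16 - 1.0
```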
@WilliamTambellini, you don't necessarily need s4/u4 math to take advantage of low precision. I believe most viable use cases focus on using s4/u4 as a storage format for weights, with the math done in int8 or fp16. So for oneDNN the question effectively boils down to what quantization scheme for these data types would be viable.
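To illustrate the "storage format only" idea: 4-bit values can be packed two per byte on disk or in memory, then unpacked and dequantized to a wider type before any math runs. The packing layout, the symmetric per-tensor scale, and the zero point of 8 below are illustrative assumptions, not oneDNN's scheme:

```python
import numpy as np

def pack_u4(values: np.ndarray) -> np.ndarray:
    """Pack an even-length array of 4-bit values (0..15) two per byte."""
    assert values.size % 2 == 0
    v = values.astype(np.uint8)
    return v[0::2] | (v[1::2] << 4)

def unpack_u4(packed: np.ndarray) -> np.ndarray:
    """Recover the 4-bit values from their packed byte representation."""
    lo = packed & 0x0F
    hi = packed >> 4
    return np.stack([lo, hi], axis=1).reshape(-1)

# Quantize fp32 weights to u4 with an (assumed) zero point of 8.
w = np.array([-0.8, -0.1, 0.0, 0.3, 0.7, 1.0], dtype=np.float32)
scale = np.abs(w).max() / 7.0        # map [-max, max] roughly onto [-7, 7]
q = np.clip(np.round(w / scale) + 8, 0, 15).astype(np.uint8)

packed = pack_u4(q)                  # 2x storage reduction vs int8
# Math would happen after widening, e.g. to fp32 here:
dequant = (unpack_u4(packed).astype(np.float32) - 8) * scale
```

Round-trip error is bounded by half a quantization step (`scale / 2`), which is what per-channel or group-wise scales then shrink further in practice.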
FYI
Hi @WilliamTambellini , |
API, validation, and GPU optimizations for int4 landed in the main branch, targeting oneDNN v3.5.
Thanks, @vpirogov
Summary
Add support for INT4 and/or UINT4
Refs:
https://intellabs.github.io/distiller/quantization.html
https://developer.nvidia.com/blog/int4-for-ai-inference/
https://arxiv.org/abs/2301.12017
https://arxiv.org/pdf/2306.11987.pdf
https://www.xilinx.com/support/documents/white_papers/wp521-4bit-optimization.pdf
Problem statement
Fast low-precision (4-bit) quantized matmul.
Preferred solution
A new oneDNN data type and, at minimum, a quantized matmul (no need for full arithmetic/math support).
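A NumPy sketch of what such a quantized matmul computes, under assumptions of my own: s4 weights with one fp32 scale per output channel, activations left in fp32 for brevity (real kernels would quantize them too, e.g. to int8, so the inner product runs in integer math). None of the names here are a oneDNN API proposal:

```python
import numpy as np

def quantize_s4_per_channel(w: np.ndarray):
    """Symmetric per-output-channel quantization to the s4 range [-8, 7]."""
    scales = np.abs(w).max(axis=0) / 7.0                 # one scale per column
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float32)

def quant_matmul(x: np.ndarray, q: np.ndarray, scales: np.ndarray):
    """Matmul with s4-stored weights; result rescaled per output channel."""
    return (x @ q.astype(np.float32)) * scales

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)      # activations
w = rng.standard_normal((16, 8)).astype(np.float32)      # fp32 reference weights

q, scales = quantize_s4_per_channel(w)
y_ref = x @ w                                            # full-precision result
y_q = quant_matmul(x, q, scales)                         # s4-weight result
```

Per-channel scales keep the quantization error proportional to each output channel's weight magnitude, which is the main reason 4-bit weight-only schemes stay usable for LLM inference.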