
Add support for INT4/UINT4 #1712

Open
WilliamTambellini opened this issue Aug 29, 2023 · 7 comments

@WilliamTambellini (Contributor)

Summary

Add support for INT4 and/or UINT4
Refs:
https://intellabs.github.io/distiller/quantization.html
https://developer.nvidia.com/blog/int4-for-ai-inference/
https://arxiv.org/abs/2301.12017
https://arxiv.org/pdf/2306.11987.pdf
https://www.xilinx.com/support/documents/white_papers/wp521-4bit-optimization.pdf

Problem statement

Fast, low-precision 4-bit quantized matmul.

Preferred solution

A new oneDNN data type and, at minimum, a quantized matmul primitive (no need for full arithmetic/math coverage).

@WilliamTambellini added the enhancement (A feature or an optimization request) label on Aug 29, 2023
@vpirogov (Member) commented on Sep 1, 2023

This is on our radar, in particular in context of large language models. As the references indicate this area is an active field of research, in particular focusing on techniques to minimize accuracy loss. Are there any specific quantization approaches and usage models you have in mind? Anything validated in production setting?

@vpirogov self-assigned this on Sep 1, 2023
@WilliamTambellini (Contributor, Author) commented on Sep 7, 2023

Are there any specific quantization approaches and usage models you have in mind?

I cannot legally say much, but there are already some open-source LLM quantizers, e.g.:
https://github.com/PanQiWei/AutoGPTQ
though they may require calibration samples to quantize on.
Be aware of the distinction between static and dynamic quantization.
Most Transformer decoders would do the job for testing, e.g.:
https://huggingface.co/Qwen/Qwen-7B-Chat-Int4#%E9%87%8F%E5%8C%96-quantization
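For illustration, here is a minimal sketch of what static, weight-only int4 quantization produces (4-bit integers plus one fp32 scale per group). This is plain round-to-nearest and deliberately not AutoGPTQ's actual algorithm, which additionally uses calibration samples and error compensation:

```cpp
// Minimal sketch: per-group symmetric round-to-nearest int4 weight quantization.
// Shows only the data layout a GPTQ-style quantizer emits (s4 values + scales),
// not the error-compensating algorithm itself.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

void quantize_int4(const std::vector<float> &w, size_t group_size,
                   std::vector<int8_t> &q, std::vector<float> &scales) {
    q.resize(w.size());
    scales.assign((w.size() + group_size - 1) / group_size, 1.f);
    for (size_t g = 0; g * group_size < w.size(); ++g) {
        const size_t begin = g * group_size;
        const size_t end = std::min(begin + group_size, w.size());
        float amax = 0.f;
        for (size_t i = begin; i < end; ++i)
            amax = std::max(amax, std::fabs(w[i]));
        const float scale = amax > 0.f ? amax / 7.f : 1.f; // map |max| to +7
        scales[g] = scale;
        for (size_t i = begin; i < end; ++i) {
            const long v = std::lround(w[i] / scale);
            q[i] = static_cast<int8_t>(std::clamp<long>(v, -8, 7)); // s4 range
        }
    }
}
```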

Anything validated in production setting?

I cannot reply publicly, but as long as the perplexity after quantization stays "close" to the bf16 baseline, you should be good.
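For reference, perplexity here is just the exponential of the average per-token negative log-likelihood; a minimal sketch of the comparison metric (token log-probabilities assumed to come from whatever evaluation harness is used):

```cpp
// Perplexity = exp(-(1/N) * sum_i log p(token_i)). Compare the value obtained
// with int4-quantized weights against the bf16 baseline on the same text.
#include <cmath>
#include <vector>

double perplexity(const std::vector<double> &token_log_probs) {
    double nll = 0.0;
    for (double lp : token_log_probs) nll -= lp; // accumulate negative log-likelihood
    return std::exp(nll / static_cast<double>(token_log_probs.size()));
}
```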

Could you confirm that Sapphire Rapids CPUs (4th gen Xeon) do not appear to have any hardware support (AMX, ...) for s4 or u4 math?

@vpirogov (Member)

@WilliamTambellini, you don't necessarily need s4/u4 math to take advantage of low precision. I believe the most viable use cases focus on using s4/u4 as a storage format for weights, with the math done in int8 or fp16. So for oneDNN the question effectively boils down to which quantization scheme for these data types would be viable.
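To make the storage-format idea concrete, a minimal sketch of packing two signed 4-bit values per byte and unpacking them back to int8 before the int8/fp16 math. This is just the general technique, not oneDNN's internal weight layout:

```cpp
// Two signed 4-bit values share one byte; they are widened to int8 (or fp16)
// just before the matmul, so no native 4-bit arithmetic is required.
#include <cstdint>

inline uint8_t pack_s4(int8_t lo, int8_t hi) { // both in [-8, 7]
    return static_cast<uint8_t>((lo & 0x0F) | ((hi & 0x0F) << 4));
}

inline void unpack_s4(uint8_t packed, int8_t &lo, int8_t &hi) {
    // Shift each nibble into the top of a byte, then arithmetic-shift back
    // to sign-extend it from 4 to 8 bits.
    lo = static_cast<int8_t>((packed & 0x0F) << 4) >> 4;
    hi = static_cast<int8_t>(packed & 0xF0) >> 4;
}
```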

@WilliamTambellini (Contributor, Author)

FYI: onnx/onnx#5811

@igorsafo (Contributor)

Hi @WilliamTambellini,
Yes, we are aware of int4 support in OpenVINO. The following RFCs target GPT-Q support in oneDNN:

@vpirogov assigned igorsafo and unassigned vpirogov on Dec 19, 2023
@vpirogov (Member)

API, validation, and GPU optimizations for int4 have landed in the main branch, targeting oneDNN v3.5.
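For anyone landing here, a hedged usage sketch of what the int4 path can look like with the oneDNN C++ API: f16 activations times s4 weights with per-output-channel scales, weights decompressed to f16 for the math. The exact names assumed here (data_type::s4, set_scales_mask, the apply_to_int flag of set_fpmath_mode) should be checked against the v3.5 matmul and quantization dev guides:

```cpp
// Hedged sketch, not verified against a specific release: f16 x s4 matmul with
// weight decompression. Grouped/per-block weight scales have their own
// attribute form; see the oneDNN quantization documentation.
#include <oneapi/dnnl/dnnl.hpp>
using namespace dnnl;

matmul::primitive_desc make_int4_weight_matmul(const engine &eng,
        memory::dim M, memory::dim K, memory::dim N) {
    memory::desc src_md({M, K}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::s4, memory::format_tag::any);
    memory::desc dst_md({M, N}, memory::data_type::f16, memory::format_tag::ab);

    primitive_attr attr;
    // Per-output-channel (N) dequantization scales for the s4 weights.
    attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1 << 1);
    // Ask the library to up-convert the integer weights and run the math in f16.
    attr.set_fpmath_mode(fpmath_mode::f16, /*apply_to_int=*/true);

    return matmul::primitive_desc(eng, src_md, wei_md, dst_md, attr);
}
```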

@WilliamTambellini (Contributor, Author)

Thanks, @vpirogov.
