
How can I create a matmul primitive with A16W8 (active 16bits, weight 8bits) configuration? #1895

Open
Teaonly opened this issue May 5, 2024 · 2 comments


Teaonly commented May 5, 2024

The configuration for creating a primitive_desc of matrix multiplication:

```cpp
memory::desc a_md({M, K}, memory::data_type::f16, {K, 1}); // M x K layout
memory::desc b_md({K, N}, memory::data_type::s8, {N, 1});  // K x N layout
memory::desc c_md({M, N}, memory::data_type::f16, {N, 1}); // M x N layout
primitive_attr attr;
attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1); // per-channel quantized int8 weights
// Create a MatMul primitive descriptor
auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md, attr);
```

This code throws an unimplemented exception:
"Message: could not create a primitive descriptor for a matmul primitive"

How can I create a matmul with A16W8?

Teaonly added the `enhancement` label (A feature or an optimization request) on May 5, 2024

Teaonly commented May 5, 2024

```
$ ./examples/tutorials-matmul-inference-int8-matmul-cpp gpu
onednn_verbose,info,oneDNN v3.6.0 (commit 95c00ed)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:22
onednn_verbose,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,info,gpu,runtime:OpenCL
onednn_verbose,info,gpu,engine,0,name:Intel(R) Arc(TM) Graphics,driver_version:24.9.28717,binary_kernels:enabled
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,jit:xe_hp:gemm:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,*x96:96x1000,skipping or dispatching to another implementation,src/gpu/intel/jit/gemm/xe_hp_systolic_gemm.cpp:75
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,ocl:gemm_with_po:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,*x96:96x1000,runtime dimension is not supported,src/gpu/intel/ocl/gemm/gemm_with_post_ops.cpp:42
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,jit:gemm:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,*x96:96x1000,unsupported datatype,src/gpu/intel/jit/gemm/gen_gemm.hpp:124
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,ocl:ref:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,*x96:96x1000,unsupported attribute,src/gpu/intel/ocl/gemm/ref_gemm.hpp:81
onednn_verbose,primitive,create:dispatch,matmul,failed to create nested primitive gemm,src/gpu/intel/ocl/gemm_matmul.hpp:266
onednn_verbose,primitive,create:dispatch,matmul,gpu,matmul,ocl:ref:any,undef,src_f16::blocked:ab::f0 wei_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,runtime_dims_masks:1:0,*x96:96x1000,unsupported datatype combination,src/gpu/intel/ocl/ref_matmul.hpp:70
oneDNN error caught:
Status: unimplemented
Message: could not create a primitive descriptor for a matmul primitive
Example failed on GPU.
```


igorsafo commented May 5, 2024

Hi @Teaonly, here is an example: https://github.com/oneapi-src/oneDNN/blob/main/examples/tutorials/matmul/weights_decompression_matmul.cpp (or https://oneapi-src.github.io/oneDNN/page_weights_decompression_matmul_cpp.html#doxid-weights-decompression-matmul-cpp)
The fpmath_mode attribute should be set so that the int8 weights are allowed to participate in floating-point computations (weights decompression).
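Applied to the snippet from the question, the fix might look like the following sketch. This assumes the oneDNN v3.x C++ API, where `primitive_attr::set_fpmath_mode` takes a second `apply_to_int` argument that extends the floating-point math mode to integer inputs (weights decompression); the scales mask of `1 << 1` (per-N-channel scales for a K x N weights matrix) is an illustrative choice matching the `attr-scales:wei:2` seen in the verbose log, and should be adjusted to your quantization scheme:

```cpp
// Sketch (untested): A16W8 matmul via weights decompression.
// Assumes <dnnl.hpp> is included, an engine `eng` exists, and M, K, N are set.
using namespace dnnl;

memory::desc a_md({M, K}, memory::data_type::f16, {K, 1}); // f16 activations
memory::desc b_md({K, N}, memory::data_type::s8, {N, 1});  // s8 weights
memory::desc c_md({M, N}, memory::data_type::f16, {N, 1}); // f16 output

primitive_attr attr;
attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1 << 1); // scale per N (output channel)
// Let the s8 weights be up-converted and computed in f16:
// the second argument applies the fpmath mode to integer inputs.
attr.set_fpmath_mode(fpmath_mode::f16, /*apply_to_int=*/true);

auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md, attr);
```

Without the `set_fpmath_mode` call, an f16 x s8 data type combination is not a valid plain int8 matmul configuration, which is why every implementation in the verbose log rejects it.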

For more information please review a discussion on the same topic: #1893
