
How can I create a matmul primitive with A16W8 (active 16bits, weight 8bits) configuration? #1895

Open
Teaonly opened this issue May 5, 2024 · 2 comments


Teaonly commented May 5, 2024

The configuration for creating a primitive_desc of matrix multiplication:

```cpp
memory::desc a_md({M, K}, memory::data_type::f16, {K, 1}); // M x K layout
memory::desc b_md({K, N}, memory::data_type::s8, {N, 1});  // K x N layout
memory::desc c_md({M, N}, memory::data_type::f16, {N, 1}); // M x N layout
primitive_attr attr;
attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1); // per-channel quantized int8 weights
// Create a MatMul primitive descriptor
auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md, attr);
```

This code throws an unimplemented exception:
"Message: could not create a primitive descriptor for a matmul primitive"

How can I create a matmul with A16W8?

Teaonly added the `enhancement` label (A feature or an optimization request) on May 5, 2024

Teaonly commented May 5, 2024

```
$ ./examples/tutorials-matmul-inference-int8-matmul-cpp gpu
onednn_verbose,info,oneDNN v3.6.0 (commit 95c00ed)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:22
onednn_verbose,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,info,gpu,runtime:OpenCL
onednn_verbose,info,gpu,engine,0,name:Intel(R) Arc(TM) Graphics,driver_version:24.9.28717,binary_kernels:enabled
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,jit:xe_hp:gemm:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,*x96:96x1000,skipping or dispatching to another implementation,src/gpu/intel/jit/gemm/xe_hp_systolic_gemm.cpp:75
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,ocl:gemm_with_po:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,*x96:96x1000,runtime dimension is not supported,src/gpu/intel/ocl/gemm/gemm_with_post_ops.cpp:42
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,jit:gemm:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,*x96:96x1000,unsupported datatype,src/gpu/intel/jit/gemm/gen_gemm.hpp:124
onednn_verbose,primitive,create:dispatch,gemm,gpu,gemm,ocl:ref:any,undef,src_a_f16::blocked:ab::f0 src_b_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,,*x96:96x1000,unsupported attribute,src/gpu/intel/ocl/gemm/ref_gemm.hpp:81
onednn_verbose,primitive,create:dispatch,matmul,failed to create nested primitive gemm,src/gpu/intel/ocl/gemm_matmul.hpp:266
onednn_verbose,primitive,create:dispatch,matmul,gpu,matmul,ocl:ref:any,undef,src_f16::blocked:ab::f0 wei_s8::blocked:ab::f0 dst_f16::blocked:ab::f0,attr-scales:wei:2:f32 attr-post-ops:eltwise_relu,runtime_dims_masks:1:0,*x96:96x1000,unsupported datatype combination,src/gpu/intel/ocl/ref_matmul.hpp:70
oneDNN error caught:
Status: unimplemented
Message: could not create a primitive descriptor for a matmul primitive
Example failed on GPU.
```


igorsafo commented May 5, 2024

Hi @Teaonly, here is an example: https://github.com/oneapi-src/oneDNN/blob/main/examples/tutorials/matmul/weights_decompression_matmul.cpp (or https://oneapi-src.github.io/oneDNN/page_weights_decompression_matmul_cpp.html#doxid-weights-decompression-matmul-cpp)
The fpmath_mode attribute should be set so that the int8 weights are allowed to participate in floating-point computations (weights decompression).
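Applied to the snippet from the question, the fix might look like the following sketch. This assumes the oneDNN v3.x C++ API, where `primitive_attr::set_fpmath_mode` takes a second `apply_to_int` argument that extends the floating-point math mode to integer inputs (weights decompression); the scales mask of `1 << 1` (per-N-channel scales for a K x N weights matrix) is an illustrative choice matching the `attr-scales:wei:2` seen in the verbose log, and should be adjusted to your quantization scheme:

```cpp
// Sketch (untested): A16W8 matmul via weights decompression.
// Assumes <dnnl.hpp> is included, an engine `eng` exists, and M, K, N are set.
using namespace dnnl;

memory::desc a_md({M, K}, memory::data_type::f16, {K, 1}); // f16 activations
memory::desc b_md({K, N}, memory::data_type::s8, {N, 1});  // s8 weights
memory::desc c_md({M, N}, memory::data_type::f16, {N, 1}); // f16 output

primitive_attr attr;
attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1 << 1); // scale per N (output channel)
// Let the s8 weights be up-converted and computed in f16:
// the second argument applies the fpmath mode to integer inputs.
attr.set_fpmath_mode(fpmath_mode::f16, /*apply_to_int=*/true);

auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md, attr);
```

Without the `set_fpmath_mode` call, an f16 x s8 data type combination is not a valid plain int8 matmul configuration, which is why every implementation in the verbose log rejects it.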

For more information please review a discussion on the same topic: #1893
