INT8 Performance difference between OneDNN v2.6.3 and v3.4.1 #1913

SriAlavandar · 2024-05-13T10:04:21Z

Hi, I am running benchdnn from oneDNN v2.6.3 and from oneDNN v3.4.1 to see how performance has improved for specific M, N, K dimensions with the AVX512_VNNI (int8) kernels.

The dimensions come from LLM variants used for token generation. The table lists the M, N, K dimensions for one LLM variant at input length 1024, followed by the execution times with v3.4.1 and v2.6.3.

As the table shows, performance is on par between the two versions. Why are we not seeing an improvement? Are there specific tweaks needed to get the benefit of the newer kernels?

Sample BenchDNN command I am using for this activity:
v2.6.3 --> ./benchdnn --matmul --mode=p --cfg=u8s8s8 --stag=ab --wtag=any --dtag=ab --fix-times-per-prb=200 --perf-template=%prb%,%-time%,%+time%,%0time%,%-Gflops%,%+Gflops%,%0Gflops%,%bw% 4012x4096:4096x16384
v3.4.1 --> ./benchdnn --matmul --mode=p --dt=u8:s8:s8 --stag=ab --wtag=any --dtag=ab --fix-times-per-prb=200 --perf-template=%prb%,%-time%,%+time%,%0time%,%-Gflops%,%+Gflops%,%0Gflops%,%bw% 4012x4096:4096x16384

Kernel triggered: brg:avx512_core_vnni for both versions.

I also noticed a difference in the weights tag between the versions: v2.6.3 reports wei_s8::blocked:BA16a64b4a::f0, while v3.4.1 reports wei_s8:a:blocked:BA16a64b4a::f0. What is the difference between these two tags?
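To sanity-check the Gflops columns that benchdnn prints, the problem descriptor from the commands above can be converted to an operation count by hand. The following is a minimal sketch, not benchdnn's own code: it assumes the descriptor has the form MxK:KxN, that one multiply-accumulate counts as two operations, and that the reported time is in milliseconds. The function names are hypothetical.

```python
from __future__ import annotations


def parse_problem(desc: str) -> tuple[int, int, int]:
    """Parse a matmul problem written as 'MxK:KxN' into (M, K, N)."""
    src, wei = desc.split(":")
    m, k_src = (int(d) for d in src.split("x"))
    k_wei, n = (int(d) for d in wei.split("x"))
    if k_src != k_wei:
        raise ValueError(f"inner dimensions differ: {k_src} vs {k_wei}")
    return m, k_src, n


def matmul_ops(m: int, k: int, n: int) -> int:
    """Operation count, assuming one multiply-accumulate is two operations."""
    return 2 * m * k * n


def gops_per_second(desc: str, time_ms: float) -> float:
    """Throughput in Gops/s for a measured time given in milliseconds."""
    m, k, n = parse_problem(desc)
    return matmul_ops(m, k, n) / (time_ms * 1e-3) / 1e9
```

Calling gops_per_second("4012x4096:4096x16384", t) with the time t reported by each version gives numbers that can be compared directly, independent of how each benchdnn release formats its output.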


asirvaiy · 2024-05-14T13:14:38Z

Hi,
Thanks for posting.

I tried to replicate your numbers with the oneDNN versions you mentioned.
The results are the following:
M,N,K = 4012,16384,4096; ratio v3.4.1 over v2.6.3 is 1.16x
M,N,K = 4012,4096,4096; ratio v3.4.1 over v2.6.3 is 1.14x
M,N,K = 4012,130528,4096; ratio v3.4.1 over v2.6.3 is 1.1x
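One way to summarize the three ratios above in a single number is a geometric mean, which is the usual choice for averaging speedups. This is only an illustrative sketch: the ratios are copied from the list above, and the variable names are hypothetical.

```python
from statistics import geometric_mean

# Ratios of v3.4.1 over v2.6.3 as reported above, keyed by (M, N, K).
reported_ratios = {
    (4012, 16384, 4096): 1.16,
    (4012, 4096, 4096): 1.14,
    (4012, 130528, 4096): 1.10,
}

# Geometric mean treats each shape equally regardless of its absolute runtime.
overall_ratio = geometric_mean(reported_ratios.values())
print(f"geometric mean ratio: {overall_ratio:.3f}x")
```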

For v3.4.1, I am using the latest oneAPI compiler, 2024.1, and the latest tbb, 2021.12.
For v2.6.3, I am using oneAPI compiler 2021.3 and tbb 2021.3.

Since you have not shared your system details (number of cores, etc.), we can't compare our numbers directly. However, when comparing the two oneDNN versions on my machine, v3.4.1 is faster than v2.6.3. I am running on a bare-metal 3rd Gen Intel Xeon processor.
Please share your compiler and tbb versions and your machine details.

Regarding the weights tag: in oneDNN v3.4.1, the extra "a" in wei_s8:a:blocked:BA16a64b4a::f0 indicates that the memory descriptor was created with format kind any.
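To make the difference between the two tags easier to see, the strings can be split on ":" and compared field by field. The sketch below is an assumption-laden illustration rather than oneDNN's own parser: the thread only confirms the meaning of the "a" in the second field, and the other field names (arg_and_dtype, format_kind, format_tag, strides, flags) are guesses used for readability.

```python
from typing import NamedTuple


class MemoryDescriptor(NamedTuple):
    """Fields of a verbose memory descriptor string (names are assumed)."""
    arg_and_dtype: str
    properties: str   # "a" here means the descriptor was created with format kind any
    format_kind: str
    format_tag: str
    strides: str
    flags: str


def parse_md(text: str) -> MemoryDescriptor:
    """Split a colon-separated descriptor into its six fields."""
    fields = text.split(":")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {text!r}")
    return MemoryDescriptor(*fields)


old = parse_md("wei_s8::blocked:BA16a64b4a::f0")    # as printed by v2.6.3
new = parse_md("wei_s8:a:blocked:BA16a64b4a::f0")   # as printed by v3.4.1

# Collect only the fields whose values differ between the two versions.
differences = {
    name: (getattr(old, name), getattr(new, name))
    for name in MemoryDescriptor._fields
    if getattr(old, name) != getattr(new, name)
}
print(differences)
```

Under these assumptions the two strings differ only in the second field, so both versions select the same blocked weights layout (BA16a64b4a) and the "a" only records how the descriptor was requested.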

SriAlavandar added the question label May 13, 2024

vpirogov assigned onednnsupporttriage May 13, 2024

asirvaiy self-assigned this May 13, 2024

shu1chen unassigned onednnsupporttriage May 13, 2024
