INT8 Performance difference between OneDNN v2.6.3 and v3.4.1 #1913

SriAlavandar opened this issue May 13, 2024 · 1 comment

Hi, I am using benchdnn from oneDNN v2.6.3 and from oneDNN v3.4.1 to see how performance has changed between the versions for specific M, N, K dimensions with the AVX512_VNNI (INT8) kernels.

I am using the dimensions from the LLM variants used for token generation. The table lists the m, n, k dimensions for one LLM variant with an input length of 1024, followed by the execution time with v3.4.1 and v2.6.3.

[Image: table of m, n, k dimensions (input length 1024) with execution times for v3.4.1 and v2.6.3]

As the table shows, performance is on par between the two versions. Why are we not seeing an improvement? Are there specific settings or tweaks needed to benefit from the kernel enhancements?

Sample benchdnn commands I am using:
v2.6.3 --> ./benchdnn --matmul --mode=p --cfg=u8s8s8 --stag=ab --wtag=any --dtag=ab --fix-times-per-prb=200 --perf-template=%prb%,%-time%,%+time%,%0time%,%-Gflops%,%+Gflops%,%0Gflops%,%bw% 4012x4096:4096x16384
v3.4.1 --> ./benchdnn --matmul --mode=p --dt=u8:s8:s8 --stag=ab --wtag=any --dtag=ab --fix-times-per-prb=200 --perf-template=%prb%,%-time%,%+time%,%0time%,%-Gflops%,%+Gflops%,%0Gflops%,%bw% 4012x4096:4096x16384
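The %Gflops% columns requested above can be cross-checked by hand. This is a minimal sketch, assuming the usual 2*M*N*K operation count for a matmul (one multiply and one add per term; for INT8 these are integer operations, not floating-point ones) and parsing the MxK:KxN problem string used in the commands. The helper names are hypothetical and not part of benchdnn.

```python
def parse_matmul_dims(prb: str) -> tuple[int, int, int]:
    """Split a 2D benchdnn matmul problem 'MxK:KxN' into (M, N, K)."""
    src, wei = prb.split(":")
    m, k_src = (int(x) for x in src.split("x"))
    k_wei, n = (int(x) for x in wei.split("x"))
    if k_src != k_wei:
        raise ValueError(f"reduction dims differ: {k_src} vs {k_wei}")
    return m, n, k_src


def matmul_ops(m: int, n: int, k: int) -> int:
    """Assumed operation count: one multiply plus one add per (m, n, k) term."""
    return 2 * m * n * k


def giga_ops_per_second(ops: int, time_ms: float) -> float:
    """Convert an operation count and a time in milliseconds into G(op)/s."""
    return ops / (time_ms * 1e-3) / 1e9


m, n, k = parse_matmul_dims("4012x4096:4096x16384")
ops = matmul_ops(m, n, k)
print(m, n, k, ops)  # 4012 16384 4096 538481524736
```

Dividing `ops` by a measured time with `giga_ops_per_second` should land close to what benchdnn prints, which makes it easier to compare runs from different machines on a per-core basis.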

Kernel dispatched: brg:avx512_core_vnni for both versions.

I have also observed a difference in the weight tag between the versions: v2.6.3 reports wei_s8::blocked:BA16a64b4a::f0, while v3.4.1 reports wei_s8:a:blocked:BA16a64b4a::f0. What is the difference between these two tags?
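The two weight descriptors can be compared field by field by splitting them on ':'. In this sketch the field names (properties, format_kind, and so on) are an assumption about the verbose format, not something stated in the issue, and `diff_md` is a hypothetical helper. It shows that the strings differ only in the second field, while the blocked layout BA16a64b4a is identical.

```python
# Assumed field names for a oneDNN verbose memory descriptor (not from the issue).
FIELDS = ("arg_dt", "properties", "format_kind", "format_tag", "strides", "extra_flags")


def diff_md(md_a: str, md_b: str) -> dict:
    """Return {field: (value_a, value_b)} for fields that differ between two descriptors."""
    fa, fb = md_a.split(":"), md_b.split(":")
    if len(fa) != len(fb):
        raise ValueError("descriptors have different numbers of fields")
    # Fall back to positional names if the field count is not the assumed one.
    names = FIELDS if len(fa) == len(FIELDS) else tuple(f"field{i}" for i in range(len(fa)))
    return {name: (x, y) for name, x, y in zip(names, fa, fb) if x != y}


v2 = "wei_s8::blocked:BA16a64b4a::f0"   # reported by v2.6.3
v3 = "wei_s8:a:blocked:BA16a64b4a::f0"  # reported by v3.4.1
print(diff_md(v2, v3))  # {'properties': ('', 'a')}
```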


asirvaiy commented May 14, 2024

Hi,
Thanks for posting.

I tried to reproduce your numbers with the oneDNN versions you mentioned.
The results are:
M,N,K = 4012,16384,4096: ratio v3.4.1 over v2.6.3 is 1.16x
M,N,K = 4012,4096,4096: ratio v3.4.1 over v2.6.3 is 1.14x
M,N,K = 4012,130528,4096: ratio v3.4.1 over v2.6.3 is 1.1x
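To read these ratios, this sketch assumes that "ratio v3.4.1 over v2.6.3" means the v2.6.3 time divided by the v3.4.1 time (so values above 1 mean v3.4.1 is faster, consistent with the conclusion in the reply). It then summarizes the three reported ratios with a geometric mean, a common way to aggregate speedups. The function names are hypothetical.

```python
from statistics import geometric_mean


def speedup(time_old_ms: float, time_new_ms: float) -> float:
    """Ratio of the old run time to the new one; > 1 means the new build is faster."""
    return time_old_ms / time_new_ms


def gflops_ratio(gflops_new: float, gflops_old: float) -> float:
    """Same quantity computed from throughput instead of time (same problem size)."""
    return gflops_new / gflops_old


# Ratios for the three shapes as reported in the reply, keyed by (M, N, K).
reported = {
    (4012, 16384, 4096): 1.16,
    (4012, 4096, 4096): 1.14,
    (4012, 130528, 4096): 1.10,
}
overall = geometric_mean(reported.values())
print(f"geometric-mean ratio: {overall:.3f}")  # geometric-mean ratio: 1.133
```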

For v3.4.1, I am using the latest oneAPI compiler (2024.1) and the latest TBB (2021.12).
For v2.6.3, I am using oneAPI compiler 2021.3 and TBB 2021.3.

Since you have not shared your system details (number of cores, etc.), we can't compare the numbers directly, but comparing the two oneDNN versions on the same machine, v3.4.1 is faster than v2.6.3. I am running on a bare-metal 3rd Gen Intel Xeon processor.
Please share your compiler and TBB versions and your machine details.

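In oneDNN v3.4.1, the second field of the weights descriptor (wei_s8:a:...) carries memory-descriptor flags:

  • a -- indicates that the memory descriptor was created with format_kind any (here requested via --wtag=any).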
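Both versions end up with the same BA16a64b4a weights layout; only the flag field differs. As a rough illustration of what that tag encodes, the sketch below maps a logical weight element (k, n) of a K x N matrix to its offset. It assumes oneDNN's tag convention (upper-case letters are the outer dimensions in order, B outermost; the lower-case terms 16a, 64b, 4a are inner blocks, 4a innermost) and that a is K and b is N. `ba16a64b4a_offset` is a hypothetical helper, not a oneDNN API.

```python
def round_up(x: int, block: int) -> int:
    return (x + block - 1) // block * block


def ba16a64b4a_offset(k: int, n: int, K: int) -> int:
    """Offset (in elements) of logical weight (k, n) for a K x N matrix, a = K, b = N."""
    a_blocks = round_up(K, 64) // 64  # number of 64-row blocks along a (16 * 4 = 64)
    b_outer = n // 64                 # outer B index (outermost)
    a_outer = k // 64                 # outer A index
    a16 = (k % 64) // 4               # position inside the 16a block
    b64 = n % 64                      # position inside the 64b block
    a4 = k % 4                        # innermost 4a: 4 consecutive K values stored together
    return (((b_outer * a_blocks + a_outer) * 16 + a16) * 64 + b64) * 4 + a4


K = N = 128
for k, n in [(0, 0), (1, 0), (0, 1), (4, 0), (64, 0), (0, 64)]:
    print((k, n), ba16a64b4a_offset(k, n, K))
```

With K = N = 128 this prints offsets 0, 1, 4, 256, 4096 and 8192: four consecutive K values sit next to each other, presumably because VNNI INT8 dot-product instructions consume groups of four bytes along the reduction dimension.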
