bfloat16/float32 mixed-precision gemm example (using AMX-BF16/AVX512-BF16)? #877

croci · 2024-04-24T10:03:42Z

I have access to a Sapphire Rapids CPU and I would like to test the performance of libxsmm GEMMs with bfloat16 inputs and float32 output, so that AMX-BF16/AVX512-BF16 instructions are used. However, the documentation only includes the following example:

#include <libxsmm.h>
#include <cassert>
#include <vector>
int main(int argc, char* argv[]) {
  typedef double T;
  int batchsize = 1000, m = 13, n = 5, k = 7;
  std::vector<T> a(batchsize * m * k), b(batchsize * k * n), c(m * n, 0);
  /* C/C++ and Fortran interfaces are available */
  typedef libxsmm_mmfunction<T> kernel_type;
  /* generates and dispatches a matrix multiplication kernel (C++ functor) */
  kernel_type kernel(LIBXSMM_GEMM_FLAG_NONE, m, n, k, 1.0 /*alpha*/, 1.0 /*beta*/);
  assert(kernel);
  for (int i = 0; i < batchsize; ++i) { /* initialize input (column-major, one matrix per batch entry) */
    for (int ki = 0; ki < k; ++ki) {
      /* the "+ 1" keeps the divisor nonzero */
      for (int j = 0; j < m; ++j) a[i * m * k + ki * m + j] = static_cast<T>(1) / ((i + j + ki) % 25 + 1);
      for (int j = 0; j < n; ++j) b[i * k * n + j * k + ki] = static_cast<T>(7) / ((i + j + ki) % 75 + 1);
    }
  }
  /* kernel multiplies and accumulates matrices: C += Ai * Bi */
  for (int i = 0; i < batchsize; ++i) kernel(&a[i * m * k], &b[i * k * n], &c[0]);
}

How should I modify the above example so that libxsmm performs a mixed-precision bfloat16/float32 gemm?

Generally speaking, it would be helpful if the documentation had more examples.

Thank you very much!

alheinecke · 2024-04-26T04:20:45Z

thanks @stefan0re

We are currently preparing the version 2.0 release, and the C++ interface no longer covers mixed precision because the low-precision types are not defined in the language. We haven't had a chance to update the documentation yet... sorry :-(

As @stefan0re mentioned, samples/xgemm has example apps for GEMM and samples/eltwise has them for eltwise unary/binary operations. These small apps aim to serve as the C-only API documentation, since the simple C codes show what the highly optimized implementations of libxsmm do mathematically.
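For readers new to the library, the math behind a GEMM with bfloat16 inputs and float32 output can be sketched in plain C++ without any libxsmm calls. This is not libxsmm code and is not taken from samples/xgemm. The names `bf16_bits`, `float_to_bf16`, `bf16_to_float` and `ref_gemm_bf16_f32` are hypothetical, and the flat column-major layout is an assumption (hardware kernels may expect a packed operand layout, which this sketch ignores). It only illustrates the idea that inputs are rounded to 16 bits while products are accumulated in float32:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper type: a bfloat16 value stored as its raw 16 bits,
// i.e. the upper half of an IEEE-754 binary32 bit pattern.
typedef std::uint16_t bf16_bits;

// float32 -> bfloat16 using round-to-nearest-even on the dropped 16 bits.
inline bf16_bits float_to_bf16(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof(u));
  if ((u & 0x7fffffffu) > 0x7f800000u) {
    // NaN: truncate and force a mantissa bit so the result stays a NaN
    return static_cast<bf16_bits>((u >> 16) | 0x0040u);
  }
  u += 0x7fffu + ((u >> 16) & 1u);
  return static_cast<bf16_bits>(u >> 16);
}

// bfloat16 -> float32 is exact: the 16 bits become the upper half of the float.
inline float bf16_to_float(bf16_bits h) {
  const std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
  float x;
  std::memcpy(&x, &u, sizeof(x));
  return x;
}

// Reference semantics (assumed layout: column-major, no padding):
//   C (m x n, float32) += A (m x k, bfloat16) * B (k x n, bfloat16)
void ref_gemm_bf16_f32(int m, int n, int k,
                       const bf16_bits* a, const bf16_bits* b, float* c) {
  assert(m > 0 && n > 0 && k > 0);
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      float acc = c[j * m + i]; // accumulate in float32
      for (int p = 0; p < k; ++p) {
        acc += bf16_to_float(a[p * m + i]) * bf16_to_float(b[j * k + p]);
      }
      c[j * m + i] = acc;
    }
  }
}
```

A caller would convert its float32 inputs once with `float_to_bf16`, keep `c` as a `std::vector<float>`, and call `ref_gemm_bf16_f32(m, n, k, a.data(), b.data(), c.data())`. An optimized kernel may sum the products in a different order, so its result can differ from this loop in the last bits.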

Regarding the TPP concept, the following arXiv papers could be helpful:
https://arxiv.org/abs/2104.05755
https://arxiv.org/abs/2304.12576
https://arxiv.org/abs/2404.15204

croci · 2024-04-26T10:05:58Z

Thank you for your answers!

Just to clarify: after the version 2.0 release will there be mixed-precision support in C and will it eventually be documented?

What do you mean by low-precision types not defined in C++? Aren't they defined since C++23 (https://en.cppreference.com/w/cpp/types/floating-point)?

In terms of the samples/xgemm codes, as someone that is new to the library I am finding these indecipherable: they have no comments that explain what they do and there is no real matching documentation with examples. As these are all gemms it should be relatively easy to write a few easy-to-understand examples?

Thank you!
