bfloat16/float32 mixed-precision gemm example (using AMX-BF16/AVX512-BF16)? #877

Open
croci opened this issue Apr 24, 2024 · 2 comments
@croci
croci commented Apr 24, 2024

I have access to a Sapphire Rapids CPU and I would like to test the performance of libxsmm GEMMs with bfloat16 inputs and float32 output, so that AMX-BF16/AVX512-BF16 instructions are used. However, the documentation only includes the following example:

#include <libxsmm.h>
#include <cassert>
#include <vector>
int main(int argc, char* argv[]) {
  typedef double T;
  int batchsize = 1000, m = 13, n = 5, k = 7;
  std::vector<T> a(batchsize * m * k), b(batchsize * k * n), c(m * n, 0);
  /* C/C++ and Fortran interfaces are available */
  typedef libxsmm_mmfunction<T> kernel_type;
  /* generates and dispatches a matrix multiplication kernel (C++ functor) */
  kernel_type kernel(LIBXSMM_GEMM_FLAG_NONE, m, n, k, 1.0 /*alpha*/, 1.0 /*beta*/);
  assert(kernel);
  for (int i = 0; i < batchsize; ++i) { /* initialize input (column-major Ai: m x k, Bi: k x n) */
    for (int ki = 0; ki < k; ++ki) {
      /* "+ 1" keeps the divisor nonzero; the index selects element (j, ki) of Ai and (ki, j) of Bi */
      for (int j = 0; j < m; ++j) a[i * m * k + ki * m + j] = static_cast<T>(1) / ((i + j + ki) % 25 + 1);
      for (int j = 0; j < n; ++j) b[i * k * n + j * k + ki] = static_cast<T>(7) / ((i + j + ki) % 75 + 1);
    }
  }
  /* kernel multiplies and accumulates matrices: C += Ai * Bi */
  for (int i = 0; i < batchsize; ++i) kernel(&a[i * m * k], &b[i * k * n], &c[0]);
}

How should I modify the above example so that libxsmm performs a mixed-precision bfloat16/float32 gemm?

Generally speaking, it would be helpful if the documentation had more examples.

Thank you very much!

@alheinecke
Collaborator

Thanks, @stefan0re.

We are currently preparing the version 2.0 release, and the C++ interface no longer covers mixed precision because the low-precision types are not defined in the language. We haven't had a chance to update the documentation yet... sorry :-(

As @stefan0re mentioned, samples/xgemm has example apps for GEMM and samples/eltwise has example apps for eltwise unary/binary operations. These small apps aim to serve as the C-only API documentation, as the simple C code shows what the highly optimized implementations of libxsmm do mathematically.
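
To make "what the kernel does mathematically" concrete for the bfloat16-in/float32-out case, here is a minimal plain C++ reference sketch. It does not call libxsmm at all. The names `bf16_t`, `float_to_bf16`, `bf16_to_float`, and `gemm_bf16_f32_ref` are hypothetical helpers introduced for illustration, and the column-major layout with tight leading dimensions is an assumption. An optimized kernel may expect a different (packed) operand layout and may accumulate in a different order, so results can differ in the last bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: bfloat16 is stored as raw 16 bits, i.e. the upper half of an IEEE-754 float32.
using bf16_t = std::uint16_t;

// float32 -> bfloat16 with round-to-nearest-even (NaN/overflow handling omitted for brevity).
inline bf16_t float_to_bf16(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits += 0x7FFFu + ((bits >> 16) & 1u);  // ties go to the even upper half
  return static_cast<bf16_t>(bits >> 16);
}

// bfloat16 -> float32 is exact: place the 16 bits in the upper half and zero the rest.
inline float bf16_to_float(bf16_t h) {
  const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float x;
  std::memcpy(&x, &bits, sizeof(x));
  return x;
}

// Reference C += A * B with bfloat16 inputs and float32 accumulation/output.
// Column-major: A is m x k (lda = m), B is k x n (ldb = k), C is m x n (ldc = m).
inline void gemm_bf16_f32_ref(int m, int n, int k,
                              const bf16_t* a, const bf16_t* b, float* c) {
  assert(m >= 0 && n >= 0 && k >= 0);
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      float acc = c[j * m + i];
      for (int p = 0; p < k; ++p) {
        acc += bf16_to_float(a[p * m + i]) * bf16_to_float(b[j * k + p]);
      }
      c[j * m + i] = acc;
    }
  }
}
```

A reference like this is mainly useful for checking the output of an optimized kernel on small sizes, with a tolerance that accounts for bfloat16 rounding of the inputs.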

Regarding the TPP concept, the following arXiv papers could be helpful:
https://arxiv.org/abs/2104.05755
https://arxiv.org/abs/2304.12576
https://arxiv.org/abs/2404.15204

@croci
Author

croci commented Apr 26, 2024

Thank you for your answers!

Just to clarify: after the version 2.0 release will there be mixed-precision support in C and will it eventually be documented?

What do you mean by low-precision types not defined in C++? Aren't they defined since C++23 (https://en.cppreference.com/w/cpp/types/floating-point)?
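
For context, a small sketch of how C++23 exposes these types: `std::bfloat16_t` lives in `<stdfloat>` and is optional, so its presence is signalled by the predefined macro `__STDCPP_BFLOAT16_T__`. The helper names `has_std_bfloat16` and `roundtrip_through_bf16` are hypothetical, and whether a given compiler and flag combination defines the macro is something to check locally rather than assume.

```cpp
// std::bfloat16_t is an optional C++23 extended floating-point type.
// The compiler predefines __STDCPP_BFLOAT16_T__ only when it is available.
#if defined(__STDCPP_BFLOAT16_T__)
#include <stdfloat>
static_assert(sizeof(std::bfloat16_t) == 2, "bfloat16 is expected to occupy 16 bits");
#endif

// Reports at compile time whether the standard bfloat16 type is usable.
constexpr bool has_std_bfloat16() {
#if defined(__STDCPP_BFLOAT16_T__)
  return true;
#else
  return false;
#endif
}

// Rounds x to bfloat16 and back when the type exists; otherwise returns x unchanged.
inline float roundtrip_through_bf16(float x) {
#if defined(__STDCPP_BFLOAT16_T__)
  const std::bfloat16_t h = static_cast<std::bfloat16_t>(x);
  return static_cast<float>(h);
#else
  return x;  // fallback: no standard bfloat16 type in this build
#endif
}
```

Code that has to build with pre-C++23 compilers, or as C, cannot rely on this type, which is one reason a library might keep low-precision support in a C interface with its own 16-bit storage type.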

As for the samples/xgemm codes, as someone who is new to the library I am finding them indecipherable: they have no comments explaining what they do, and there is no matching documentation with examples. Since these are all GEMMs, shouldn't it be relatively easy to write a few easy-to-understand examples?

Thank you!
