Clang JIT CPU Backend #1239

Open
jeremylt opened this issue Jun 21, 2023 · 4 comments

Comments

@jeremylt
Member

Clang 16 now supports JIT. An interesting small project could be to create a /cpu/self/clang-jit backend that provides JITed tensor contraction kernels. If we see performance that is in the neighborhood of AVX or libXSMM, this could be a way to ship a faster CPU backend with fewer dependencies.

See Serac for reference:
https://github.com/LLNL/serac/blob/prototype/adjoints_with_internal_variables/tests/jit/basic_jit.cpp
https://github.com/LLNL/serac/blob/prototype/adjoints_with_internal_variables/include/JIT.hpp

(This repo comes from a member of Jamie Smith's team.)
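
For concreteness, here is a minimal sketch of one way such a backend could specialize a contraction at runtime: emit C source with the sizes baked in as literals, shell out to clang, and dlopen the result. This is a compile-and-load fallback rather than Clang's in-process JIT API, and the function names, kernel layout, and file paths are illustrative, not existing libCEED API:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <dlfcn.h>

using ContractFn = void (*)(const double *, const double *, double *);

// Generate, compile, and load a contraction kernel specialized to P, Q, num_comp
ContractFn build_contract_kernel(int P, int Q, int num_comp) {
  // Emit C source with the loop bounds baked in as integer literals
  std::string src =
    "void contract(const double *B, const double *u, double *v) {\n"
    "  for (int c = 0; c < " + std::to_string(num_comp) + "; c++)\n"
    "    for (int q = 0; q < " + std::to_string(Q) + "; q++) {\n"
    "      double sum = 0.0;\n"
    "      for (int p = 0; p < " + std::to_string(P) + "; p++)\n"
    "        sum += B[q*" + std::to_string(P) + " + p] * u[c*" + std::to_string(P) + " + p];\n"
    "      v[c*" + std::to_string(Q) + " + q] = sum;\n"
    "    }\n"
    "}\n";

  // Write the source and compile it to a shared object with clang
  std::FILE *f = std::fopen("/tmp/ceed_jit_kernel.c", "w");
  if (!f) return nullptr;
  std::fputs(src.c_str(), f);
  std::fclose(f);
  if (std::system("clang -O3 -march=native -shared -fPIC "
                  "/tmp/ceed_jit_kernel.c -o /tmp/ceed_jit_kernel.so")) return nullptr;

  // Load the specialized kernel and return a callable function pointer
  void *handle = dlopen("/tmp/ceed_jit_kernel.so", RTLD_NOW);
  if (!handle) return nullptr;
  return reinterpret_cast<ContractFn>(dlsym(handle, "contract"));
}
```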

@jedbrown
Member

Certainly interesting, but note that we only have a limited number of size combinations in the tensor contractions, so this is really a solution for the case where we find that compile-time constant sizes are a huge benefit and that we can't pare the combinatorial space down enough to do ahead-of-time specialization.

A different use might be to use JIT to build single-precision versions of select kernels.
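
A minimal sketch of what that could look like at the source level, assuming the kernel is parameterized on the scalar type so a JIT (or plain template instantiation) can stamp out a float variant; names and sizes are illustrative, not current libCEED API:

```cpp
// Contraction kernel parameterized on the scalar type and compile-time sizes
template <typename Scalar, int P, int Q>
void contract(const Scalar *B, const Scalar *u, Scalar *v) {
  for (int q = 0; q < Q; q++) {
    Scalar sum = 0;
    for (int p = 0; p < P; p++) sum += B[q*P + p] * u[p];
    v[q] = sum;
  }
}

// The same kernel instantiated in double and single precision
auto *contract_f64 = &contract<double, 8, 10>;
auto *contract_f32 = &contract<float, 8, 10>;
```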

@jeremylt
Member Author

Right, I'd expect that if we enumerated a bunch of kernels ahead of time across combinations of p, q, num_comp, and blocked/serial, we'd see the same performance, but that approach is intractable.

Regarding performance, my gut says such a backend would land between the AVX and LIBXSMM backends, but without requiring a user to build LIBXSMM, so we might get a little better performance in our upcoming Ratel + Enzyme container.

I agree that single-precision kernels would be an interesting avenue to explore too, since that would make it easier to get mixed-precision capabilities.

@jedbrown
Member

It's a low-effort test to see if specializing one particular size has much benefit: just drop in some integer literals and run a benchmark with matching sizes. If it's a lot faster, we can check whether specializing all the values is important or, say, just one matters. If it's about the same, we don't need to pursue the idea (at least until we learn more).
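
A hedged sketch of that test, assuming a simple 1D contraction and made-up sizes (P = 8, Q = 10); the real libCEED kernels are more involved, and a serious measurement would need a proper harness that keeps the compiler from folding the repeated calls:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Contraction with runtime loop bounds
static void contract_runtime(int P, int Q, const double *B, const double *u, double *v) {
  for (int q = 0; q < Q; q++) {
    double sum = 0.0;
    for (int p = 0; p < P; p++) sum += B[q*P + p] * u[p];
    v[q] = sum;
  }
}

// Same kernel with P = 8, Q = 10 dropped in as integer literals
static void contract_literal(const double *B, const double *u, double *v) {
  for (int q = 0; q < 10; q++) {
    double sum = 0.0;
    for (int p = 0; p < 8; p++) sum += B[q*8 + p] * u[p];
    v[q] = sum;
  }
}

int main() {
  const int P = 8, Q = 10, reps = 1000000;
  std::vector<double> B(Q*P, 1.0), u(P, 1.0), v(Q, 0.0);

  auto t0 = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; r++) contract_runtime(P, Q, B.data(), u.data(), v.data());
  auto t1 = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; r++) contract_literal(B.data(), u.data(), v.data());
  auto t2 = std::chrono::steady_clock::now();

  std::printf("runtime sizes: %.3f ms, literal sizes: %.3f ms (checksum %f)\n",
              std::chrono::duration<double, std::milli>(t1 - t0).count(),
              std::chrono::duration<double, std::milli>(t2 - t1).count(), v[0]);
  return 0;
}
```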

@jeremylt
Member Author

That's a good point. It's an easy test to run if someone finds time. I don't see this as a particular priority; half of why I created this issue was so we don't lose track of this option.
