Controlling the C compiler for the last bit of performance and for portability across compilers #515
Labels
C: Codegen
The final C code generation
S: Needs Discussion
This needs discussion to decide if it is important to work on
So far, we have been relying on the C compiler as-is to compile Exo programs, with some optimization flag set (e.g., simply -O3) and the target architecture identified via -march.
This quickly became insufficient, since we care about getting the best possible performance on our kernels across all inputs. For example, I currently disable some loop optimizations because, whenever the compiler's cost model decides differently from our schedule, they work against what I am trying to do.
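As a rough illustration of turning loop optimizations off for one specific loop instead of globally (so the rest of the file still gets -O3), a generated loop could be prefixed with a compiler-specific pragma. This is a sketch under assumptions: the macro `EXO_KEEP_LOOP_AS_WRITTEN` and the function `axpy_as_written` are hypothetical names, not part of Exo; the pragmas themselves are Clang's `#pragma clang loop` hints and GCC's `#pragma GCC unroll` (GCC 8 and later).

```c
#include <stddef.h>

/* Hypothetical macro: asks the compiler to leave the next loop as written
 * (no unrolling, and on Clang no vectorization or interleaving either).
 * Clang is checked first because it also defines __GNUC__. */
#if defined(__clang__)
#define EXO_KEEP_LOOP_AS_WRITTEN \
    _Pragma("clang loop vectorize(disable) interleave(disable) unroll(disable)")
#elif defined(__GNUC__)
/* GCC 8+: an unroll factor of 1 blocks unrolling of the next loop.
 * This sketch only covers unrolling on GCC. */
#define EXO_KEEP_LOOP_AS_WRITTEN _Pragma("GCC unroll 1")
#else
#define EXO_KEEP_LOOP_AS_WRITTEN
#endif

/* Hypothetical stand-in for an already-scheduled Exo loop: y += a * x. */
void axpy_as_written(size_t n, float a, const float *x, float *y) {
    EXO_KEEP_LOOP_AS_WRITTEN
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Because the hint is attached to a single loop, codegen could emit it only where the schedule has already made the unrolling and vectorization decisions, rather than passing a global flag such as -fno-unroll-loops.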
It might make sense to start thinking about how to control the C compiler so that it does what is useful for Exo programs, rather than treating them as generic programs. At the end of the day, Exo programs are 1) very specific semantically, 2) very specific in the way the code is represented, and 3) already optimized.
We might want to provide recommendations for which set of flags to use when compiling our programs in general. A friend of mine previously worked on a tool (https://github.com/ethanlabelle/compiler_tuner) that tunes the compiler parameters for a given program. It might be a fun exercise to use it to tune the C compiler on each kernel we have implemented so far and see whether there is a shared set of flags across all kernels. The alternative would be to go through all compiler flags and reason about which ones do or do not make sense, but that may take far too much time.
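Once timings are collected for each kernel under several candidate flag sets (with compiler_tuner or by hand), one simple way to look for a shared set is to pick the flag set whose worst-case slowdown, relative to each kernel's own best flag set, is smallest. This is a minimal sketch, not how compiler_tuner works: the function name `pick_shared_flag_set` and the row-major timing layout are hypothetical, and it assumes `n_configs >= 1` and positive runtimes.

```c
#include <stddef.h>

/* Hypothetical helper. t[k * n_configs + c] is the measured runtime of
 * kernel k compiled with candidate flag set c. Returns the flag set c that
 * minimizes max_k t[k][c] / min_j t[k][j], i.e. the worst slowdown any
 * kernel suffers versus its own best flags. If worst_ratio is non-NULL,
 * the chosen set's worst-case ratio (>= 1.0) is stored there. */
size_t pick_shared_flag_set(const double *t, size_t n_kernels,
                            size_t n_configs, double *worst_ratio) {
    size_t best_c = 0;
    double best_worst = 0.0;
    for (size_t c = 0; c < n_configs; c++) {
        double worst = 1.0; /* every ratio is >= 1 by construction */
        for (size_t k = 0; k < n_kernels; k++) {
            const double *row = t + k * n_configs;
            /* Fastest runtime this kernel reaches under any flag set. */
            double best_k = row[0];
            for (size_t j = 1; j < n_configs; j++) {
                if (row[j] < best_k) best_k = row[j];
            }
            double ratio = row[c] / best_k;
            if (ratio > worst) worst = ratio;
        }
        if (c == 0 || worst < best_worst) {
            best_worst = worst;
            best_c = c;
        }
    }
    if (worst_ratio != NULL) *worst_ratio = best_worst;
    return best_c;
}
```

Minimizing the worst case favors a flag set that never badly hurts any single kernel; averaging the ratios (e.g., a geometric mean) would instead favor typical-case performance and could hide a regression on one kernel.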
Another idea is to add tooling so that, when we compile through LLVM, we can choose the specific optimization passes to apply to Exo programs, which would give us even more fine-grained control over the C compilation process. There has also been recent work (https://arxiv.org/pdf/2309.07062.pdf) on using LLMs to generate the proper LLVM flags to tune a given piece of LLVM IR. One could imagine training something like this on a set of LLVM IR obtained by compiling the C code generated from Exo programs.
Another concern: getting more control over the C compiler might also make Exo programs more performance-portable across compilers. Currently, I see some non-trivial degradation as I move between compiler vendors and compiler versions; this shows up mostly at smaller sizes, where the code around the loops can have a visible impact on performance.
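A small first step toward tracking this kind of degradation could be to record which compiler vendor and version built each benchmark binary, using the compilers' predefined version macros (`__clang_major__`, `__GNUC__`, `_MSC_VER`, and so on). This is a sketch under assumptions: the helper name `exo_compiler_id` is hypothetical, and only a few common compilers are distinguished.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "vendor major.minor.patch" for the compiler
 * that built this translation unit into buf, so timings can be grouped by
 * compiler vendor and version. Returns snprintf's result (the length it
 * wanted to write), or a negative value on error. */
int exo_compiler_id(char *buf, size_t len) {
#if defined(__clang__)
    /* Checked first because Clang also defines __GNUC__.
     * Apple Clang reports Apple's own version numbers here. */
    return snprintf(buf, len, "clang %d.%d.%d", __clang_major__,
                    __clang_minor__, __clang_patchlevel__);
#elif defined(__GNUC__)
    return snprintf(buf, len, "gcc %d.%d.%d", __GNUC__, __GNUC_MINOR__,
                    __GNUC_PATCHLEVEL__);
#elif defined(_MSC_VER)
    return snprintf(buf, len, "msvc %d", _MSC_VER);
#else
    return snprintf(buf, len, "unknown");
#endif
}
```

Printing this string next to each timing (for example, per kernel and problem size) would make it easier to tell whether a slowdown at small sizes follows the compiler vendor, the compiler version, or the kernel itself.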