Avoid inconsistencies in kernels that contain code written by scripts #152

Open
rfvander opened this issue Apr 21, 2017 · 1 comment

@rfvander
Contributor

In Stencil, AMR, and Branch, some code snippets are written by scripts and then included in the main program. When you rebuild the code with different command line parameters (for example, RADIUS for the stencil kernels), the auto-generated snippets are not regenerated, which may lead to inconsistencies. If you first run "make clean" in that directory, you may think you are starting with a clean slate, but the snippets are only removed by "make veryclean". This looked like a convenience, since in Branch it may take a long time for the script to write that code, but it can easily lead to confusion. I will fix this.
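
A minimal sketch of one way to catch this kind of staleness, assuming the generator is a Python script and is allowed to write a one-line stamp at the top of each snippet. The names `stamp_line`, `is_stale`, `regenerate_if_stale`, and `toy_body` are hypothetical and are not part of the Kernels repository; the actual fix may look quite different.

```python
import os
import tempfile


def stamp_line(params):
    """Build a C comment recording the parameters used to generate a snippet."""
    items = " ".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"/* generated with: {items} */"


def is_stale(path, params):
    """Return True if the snippet is missing or was generated with other parameters."""
    if not os.path.exists(path):
        return True
    with open(path) as f:
        first_line = f.readline().rstrip("\n")
    return first_line != stamp_line(params)


def regenerate_if_stale(path, params, generate_body):
    """Rewrite the snippet only when its stamp does not match params.

    Returns True if the file was (re)written, False if it was left alone.
    """
    if not is_stale(path, params):
        return False
    with open(path, "w") as f:
        f.write(stamp_line(params) + "\n")
        f.write(generate_body(params))
    return True


def toy_body(params):
    """Hypothetical stand-in for a real (possibly slow) generator script."""
    return f"#define RADIUS_USED {params['RADIUS']}\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        snippet = os.path.join(workdir, "snippet.h")
        print(regenerate_if_stale(snippet, {"RADIUS": 2}, toy_body))
        print(regenerate_if_stale(snippet, {"RADIUS": 2}, toy_body))
        print(regenerate_if_stale(snippet, {"RADIUS": 3}, toy_body))
```

With a check like this, an expensive snippet (as in Branch) would be kept across builds when the parameters are unchanged, and rewritten as soon as something like RADIUS changes.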

@rfvander rfvander self-assigned this Apr 21, 2017
@jeffhammond
Member

jeffhammond commented Jun 25, 2017

You might be interested in reusing my stencil code generator.

The OpenCL one - https://github.com/ParRes/Kernels/blob/master/Cxx11/generate-opencl-stencil.py - is straightforward and easy to adapt to C89.

The C++ one is more complicated because it generates seven different types of loop syntax (one for each programming model it supports): https://github.com/ParRes/Kernels/blob/master/Cxx11/generate-cxx-stencil.py

Note that the code is explicitly instantiated for every case (e.g. https://github.com/ParRes/Kernels/blob/master/Cxx11/stencil_seq.hpp) and the driver needs to branch into them (e.g. https://github.com/ParRes/Kernels/blob/master/Cxx11/stencil-vector.cc#L177). One could of course replace the default case with a generic implementation, but I figure that limiting the user to radius=9 by default isn't a bad thing.
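
To illustrate the "explicitly instantiated for every case, with a driver that branches into them" pattern, here is a rough Python sketch that emits C source. It is not the linked generator: the function names `star_stencil_function` and `dispatch_function`, the emitted names `star<r>` and `run_star`, and the weights are all assumptions chosen for illustration, and only a star-shaped pattern is covered.

```python
def star_stencil_function(radius):
    """Emit C source for one fully unrolled star-shaped stencil of a given radius."""
    lines = [
        f"static void star{radius}(int n, const double *in, double *out) {{",
        f"  for (int i = {radius}; i < n - {radius}; i++) {{",
        f"    for (int j = {radius}; j < n - {radius}; j++) {{",
        "      out[i*n+j] +=",
    ]
    for k in range(1, radius + 1):
        w = 1.0 / (2.0 * k * radius)  # illustrative weight, an assumption
        lines.append(f"          + {w!r} * in[i*n+(j+{k})]")
        lines.append(f"          - {w!r} * in[i*n+(j-{k})]")
        lines.append(f"          + {w!r} * in[(i+{k})*n+j]")
        lines.append(f"          - {w!r} * in[(i-{k})*n+j]")
    lines += ["          ;", "    }", "  }", "}"]
    return "\n".join(lines) + "\n"


def dispatch_function(max_radius):
    """Emit a C driver that branches into the instantiated stencil for a radius."""
    lines = [
        "static int run_star(int radius, int n, const double *in, double *out) {",
        "  switch (radius) {",
    ]
    for r in range(1, max_radius + 1):
        lines.append(f"    case {r}: star{r}(n, in, out); return 0;")
    lines.append("    default: return -1; /* radius not instantiated */")
    lines += ["  }", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    max_radius = 9  # matches the default limit mentioned above
    for r in range(1, max_radius + 1):
        print(star_stencil_function(r))
    print(dispatch_function(max_radius))
```

The `default` branch is where a generic implementation could go instead of an error return.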

For OpenCL, I generated the code on the fly because the OpenCL programming model is happy with that, whereas C++ codes have to include the header. Including the header with 2x9 stencil kernels (~90KB of code) increases compilation time significantly (by a few seconds with an optimizing compiler like ICC). One could easily reduce compile time by lowering the default maximum radius to something like 5.

I have thought about using JIT compilation for C++, but that entails invoking the compiler, generating a shared library, and then dlopen-ing that library, which seems like a pain. It's also not viable in a variety of scenarios, e.g. on Cray machines.
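
For reference, the compile, build shared library, dlopen sequence can be sketched in a few lines of Python, where `ctypes.CDLL` does the dlopen step on POSIX systems. This is only an illustration of the steps and of why they are awkward, not a proposal for the kernels: it assumes a C compiler named `cc`, `gcc`, or `clang` is on the PATH and accepts `-shared -fPIC`, and `jit_compile` and `radius_used` are hypothetical names.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile


def jit_compile(source, func_name, workdir):
    """Compile C source into a shared library and return one function from it.

    Returns None when no compiler is found, which is one of the ways this
    approach fails on systems without a compiler at run time.
    """
    cc = shutil.which("cc") or shutil.which("gcc") or shutil.which("clang")
    if cc is None:
        return None
    src_path = os.path.join(workdir, "kernel.c")
    lib_path = os.path.join(workdir, "libkernel.so")
    with open(src_path, "w") as f:
        f.write(source)
    # Step 1 and 2: invoke the compiler and produce a shared library.
    subprocess.run([cc, "-O2", "-shared", "-fPIC", "-o", lib_path, src_path],
                   check=True)
    # Step 3: load the library (dlopen under the hood on POSIX).
    handle = ctypes.CDLL(lib_path)
    return getattr(handle, func_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        func = jit_compile("int radius_used(void) { return 3; }\n",
                           "radius_used", workdir)
        if func is None:
            print("no C compiler available")
        else:
            print(func())
```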
