Dropping arguments in inlining #648

Open
sv2518 opened this issue Jul 13, 2022 · 1 comment · May be fixed by #650

Comments

@sv2518
Contributor

sv2518 commented Jul 13, 2022

Hi. I noticed an issue with the argument passing that was introduced in #631 (I believe).

When we inline an inner kernel into a wrapper kernel, and the inner kernel takes many arguments but its code depends on only a subset of them, the inlined kernel in the generated C code takes only that subset as parameters. A minimal example can be found here: https://gist.github.com/sv2518/d6690f63b6827750fdaed9b01b4d9cea. I noticed that this only happens for CTargets, not for whatever Loo.py's default target is.

This is problematic for the vectorisation in Firedrake, where we generate a loopy kernel first, apply the inlining and vectorisation transformations, run C codegen after that, and then pass the arguments dynamically. This means the arguments of the kernel before the transformations must be the same as after them. The MFE I linked above comes from our test suite; the corresponding test is tests/regression/test_par_loops.py::test_dict_order_parallel.

Also, sorry for the noise recently; it's mostly because I am still trying to get the Firedrake vectorisation PR landed.

@kaushikcfd
Collaborator

kaushikcfd commented Jul 14, 2022

Thanks for the report! That's a concerning bug. Arguments must not be dropped in entrypoint kernels. I think this should be an easy fix. Here's a smaller reproducer:

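import loopy as lp

# Kernel with five declared arguments, of which only "a" is used.
knl = lp.make_kernel(
    "{ : }",
    """
    a[0] = 1
    """,
    [lp.GlobalArg("a,b,c,d,e",
                  shape=(10,),
                  dtype="float64")])
# Print the generated device code to inspect the kernel signature.
print(lp.generate_code_v2(knl).device_code())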

which generates the kernel:

__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(__global double *__restrict__ a)
{
  a[0] = 1.0;
}

Notice how the arguments b,c,d,e were removed.
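
For anyone who wants to guard against this in a test, here is a minimal sketch that compares the declared argument names with the parameter list in the generated code. The helpers `c_param_names` and `dropped_args` are hypothetical names, not part of loopy, and the regex-based parsing assumes simple pointer or scalar parameters with no nested parentheses or array declarators in the signature.

```python
import re


def c_param_names(c_source, func_name):
    """Return the parameter names of func_name as they appear in c_source."""
    # Capture the text between the parentheses that follow the function name.
    match = re.search(re.escape(func_name) + r"\s*\(([^)]*)\)", c_source)
    if match is None:
        raise ValueError(f"no definition of {func_name!r} found")
    params = [p.strip() for p in match.group(1).split(",") if p.strip()]
    # Assumption: the parameter name is the trailing identifier of each declaration.
    return [re.search(r"([A-Za-z_]\w*)\s*$", p).group(1) for p in params]


def dropped_args(expected, c_source, func_name):
    """List the expected argument names that are missing from the signature."""
    present = set(c_param_names(c_source, func_name))
    return [name for name in expected if name not in present]


# The generated kernel quoted above, pasted in as a string.
generated = """
__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(__global double *__restrict__ a)
{
  a[0] = 1.0;
}
"""

print(dropped_args(["a", "b", "c", "d", "e"], generated, "loopy_kernel"))
# prints ['b', 'c', 'd', 'e']
```

In a real test, `generated` would come from `lp.generate_code_v2(knl).device_code()` as in the reproducer, and the check would assert that `dropped_args` returns an empty list.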
