Using ordinary C functions as Generators

Generator 2.0: The Next Generation

TL;DR: the API for using the JIT vs AOT in Halide is substantially different. In theory, this isn't quite the case (you can explicitly use Func::compile_to_file and the like), but in practice, encapsulating code for AOT into Generator (or something similar) is often easier to integrate into build systems. (In fact, Generator was originally created -- somewhat in haste -- when Halide was first integrated into Google's build system.)

Unfortunately, the Generator designed is/was lacking in various ways, but the one that concerns me here is composability: it is inconvenient (at best) to use a Generator as part of a larger Halide codebase (eg, wiring together several Generators together to form a more-complex Generator). Generator "Stubs" were added later, as a way to attempt to remediate this problem, but seem to have gotten very little adoption, and this is not too surprising: while functional, they are another layer of abstraction on top of Generators, and another layer of code generation, which makes them extra-painful to use in some build environments.

Experience inside Google has shown that plain old C++ functions are the usually the easiest approach for composing pieces of Halide; for an example of this approach in the Halide repo, consider apps/fft, in which all of the interesting logic is done via ordinary C++, which (essentially) takes arguments and returns Halide::Func(s); the Generator in this app is just a thin layer that translates between the Generator's member-based inputs and outputs and the C++ function's arguments and return type.

So this brings us to the crux of the idea: what if we could somehow annotate a C++ function in a way that produces the same AOT-build-time "entry point" that current Generators do? We could (theoretically) have our cake and eat it too.

We'll call this approach "Generator 2.0", or just plain G2 for short in this discussion. We'll refer to the existing/original Generator-as-template-class as "G1" where disambiguation is necessary. We'll just use "Generator" if it applies to both G1 and G2.

Ground Rules

G1 will continue to be supported ~forever.
From the build system's point of view, it should be ~impossible to tell between G1 and G2; they should both be drivable by GenGen/RunGen, take GeneratorParams, etc.
There will be limits on what sort of C++ functions can be used as a G2, but both G1 and G2 should support (generally) the same use cases.
It might be OK for G2 to omit explicit support for rarely-used pieces of G1, at least initially.
One exception to the above parallels between G1 and G2: Generator Stubs will continue to be G1 only; G2 will never support stubs, since the design of G2 should render them moot, if we do it right. (This might allow us to deprecate Stubs entirely in the future, though as mentioned above, G1 will ~never be deprecated; there's too much extant usage.)

The GenGen mechanism

We want this to work with GenGen (ie, the existing mechanism for executing Generators), so let's recap what that looks like; the important part from our perspective here is that a given Generator:

Can enter itself into the global registry of Generators
Can enumerate all of its runtime inputs and outputs, including a unique name for each, the specific type of each, and the dimensionality of each (for buffers).
Can enumerate all legal compile-time options (aka GeneratorParams for G1), along with the type info for each, and a default value for each.

So what does this look like?

Let's start with a stupidly-simple Generator, which simply applies an xor operation over a 2D image:

class Xor : public Halide::Generator<Xor> {
    Input<Buffer<uint8_t>> input{"input_name", 2};
    Input<uint8_t> value{"value_name"};
    Output<Buffer<uint8_t>> output{"output_name", 2};

    Var x, y;

    void generate() {
        output(x, y) = input(x, y) ^ value;
        output.vectorize(x, 8).parallel(y);
    }
};
HALIDE_REGISTER_GENERATOR(Xor, xor)

What should this look like in G2? Well, let's start with the obvious, factoring it into a helper function and a Generator-as-wrapper, in the style of apps/fft:

Func xor_all(Func input, Expr value) {
    Var x, y;
    Func output("output");
    output(x, y) = input(x, y) ^ value;
    output.vectorize(x, 8).parallel(y);
    return output;
}

class Xor : public Halide::Generator<Xor> {
    Input<Buffer<uint8_t>> input{"input_name", 2};
    Input<uint8_t> value{"value_name"};
    Output<Buffer<uint8_t>> output{"output_name", 2};

    void generate() {
        output = xor_all(input, value);
    }
};
HALIDE_REGISTER_GENERATOR(Xor, xor)

So our goal here is to eliminate the need for class Xor entirely, and have some minimal annotation on the xor_all() function that allows us to use it with GenGen. Unfortunately, even if C++ had robust introspection capabilities, the xor_all() function alone is lacking critical information that we require (and that G1 provides under the hood):

AOT compilation requires explicit type-and-dimensionality for all inputs and outputs; neither Func nor Expr provide any static / compile-time information that we can extract, so we'll have to add it explicitly somehow. (e.g., in this case, we want Func input to be constrained to a 2D uint8 buffer for the AOT-compiled version)
It's also highly desirable that all inputs and outputs have well-defined names (e.g. for metadata purposes); we can't rely on the intrinsic name for a Func or Expr (since the latter don't have names and the former can have names that vary by compiler if you rely on introspection), so these will need to be explicitly listed as well.

(Side note: it might be interesting to have a way to optionally add static type-and-dimensionality constraints to Func, Expr, etc. -- e.g. https://github.com/halide/Halide/pull/5235 -- but that would be only useful for future enhancement, not as a prerequisite for this work.)

OK, so let's assume that we have a HALIDE_REGISTER_G2() macro (analogous to HALIDE_REGISTER_GENERATOR) that allows us to add decorators for each argument (and return value) for xor_all(); similar to the techniques used in PyBind11, this doesn't technically allow us to introspect, but it does allow us to add decorations in a way that is robust and complete (i.e., argument count-and-type assumptions will be enforced at C++ compile time). (We'll ignore the implementation details for now, with the handwavy assumption that PyBind11 is an existence proof that this can be made to work.)

Anyway, we should be able to write something like this:

Func xor_all(Func input, Expr value) {
    Var x, y;
    Func output("output");
    output(x, y) = input(x, y) ^ value;
    output.vectorize(x, 8).parallel(y);
    return output;
}
// (note that all the "G2" naming stuff is for convenience in this doc;
// presumably we can come up with something better for the final design)
HALIDE_REGISTER_G2(
    "xor",        // the name of the Generator that the build system sees
    xor_all,      // the C++ function that implements it
    // All the inputs, in order, followed by the output(s):
    g2::input("input_name", UInt(8), 2),  // name, type, dimensionality for 'input'
    g2::input("value_name", UInt(8)),     // name & type, for 'value' (scalar, so no dim)
    g2::output("output_name", UInt(8), 2) // name, type, dimensionality for the reutn value
)

While this may not be a lot smaller than the previous code (in terms of number-of-lines), it does make ~all of the AOT wrapper code declarative, which is a nice step. It also gives us the opportunity to use lambdas to add additional variations easily, e.g.

// Variant of "xor" that always uses 0xff for the value, for demo purposes
HALIDE_REGISTER_G2(
    "xor_ff",
    [](Func input) -> Func {
        return xor_all(input, 0xff);
    },
    g2::input("input_name", UInt(8), 2),
    g2::output("output_name", UInt(8), 2)
)

What about the (implicit) compile-time arguments for Target, Autoscheduler, and MachineParams that are provided by Generator? (Not all Generators examine these, but they must be easily available.) There are plenty of ways we could handle this, but let's just add additional decorator directives that can be added to match optional arguments of this sort:

// Pass in the Target so we can better choose a suitable vector size
Func xor_all(Target target, Func input, Expr value) {
    Var x, y;
    Func output("output");
    output(x, y) = input(x, y) ^ value;
    output.vectorize(x, target.natural_vector_size<uint8_t>()).parallel(y);
    return output;
}
HALIDE_REGISTER_G2(
    "xor",
    xor_all,
    g2::target(),                         // this argument is expected to be the Target
    g2::input("input_name", UInt(8), 2),
    g2::input("value_name", UInt(8)),
    g2::output("output_name", UInt(8), 2)
)

(directives for autoscheduler and/or MachineParams are exercises left to the reader.)

What about GeneratorParams? In terms of a C++ function, these generally correspond to non-Halide types (int, string, etc) being passed as arguments. In C++, these are ordinary arguments, but from a Generator's perspective, they are compile-time constants. Specifying these via GenGen adds a couple of additional requirements:

All of these must have a well-defined default value, so that new values can be added over time without breaking existing usage.
The value of each must be representable as an ordinary character string, which must not contain spaces.

OK, so let's try this on for size:

Func xor_all(Target target, Func input, Expr value, int bias) {
    Var x, y;
    Func output("output");
    output(x, y) = input(x, y) ^ (value + bias);
    output.vectorize(x, target.natural_vector_size<uint8_t>()).parallel(y);
    return output;
}
HALIDE_REGISTER_G2(
    "xor",
    xor_all,
    g2::target(),
    g2::input("input_name", UInt(8), 2),
    g2::input("value_name", UInt(8)),
    g2::generator_param("bias") = 42,     // GeneratorParam is named "bias",
                                          // has default value of 42
    g2::output("output_name", UInt(8), 2)
)

Let's go further and make something that has the operation customizable by a build-time string param:

Func binary_op(Target target, std::string op, Func input, Expr value) {
    Var x, y;
    Func output("output");
    if (op == "xor") {
        output(x, y) = input(x, y) ^ value;
    } else if (op == "add") {
        output(x, y) = input(x, y) + value;
    } else {
        // ... etc
    }
    output.vectorize(x, target.natural_vector_size<uint8_t>()).parallel(y);
    return output;
}
HALIDE_REGISTER_G2(
    "binary_op",
    binary_op,
    g2::target(),
    g2::generator_param("op") = "xor",
    g2::input("input_name", UInt(8), 2),
    g2::input("value_name", UInt(8)),
    g2::output("output_name", UInt(8), 2)
)

Note that my inclination here is that g2::generator_param initially support only our usual suite of arithmetic types (the scalars that Type can handle), plus std::string. GeneratorParam for G1 has many more cases (e.g. explicit enum mapping, LoopLevel, etc) but I think those use cases are likely to be unimportant here, and/or trivially implemented with just scalars and strings.

Thoughts For Discussion

Supporting ImageParam and Param<> as inputs (and OutputImageParam as a return value) is of course trivially doable (it's basically required to be done under the hood for the code above), but is rarely used in the cases I've seen, so I've left it out of the discussion here. I'm not sure they add any particular new challenges.
Would it also be desirable to provide statically-typed variants of the decorators? e.g.

    g2::input_buffer<uint8_t, 2>("input_name")  // uint8 buffer with 2 dims
    g2::input_scalar<uint8_t>("value_name")     // uint8 input

If we are adding more info in a declarative way, are there more things we should consider adding, at the risk of feature creep? Some things that currently get relegated to "stick in somewhere in the generate() method" include scheduling estimates, constraints on strides/extents for inputs or outputs, etc. (I'm inclined to say "no" here but it's worth thinking about how they might fit in if we wanted them.)

Provide feedback

Saved searches