Generator Enhancements

Jiawen (Kevin) Chen edited this page Mar 22, 2022 · 6 revisions

As of late October 2016 (https://github.com/halide/Halide/pull/1523), Halide Generators have been enhanced to:

  • Improve readability and flexibility of Generators
  • Provide machine-generated Stubs that make it easier for one Generator to use another
  • Make integration with the Autoscheduler easier and more reliable

Note that none of these changes break existing Generators: all existing Generators should continue to work as-is for the foreseeable future.

This document is meant to capture the nature of the changes and describe how to "upgrade" a Generator to use the new enhancements.

Replacing Param<X> with Input<X> (and ImageParam with Input<Func>)

Param<> continues to exist, but Generators can now use a new class, Input<>, instead. For scalar types, these can be considered essentially identical to Param<>, but have a different name for reasons of code clarity, as we'll see later.

Similarly, ImageParam continues to exist, but Generators can instead use an Input<Func>. This is (essentially) like an ImageParam, with the main difference being that it may or may not be backed by an actual buffer, and thus has no defined extents.

Input<Func> input{"input", Float(32), 2};

The equivalent of an ImageParam backed by an actual buffer can be created by an Input<Buffer<T>> like this:

ImageParam input{UInt(8), 2, "input"};

becomes:

Input<Buffer<uint8_t>> input{"input", 2};

This allows you (in comparison to an Input<Func>) to access the width and height of the buffer through input.dim(0).extent() and input.dim(1).extent().

It is an error for a Generator to declare both Input<> and Param<> or ImageParam (i.e.: if you use Input<> you may not use the previous syntax).

Note that Input<> is intended only for use with Generator, and is not intended for use in other Halide code; in particular, it is not intended to replace Param<>, except for inside Generators.

Example:

class SumColumns : Generator<SumColumns> {
  ImageParam input{Float(32), 2, "input"};

  Func build() {
    RDom r(0, input.width());
    Func f;
    Var y;
    f(y) = 0.f;
    f(y) += input(r.x, y);
    return f;
  }
};

becomes

class SumColumns : Generator<SumColumns> {
  Input<Func> input{"input", Float(32), 2};
  Input<int32_t> width{"width"};

  Func build() {
    RDom r(0, width);
    Func f;
    Var y;
    f(y) = 0.f;
    f(y) += input(r.x, y);
    return f;
  }
};

You can optionally make the type and/or dimensions of Input<Func> unspecified, in which case the value is simply inferred from the actual Funcs passed to them. Of course, if you specify an explicit Type or Dimension, we still require the input Func to match, or a compilation error results.

Input<Func> input{ "input", 3 };  // require 3-dimensional Func,
                                  // but leave Type unspecified 

When a Generator using Input<Func> is compiled directly (e.g., using GenGen), the Input<Func> must be concretely specified; if Type and/or Dimensions are unspecified, you can specify them using implicit GeneratorParams with names derived from the Input or Output. (In the example above, input has an implicit GeneratorParam named "input.type" and an implicit GeneratorParam named "input.dim".)
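As a rough illustration of the naming rule, here is a standalone sketch in plain C++ (it does not use Halide). FuncDecl and unresolved_params are hypothetical names invented for this example; only the "name.type" / "name.dim" convention comes from the text above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an Input<Func> or Output<Func> declaration.
struct FuncDecl {
    std::string name;
    std::string type;  // empty string means "unspecified"
    int dim;           // negative means "unspecified"
};

// Lists the implicit GeneratorParam names that would still have to be
// supplied before the declaration is fully concrete.
inline std::vector<std::string> unresolved_params(const FuncDecl &d) {
    assert(!d.name.empty());
    std::vector<std::string> names;
    if (d.type.empty()) names.push_back(d.name + ".type");
    if (d.dim < 0) names.push_back(d.name + ".dim");
    return names;
}
```

For the 3-dimensional, untyped input shown earlier, this sketch would report only "input.type" as outstanding.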

Explicitly Declaring Outputs

All of a Generator's inputs can be determined by introspecting its members, but information about its outputs could previously only be determined by calling its build() method and examining the return value (which may be a Func or a Pipeline).

With this change, a Generator can, instead, explicitly declare its output(s) as member variables, and provide a generate() method instead of a build() method. (These are equivalent aside from the fact that generate() does not return a value.)

Example:

class SumColumns : Generator<SumColumns> {
  Input<Func> input{"input", Float(32), 2};
  Input<int32_t> width{"width"};

  Func build() {
    RDom r(0, width);
    Func f;
    Var y;
    f(y) = 0.f;
    f(y) += input(r.x, y);
    return f;
  }
};

becomes

class SumColumns : Generator<SumColumns> {
  Input<Func> input{"input", Float(32), 2};
  Input<int32_t> width{"width"};

  Output<Func> sum_cols{"sum_cols", Float(32), 1};

  void generate() {
    RDom r(0, width);
    Var y;
    sum_cols(y) = 0.f;
    sum_cols(y) += input(r, y);
  }
};

As with Input<Func>, you can optionally make the type and/or dimensions of an Output<Func> unspecified; any unspecified types must be resolved via an implicit GeneratorParam in order to use top-level compilation.

Note that Output<> is intended only for use with Generator, and is not intended for use in other Halide code.

The Generator infrastructure will verify (after calling generate()) that all outputs are defined, and have definitions that match the declaration.
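To make the kind of check described here concrete, the following is a minimal standalone model in plain C++. It is not the Halide implementation: Declared, Defined and check_output are hypothetical names, and types are represented as plain strings purely for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical record of a declaration such as
// Output<Func>{"sum_cols", Float(32), 1}.
struct Declared {
    std::string name;
    std::string type;
    int dims;
};

// Hypothetical record of what generate() actually produced.
struct Defined {
    bool defined;
    std::string type;
    int dims;
};

// Returns an empty string on success, otherwise a description of the
// first problem found.
inline std::string check_output(const Declared &decl, const Defined &def) {
    assert(!decl.name.empty());
    if (!def.defined) return decl.name + ": not defined";
    if (def.type != decl.type) return decl.name + ": type mismatch";
    if (def.dims != decl.dims) return decl.name + ": dimension mismatch";
    return "";
}
```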

You can specify an output that returns a Tuple by specifying a list of Types:

class Tupler : Generator<Tupler> {
  Input<Func> input{"input", Int(32), 2};
  Output<Func> output{"output", {Float(32), UInt(8)}, 2};

  void generate() {
    Var x, y;
    output(x, y) = Tuple(cast<float>(input(x, y)), cast<uint8_t>(input(x, y)));
  }
};

A Generator can define multiple outputs (which is quietly implemented as a Pipeline under the hood):

class SumRowsAndColumns : Generator<SumRowsAndColumns> {
  Input<Func> input{"input", Float(32), 2};
  Input<int32_t> width{"width"};
  Input<int32_t> height{"height"};

  Output<Func> sum_rows{"sum_rows", Float(32), 1};
  Output<Func> sum_cols{"sum_cols", Float(32), 1};

  void generate() {
    RDom rc(0, height);
    Var x;
    sum_rows(x) = 0.f;
    sum_rows(x) += input(x, rc);

    RDom rr(0, width);
    Var y;
    sum_cols(y) = 0.f;
    sum_cols(y) += input(rr, y);
  }
};

We also allow you to specify Output for any scalar type (except for Handle types); this is merely syntactic sugar on top of a zero-dimensional Func, but can be quite handy, especially when used with multiple outputs:

class Sum : Generator<Sum> {
  Input<Func> input{"input", Float(32), 2};
  Input<int32_t> width{"width"};
  Input<int32_t> height{"height"};

  Output<Func> sum_rows{"sum_rows", Float(32), 1};
  Output<Func> sum_cols{"sum_cols", Float(32), 1};
  Output<float> sum{"sum"};

  void generate() {
    RDom rc(0, height);
    Var x;
    sum_rows(x) = 0.f;
    sum_rows(x) += input(x, rc);

    RDom rr(0, width);
    Var y;
    sum_cols(y) = 0.f;
    sum_cols(y) += input(rr, y);

    RDom r(0, width, 0, height);
    sum() = 0.f;
    sum() += input(r.x, r.y);
  }
};

Note that it is an error to define both a build() and generate() method.

Array Inputs and Outputs

You can also use the new syntax to declare an array of Input or Output, by using an array type as the type parameter:

// Takes exactly 3 images and outputs exactly 3 sums.
class SumRowsAndColumns : Generator<SumRowsAndColumns> {
  Input<Func[3]> inputs{"inputs", Float(32), 2};
  Input<int32_t[2]> extents{"extents"};
  // Each sum is a scalar, i.e. a zero-dimensional Func.
  Output<Func[3]> sums{"sums", Float(32), 0};
  void generate() {
    assert(inputs.size() == sums.size());
    // assume all inputs are same extent
    Expr width = extents[0];
    Expr height = extents[1];
    for (size_t i = 0; i < inputs.size(); ++i) {
      RDom r(0, width, 0, height);
      sums[i]() = 0.f;
      sums[i]() += inputs[i](r.x, r.y);
    }
  }
};

You can also leave array size unspecified, with some caveats:

  • For ahead-of-time compilation, Inputs must have a concrete size specified via a GeneratorParam at build time (e.g., pyramid.size=3)
  • For JIT compilation via a Stub, Inputs array sizes will be inferred from the vector passed.
  • For ahead-of-time compilation, Outputs may specify a concrete size via a GeneratorParam at build time (e.g., pyramid.size=3), or the size can be specified via a resize() method.

class Pyramid : public Generator<Pyramid> {
public:
    GeneratorParam<int32_t> levels{"levels", 10};
    Input<Func> input{ "input", Float(32), 2 };
    Output<Func[]> pyramid{ "pyramid", Float(32), 2 };
    void generate() {
        pyramid.resize(levels);
        pyramid[0](x, y) = input(x, y);
        // Each level averages 2x2 blocks of the previous level.
        for (size_t i = 1; i < pyramid.size(); i++) {
            pyramid[i](x, y) = (pyramid[i-1](2*x, 2*y) +
                               pyramid[i-1](2*x+1, 2*y) +
                               pyramid[i-1](2*x, 2*y+1) +
                               pyramid[i-1](2*x+1, 2*y+1))/4;
        }
    }
private:
    Var x, y;
};

An array Input/Output with unspecified size must be resolved to a concrete size for top-level compilation; there are now implicit GeneratorParam<size_t> values that allow you to set this, based on the name ("pyramid.size" in the example above).

Note that both Input and Output arrays support a limited subset of the methods from std::vector<>:

  • operator[]
  • size()
  • begin()
  • end()
  • resize() (Output only)
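The shape of this restricted interface can be sketched in plain C++ as a thin wrapper over std::vector. This is only a model of the method list above, not Halide's actual classes; ArrayBase and OutputArray are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base: the subset shared by Input and Output arrays.
template <typename T>
class ArrayBase {
public:
    explicit ArrayBase(std::size_t n = 0) : items_(n) {}

    T &operator[](std::size_t i) {
        assert(i < items_.size());  // out-of-range access is a bug
        return items_[i];
    }
    std::size_t size() const { return items_.size(); }
    typename std::vector<T>::iterator begin() { return items_.begin(); }
    typename std::vector<T>::iterator end() { return items_.end(); }

protected:
    std::vector<T> items_;
};

// Output arrays additionally expose resize().
template <typename T>
class OutputArray : public ArrayBase<T> {
public:
    using ArrayBase<T>::ArrayBase;
    void resize(std::size_t n) { this->items_.resize(n); }
};
```

Because begin() and end() are provided, such a wrapper works with range-based for loops, while everything else std::vector offers (push_back, erase, and so on) stays hidden.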

Separating Scheduling from Building

A Generator can now split the existing build() method into two methods:

void generate() { ... }
void schedule() { ... }

Such a Generator must move all scheduling code for intermediate Funcs into the schedule() method. Note that this means that any Func, Var, etc. that needs scheduling must be stored as a member variable of the Generator. (Since Output<> are required to be declared as member variables, these are simple enough, but intermediate Funcs that need scheduling may have to be moved from locals to members.)

Example:

class Example : Generator<Example> {
  Output<Func> output{"output", Float(32), 2};

  void generate() {
    Var x, y;

    Func intermediate;
    intermediate(x, y) = SomeExpr(x, y);

    output(x, y) = intermediate(x, y);

    intermediate.compute_at(output, y);
  }
};

becomes

class Example : Generator<Example> {
  Output<Func> output{"output", Float(32), 2};

  void generate() {
    intermediate(x, y) = SomeExpr(x, y);
    output(x, y) = intermediate(x, y);
  }

  void schedule() {
    intermediate.compute_at(output, y);
  }

  Func intermediate;
  Var x, y;
};

Note that the output Func doesn't have a scheduling directive for compute_at() or store_at() in either case: it is either implicitly compute_root() (when being compiled directly into a filter), or explicitly scheduled by its caller (when being used as a subcomponent, as we'll see later).

Even if the intermediate Halide code doesn't have any scheduling necessary (e.g. it's all inline), you should still provide an empty schedule() method to make this fact obvious and clear.

Example:

class ExampleInline : Generator<ExampleInline> {
  Output<Func> output{"output", Float(32), 2};

  void generate() {
    Var x, y;
    output(x, y) = SomeExpr(x, y);
  }
};

becomes

class ExampleInline : Generator<ExampleInline> {
  Output<Func> output{"output", Float(32), 2};

  void generate() {
    output(x, y) = SomeExpr(x, y);
  }

  void schedule() {
    // empty
  }

  Var x, y;
};

Converting GeneratorParam into ScheduleParam where necessary

GeneratorParam is now augmented by the new ScheduleParam type. All generator params that are intended to be used by the schedule() method should be declared as ScheduleParam rather than GeneratorParam. This has two purposes:

  • It allows a declarative way to enumerate and communicate scheduling information between arbitrary Generators (as we'll see later).
  • It makes clear which GeneratorParams are used for scheduling, which will aid future Autoscheduler work.

Note that there are common GeneratorParam conventions that already act as ScheduleParam (most notably, vectorize and parallelize); this merely formalizes the previous convention.

GeneratorParam and ScheduleParam continue to live inside a single namespace (i.e., it is an error to declare a GeneratorParam and ScheduleParam with the same name).
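The single-namespace rule amounts to keeping one set of names for both kinds of parameter. A minimal standalone sketch in plain C++ follows; ParamNames and its methods are hypothetical names for illustration, not part of Halide.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical registry: both kinds of parameter share one set of names.
class ParamNames {
public:
    // Each method returns true if the name was free, and false if a
    // parameter of either kind already uses it.
    bool add_generator_param(const std::string &name) { return add(name); }
    bool add_schedule_param(const std::string &name) { return add(name); }

private:
    bool add(const std::string &name) {
        assert(!name.empty());
        return names_.insert(name).second;
    }
    std::set<std::string> names_;
};
```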

While a GeneratorParam can be used from anywhere inside a Generator (either the generate() or schedule() method), a ScheduleParam should be accessed only within the schedule() method. (We'd like to make this a compile-time error in the future.)

Note that while GeneratorParam values continue to be serializable to and from strings, some ScheduleParam values are not serializable, as they may reference runtime-only Halide structures (most notably, LoopLevel, which cannot be reliably specified by name in the general case). Attempting to set such a ScheduleParam from GenGen will cause a compile-time error.

Example:

class Example : Generator<Example> {
  GeneratorParam<int32_t> iters{"iters", 10};
  GeneratorParam<bool> vectorize{"vectorize", true};

  Func build() {
    Var x, y;
    std::vector<Func> intermediates;
    for (int i = 0; i < iters; ++i) {
      Func g;
      // Each stage after the first consumes the previous stage.
      g(x, y) = (i == 0) ? SomeExpr(x, y) : SomeExpr2(intermediates.back()(x, y));
      intermediates.push_back(g);
    }
    Func f;
    f(x, y) = intermediates.back()(x, y);

    // Schedule
    for (auto fi : intermediates) {
      fi.compute_at(f, y);
      if (vectorize) fi.vectorize(x, natural_vector_size<float>());
    }
    return f;
  }
};

becomes

class Example : Generator<Example> {
  GeneratorParam<int32_t> iters{"iters", 10};
  ScheduleParam<bool> vectorize{"vectorize", true};

  Output<Func> output{"output", Float(32), 2};

  void generate() {
    for (int i = 0; i < iters; ++i) {
      Func g;
      g(x, y) = (i == 0) ? SomeExpr(x, y) : SomeExpr2(intermediates.back()(x, y));
      intermediates.push_back(g);
    }
    output(x, y) = intermediates.back()(x, y);
  }

  void schedule() {
    for (auto fi : intermediates) {
      fi.compute_at(output, y);
      if (vectorize) fi.vectorize(x, natural_vector_size<float>());
    }
  }

  Var x, y;
  std::vector<Func> intermediates;
};

Note that ScheduleParam can have other interesting values too, most notably LoopLevel:

class Example : Generator<Example> {
  // Specify a LoopLevel at which we want intermediate Func(s)
  // to be computed and/or stored.
  ScheduleParam<LoopLevel> intermediate_compute_level{"intermediate_compute_level", "undefined"};
  ScheduleParam<LoopLevel> intermediate_store_level{"intermediate_store_level", "root"};
  Output<Func> output{"output", Float(32), 2};

  void generate() {
    intermediate(x, y) = SomeExpr(x, y);
    output(x, y) = intermediate(x, y);
  }

  void schedule() {
    intermediate
      // If intermediate_compute_level is undefined,
      // default to computing at output's rows
      .compute_at(intermediate_compute_level.defined() ?
                  intermediate_compute_level :
                  LoopLevel(output, y))
      .store_at(intermediate_store_level);
  }

  Func intermediate;
  Var x, y;
};

Note that ScheduleParam<LoopLevel> can default to "root", "inline", or "undefined"; all other values (e.g. Func-and-Var) must be specified in actual code. (It is explicitly not possible to specify LoopLevel(Func, Var) by name, e.g. "func.var"; although Halide uses such a convention internally, it is not currently possible to guarantee unique Func names across an arbitrary set of Generators.)

Note that it is an error to use an undefined LoopLevel for scheduling.
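These two rules (only three names are accepted, and an undefined level cannot be used for scheduling) can be modelled in a few lines of plain C++. The sketch below is not Halide code; ToyLevelKind, parse_level and usable_for_scheduling are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy model of the three LoopLevel values that can be named by string.
enum class ToyLevelKind { Root, Inline, Undefined };

// Accepts only the three well-known names; anything else (for example
// "func.var") has to be built in code rather than parsed from a string.
inline ToyLevelKind parse_level(const std::string &s) {
    if (s == "root") return ToyLevelKind::Root;
    if (s == "inline") return ToyLevelKind::Inline;
    if (s == "undefined") return ToyLevelKind::Undefined;
    throw std::invalid_argument("LoopLevel cannot be named by string: " + s);
}

// An undefined level is a placeholder and must be replaced before use.
inline bool usable_for_scheduling(ToyLevelKind k) {
    return k != ToyLevelKind::Undefined;
}
```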

Revised RegisterGenerator Syntax

Previously, you'd register a Generator by explicitly instantiating a RegisterGenerator at global scope:

Halide::RegisterGenerator<MyGen> register_my_gen{"my_gen"};

This still works, but we're introducing a simpler registration macro:

HALIDE_REGISTER_GENERATOR(MyGen, my_gen)  // no semicolon at end
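For readers unfamiliar with macro-based registration, the sketch below shows one common way such a macro can be built in plain C++: a file-scope object whose constructor adds a factory to a global registry. It is not Halide's implementation; DEMO_REGISTER, Registrar, registry() and the describe() method are all names invented for this example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

using Factory = std::function<std::string()>;

// A function-local static avoids static-initialization-order problems.
inline std::map<std::string, Factory> &registry() {
    static std::map<std::string, Factory> r;
    return r;
}

// Constructing a Registrar at file scope registers the factory before main().
struct Registrar {
    Registrar(const std::string &name, Factory f) {
        assert(registry().count(name) == 0);  // names must be unique
        registry()[name] = std::move(f);
    }
};

// Hypothetical macro: it stringifies NAME and supplies its own semicolon.
#define DEMO_REGISTER(CLASS, NAME) \
    static Registrar registrar_##NAME{#NAME, [] { return CLASS().describe(); }};

struct MyGen {
    std::string describe() const { return "MyGen instance"; }
};

DEMO_REGISTER(MyGen, my_gen)  // no semicolon at end
```

After static initialization, looking up "my_gen" in registry() yields a factory for MyGen.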

If you want to generate a Stub for your Generator, you must use the new-style registration macro, and add that information to the declaration:

// We must forward-declare the name we want for the stub, 
// inside the proper namespace(s). None of the namespace(s) 
// may be anonymous (if they are, failures will occur at Halide
// compilation time).
namespace SomeNamespace { class MyGenStub; }
HALIDE_REGISTER_GENERATOR(MyGen, my_gen, SomeNamespace::MyGenStub)

If the fully-qualified stub name specified for the third argument hasn't been declared properly, a compile error will result. The fully-qualified name must have at least one namespace (i.e., a name at global scope is not acceptable).

Generator Stubs

Let's start with an example of usage, then work backwards to explain what's going on. Say we have an RGB-to-YCbCr component we want to re-use:

class RgbToYCbCr : public Generator<RgbToYCbCr> {
  Input<Func> input{"input", Float(32), 3};
  Output<Func> output{"output", Float(32), 3};
  void generate() { ... conversion code here ... }
  void schedule() { ... scheduling code here ... }
};
RegisterGenerator<RgbToYCbCr> register_me{"rgb_to_ycbcr"};

GenGen now can produce a "Func-like" stub class around a generator, which (by convention) is emitted in a file with the extension ".stub.h". It looks something like:

/path/to/rgb_to_rcbcr.stub.h:

  // MACHINE-GENERATED
  class RgbToYCbCr : public GeneratorStub {
    struct Inputs { 
       // All the Input<>s declared in the Generator are listed here,
       // as either Func or Expr
       Func input;
    };
    struct GeneratorParams { ... };
    struct ScheduleParams { ... };

    // ctor, with required inputs, and (optional) GeneratorParams.
    RgbToYCbCr(GeneratorContext* context,
               const Inputs& inputs,
               const GeneratorParams& = {}) { ... }

    // Output(s)
    Func output;

    // Overloads for first output
    operator Func() const { return output; }
    Expr operator()(Expr x, Expr y, Expr z) const  { return output(x, y, z); }
    Expr operator()(std::vector<Expr> args) const  { return output(args); }
    Expr operator()(std::vector<Var> args) const  { return output(args); }

    void schedule(const ScheduleParams &params = {});
  };

Note that this is a "header-only" class; all methods are inlined (or template-multilinked, etc) so there is no associated .cpp to incorporate. Also note that this is a "by-value" class that is internally handle-based, like most other types in Halide (e.g. Func, Expr, etc).

We'd consume this downstream like so:

#include "/path/to/rgb_to_rcbcr.stub.h"

class AwesomeFilter : public Generator<AwesomeFilter> {
 public:
  Input<Func> input{"input", Float(32), 3};
  Output<Func> output{"output", Float(32), 3};

  void generate() {
    // Snap image into buckets while still in RGB.
    quantized(x, y, c) = Quantize(input(x, y, c));

    // Convert to YCbCr.
    rgb_to_ycbcr = RgbToYCbCr(this, {quantized});

    // Do something awesome with it. Note that rgb_to_ycbcr autoconverts to a Func.
    output(x, y, c) = SomethingAwesome(rgb_to_ycbcr(x, y, c));
  }
  void schedule() {
    // explicitly schedule the intermediate Funcs we used
    // (including any reusable Generators).
    quantized
      .vectorize(x, natural_vector_size<float>())
      .compute_at(rgb_to_ycbcr, y);
    rgb_to_ycbcr
      .vectorize(x, natural_vector_size<float>())
      .compute_at(output, y);

    // *Also* call the schedule method for all reusable Generators we used,
    // so that they can schedule their own intermediate results as needed.
    // (Note that we may have to pass them appropriate values for ScheduleParam,
    // which vary from Generator to Generator; since RgbToYCbCr has none,
    // we don't need to pass any.)
    rgb_to_ycbcr.schedule();
  }

 private:
  Var x, y, c;
  Func quantized;
  RgbToYCbCr rgb_to_ycbcr;

  Expr Quantize(Expr e) { ... }
  Expr SomethingAwesome(Expr e) { ... }
};

It's worth pointing out that all inputs to the subcomponent must be explicitly provided when the subcomponent is created (as arguments to its ctor); the caller is responsible for providing these. (There is no concept of automatic input forwarding from the caller to a subcomponent.)

What if RgbToYCbCr has array inputs or outputs? For instance:

class RgbToYCbCrMulti : public Generator<RgbToYCbCrMulti> {
  Input<Func[3]> inputs{"inputs", Float(32), 3};
  Input<float[3]> coefficients{"coefficients", 1.f};
  Output<Func[3]> outputs{"outputs", Float(32), 3};
  ...
};

In that case, the generated RgbToYCbCrMulti class requires vector-of-Func (or vector-of-Expr) for inputs, and provides vector-of-Func as output members:

class RgbToYCbCrMulti : public GeneratorStub {
    struct Inputs { 
       std::vector<Func> inputs;
       std::vector<Expr> coefficients;
    };
    RgbToYCbCrMulti(GeneratorContext* context,
                    const Inputs& inputs,
                    const GeneratorParams& = {}) { ... }

    ...

    std::vector<Func> outputs;
};

What if RgbToYCbCr has multiple outputs? For instance:

class RgbToYCbCrMulti : public Generator<RgbToYCbCrMulti> {
  Input<Func> input{"input", Float(32), 3};
  Output<Func> output{"output", Float(32), 3};
  Output<Func> mask{"mask", UInt(8), 2};
  Output<float> score{"score"};
  ...
};

In that case, the generated RgbToYCbCrMulti class has all outputs as struct members, with names that match the declared names in the Generator:

struct RgbToYCbCrMulti {
    ...
    Func output;
    Func mask;
    Func score;
};

Note that scalar outputs are still represented as (zero-dimensional) functions, for consistency. (Also note that "output" isn't a magic name; it just happens to be the name of the first output of this Generator.)

Note also that the first output is always represented both in an "is-a" relationship and a "has-a" relationship: RgbToYCbCrMulti overloads the necessary operators so that accessing it as a Func is the same as accessing its "output" field, i.e.:

struct RgbToYCbCrMulti {
    ...
    Func output;

    operator Func() const { return output; }
    Expr operator()(Expr x, Expr y, Expr z) const  { return output(x, y, z); }
    Expr operator()(std::vector<Expr> args) const  { return output(args); }
    Expr operator()(std::vector<Var> args) const  { return output(args); }
    ...
};

This is (admittedly) redundant, but is deliberate: it allows convenience for the most common case (a single output), but also orthogonality in the multi-output case.
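The "is-a" plus "has-a" arrangement is an ordinary C++ pattern: a struct that exposes its members and also forwards a conversion operator and a call operator to the first one. A standalone sketch with toy types (ToyFunc and ToyStub are hypothetical and stand in for Func and the generated stub):

```cpp
#include <cassert>

// Toy stand-in for a Func: callable with two coordinates.
struct ToyFunc {
    int scale;
    explicit ToyFunc(int s = 0) : scale(s) {}
    int operator()(int x, int y) const { return scale * (x + y); }
};

// Toy stand-in for a generated stub with two outputs.
struct ToyStub {
    ToyFunc output;  // first output, reachable as a member ("has-a")
    ToyFunc mask;    // later outputs are reachable only as members

    // "is-a": the stub converts to, and can be called like, its first output.
    operator ToyFunc() const { return output; }
    int operator()(int x, int y) const { return output(x, y); }
};

// Usage: calling the stub and calling its first output are equivalent.
inline void stub_demo() {
    ToyStub s{ToyFunc(1), ToyFunc(10)};
    assert(s(2, 3) == s.output(2, 3));
}
```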

A consumer might use RgbToYCbCrMulti like so:

#include "/path/to/rgb_to_rcbcr_multi.stub.h"

class AwesomeFilter : public Generator<AwesomeFilter> {
  ...
  void generate() {
    rgb_to_ycbcr_multi = RgbToYCbCrMulti(this, {input});
    output(x, y, c) = SomethingAwesome(rgb_to_ycbcr_multi.output(x, y, c),
                                       rgb_to_ycbcr_multi.mask(x, y),
                                       rgb_to_ycbcr_multi.score());
  }
  void schedule() {
    rgb_to_ycbcr_multi.output
      .vectorize(x, natural_vector_size<float>())
      .compute_at(output, y);
    rgb_to_ycbcr_multi.mask
      .vectorize(x, natural_vector_size<float>())
      .compute_at(output, y);
    rgb_to_ycbcr_multi.score
      .compute_root();
    // Don't forget to call the schedule() function.
    rgb_to_ycbcr_multi.schedule();
  }
};

What if there were GeneratorParams we wanted to set in RgbToYCbCr, to configure code generation? In that case, we'd pass a value for the optional generator_params argument when calling the stub's constructor. Suppose the Generator is declared like so:

class RgbToYCbCr : public Generator<RgbToYCbCr> {
  GeneratorParam<Type> input_type{"input_type", UInt(8)};
  GeneratorParam<bool> fast_but_less_accurate{"fast_but_less_accurate", false};
  ...
};

This would produce a different (generated) definition of GeneratorParams, with a field for each GeneratorParam, initialized to the proper default:

struct GeneratorParams {
  Halide::Type input_type{UInt(8)};
  bool fast_but_less_accurate{false};
};

We could then fill this in manually:

class AwesomeFilter : public Generator<AwesomeFilter> {
  void generate() {
    ...
    // GeneratorParams is nested inside the generated stub class.
    RgbToYCbCr::GeneratorParams generator_params;
    generator_params.input_type = Float(32);
    generator_params.fast_but_less_accurate = true;
    rgb_to_ycbcr = RgbToYCbCr(this, {input}, generator_params);
    ...
  }
};

Alternately, if we know the types at C++ compilation time, we can use a templated construction method that is terser:

class AwesomeFilter : public Generator<AwesomeFilter> {
  void generate() {
    ...
    rgb_to_ycbcr = RgbToYCbCr::make<float, true>(this, {input});
    ...
  }
};

What if there are ScheduleParam in RgbToYCbCr?

class RgbToYCbCr : public Generator<RgbToYCbCr> {
  ScheduleParam<LoopLevel> level{"level"};
  ScheduleParam<bool> vectorize{"vectorize"};

  void generate() {
    intermediate(x, y) = SomeExpr(x, y);
    output(x, y) = intermediate(x, y);
  }

  void schedule() {
    intermediate.compute_at(level);
    if (vectorize) intermediate.vectorize(x, natural_vector_size<float>());
  }

  Var x, y;
  Func intermediate;
};

In that case, the generated stub code would have a different declaration for ScheduleParams:

struct ScheduleParams {
  LoopLevel level{"undefined"};
  bool vectorize{false};
};

And we might call it like so:

class AwesomeFilter : public Generator<AwesomeFilter> {
  ...
  void schedule() {
    rgb_to_ycbcr
      .vectorize(x, natural_vector_size<float>())
      .compute_at(output, y);

    rgb_to_ycbcr.schedule({
      // We want any intermediate products also at compute_at(output, y)
      LoopLevel(output, y),
      // vectorization: yes please
      true
    });
  }
  ...
};
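The schedule({...}) call relies on brace initialization filling the fields of ScheduleParams in declaration order, with omitted fields keeping their defaults. A standalone sketch of that mechanism in plain C++ (C++14 or later), using toy types: ToyLoopLevel, ToyScheduleParams and toy_schedule are hypothetical names and do not reproduce the generated stub.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for LoopLevel; it only records a label.
struct ToyLoopLevel {
    std::string label;
    ToyLoopLevel(const char *l) : label(l) { assert(l != nullptr); }
};

// Same shape as the generated ScheduleParams: one field per ScheduleParam,
// each initialized to its default.
struct ToyScheduleParams {
    ToyLoopLevel level{"undefined"};
    bool vectorize{false};
};

// Stand-in for the stub's schedule(); returns a description of what it
// would do, so the effect of the arguments is easy to inspect.
inline std::string toy_schedule(const ToyScheduleParams &p = {}) {
    return p.level.label + (p.vectorize ? "+vectorized" : "");
}
```

Calling toy_schedule() with no argument uses the defaults, while toy_schedule({"output.y", true}) sets both fields positionally. Since the order of values must match the order of declarations, reordering the ScheduleParams in a Generator would silently change the meaning of positional call sites like this one.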