Marc Claesen edited this page Oct 10, 2013 · 26 revisions

A BasicBlock models an elementary function in a data analysis workflow. EnsembleSVM constructs sophisticated pipelines as sequences of BasicBlocks. The BasicBlock class is derived from Pipeline, so the same philosophy applies here.

List of currently available BasicBlocks

Element-wise operations

These blocks perform element-wise operations on the input, which may be a vector or scalar.

  1. Scale: f(x) = sx, with s configurable per input.

  2. Offset: f(x) = x+o, with o configurable per input.

  3. Logistic function: f(x) = 1/(1+exp(-x)).

  4. Threshold: f(x) = x > t ? a : b, with a, b and t configurable per input.
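To make the four formulas concrete, here is a minimal standalone sketch of the element-wise operations on a std::vector<double>. It does not use the EnsembleSVM classes: the free functions scale, offset, logistic and threshold are hypothetical names chosen for illustration, and threshold uses a single t, a and b for all elements for brevity, whereas the actual block allows them to be configured per input.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vector;

// Scale: f(x) = s*x, with one coefficient per input element.
Vector scale(const Vector& x, const Vector& s) {
    assert(x.size() == s.size());
    Vector out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = s[i] * x[i];
    return out;
}

// Offset: f(x) = x+o, with one offset per input element.
Vector offset(const Vector& x, const Vector& o) {
    assert(x.size() == o.size());
    Vector out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + o[i];
    return out;
}

// Logistic function: f(x) = 1/(1+exp(-x)), applied to every element.
Vector logistic(const Vector& x) {
    Vector out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = 1.0 / (1.0 + std::exp(-x[i]));
    return out;
}

// Threshold: f(x) = x > t ? a : b.
// Simplification (assumption): t, a and b are shared by all elements.
Vector threshold(const Vector& x, double t, double a, double b) {
    Vector out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] > t ? a : b;
    return out;
}
```

Chaining calls such as logistic(offset(scale(x, s), o)) mimics what a pipeline of these blocks computes.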

Aggregation

These blocks accept vectors and yield a scalar (typically std::vector<double> to double).

  1. Average (with configurable denominator).

  2. Median.

  3. Sum.

  4. SVM: aggregation using a full-fledged SVMModel. This allows highly nonlinear aggregation.
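The three simple aggregations can be sketched as plain functions, shown below. This is an illustration only, not the EnsembleSVM implementation: the function names are hypothetical, "configurable denominator" is assumed to mean that the sum is divided by a caller-supplied value, and the median of an even-length vector is assumed to be the mean of the two middle elements. The SVM aggregator is omitted because it needs a trained model.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

typedef std::vector<double> Vector;

// Sum of all elements.
double sum(const Vector& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}

// Average with a caller-supplied denominator (assumed semantics:
// the denominator need not equal x.size()).
double average(const Vector& x, double denominator) {
    return sum(x) / denominator;
}

// Median. Takes the vector by value because nth_element reorders it.
double median(Vector x) {
    assert(!x.empty());
    Vector::iterator mid = x.begin() + x.size() / 2;
    std::nth_element(x.begin(), mid, x.end());
    if (x.size() % 2 == 1) return *mid;
    // Even length: the lower middle is the largest element left of mid.
    double upper = *mid;
    double lower = *std::max_element(x.begin(), mid);
    return 0.5 * (lower + upper);
}
```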

BasicBlock implementation

The BasicBlock class

Basic blocks can offer a variety of functionality, including element-wise operations on vectors, scalar operations and aggregation operations such as computing the mean. It is important to be aware that when basic blocks are concatenated, the signature of the new functor may differ from the signatures of both original basic blocks.

This is easy to understand when you remember that a basic block is a function. Specifically, a basic block is a templated function whose return type and argument type are both template parameters. We could, for instance, concatenate something along these lines:

double Bar(std::vector<double> bla);
bool Foo(double bla);

std::vector<double> quux=...;
Foo(Bar(quux));

The concatenated function has result_type bool and argument_type std::vector<double>, a signature that clearly differs from those of both Foo and Bar.

The fact that the signature of the functor depends on how blocks are concatenated is the main hurdle in the pipeline framework. It requires extensive template metaprogramming, specifically a new type for every concatenation. To make this easier for users, we provide factories that abstract away the underlying template mechanism.
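The change of signature can be checked at compile time. The sketch below is a standalone C++11 illustration, not EnsembleSVM code: the bodies of Foo and Bar are made up, and Composed and compose are hypothetical helpers that expose result_type and argument_type in the same spirit as the text.

```cpp
#include <cassert>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bodies for the two functions from the text.
double Bar(std::vector<double> bla) {
    return std::accumulate(bla.begin(), bla.end(), 0.0);
}
bool Foo(double bla) { return bla > 0.0; }

// A functor computing outer(inner(a)) for two plain functions.
template <typename R, typename M, typename A>
struct Composed {
    typedef R result_type;
    typedef A argument_type;
    R (*outer)(M);
    M (*inner)(A);
    R operator()(A a) const { return outer(inner(std::move(a))); }
};

template <typename R, typename M, typename A>
Composed<R, M, A> compose(R (*outer)(M), M (*inner)(A)) {
    return Composed<R, M, A>{outer, inner};
}

// The concatenation has the result type of Foo and the argument type of Bar.
typedef decltype(compose(&Foo, &Bar)) FooBar;
static_assert(std::is_same<FooBar::result_type, bool>::value,
              "result type comes from the outer function");
static_assert(std::is_same<FooBar::argument_type, std::vector<double>>::value,
              "argument type comes from the inner function");
```

Every distinct pair of functions yields a distinct Composed instantiation, which is the "new type for every concatenation" effect in miniature.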

The BasicBlock class is templated, specifically as follows:

BasicBlock<return_type(argument_type),Internal>

As expected, return_type denotes the functor's return type. Less obviously, argument_type is not necessarily the functor's argument type: it is the type of this basic block's own input. The functor's actual argument type depends on Internal, which can be one of two things:

  1. Another basic block, with all the same rules. This is similar to the decorator pattern (but templated, because types may change).

  2. decltype(nullptr): if this basic block is the front of a pipeline.
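The two cases for Internal can be sketched with a toy class template. This is a simplified illustration under stated assumptions, not the real BasicBlock: ToyBlock, total and positive are hypothetical names, the wrapped block is stored by value, and each block simply wraps a function pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

template <typename Signature, typename Internal = std::nullptr_t>
class ToyBlock;

// Case 2: Internal is decltype(nullptr), so this block is the front of a
// pipeline and the functor's argument is the block's own input.
template <typename R, typename A>
class ToyBlock<R(A), std::nullptr_t> {
public:
    typedef R result_type;
    typedef A argument_type;
    explicit ToyBlock(R (*f)(A)) : f_(f) {}
    R operator()(A a) const { return f_(std::move(a)); }
private:
    R (*f_)(A);
};

// Case 1: Internal is another block. The functor's argument type is
// inherited from the wrapped block, while A is only this block's input.
template <typename R, typename A, typename Internal>
class ToyBlock<R(A), Internal> {
public:
    typedef R result_type;
    typedef typename Internal::argument_type argument_type;
    ToyBlock(R (*f)(A), Internal internal)
        : f_(f), internal_(std::move(internal)) {}
    R operator()(argument_type a) const { return f_(internal_(std::move(a))); }
private:
    R (*f_)(A);
    Internal internal_;
};

// Hypothetical elementary functions.
double total(std::vector<double> v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}
bool positive(double x) { return x > 0.0; }

typedef ToyBlock<double(std::vector<double>)> Front;
typedef ToyBlock<bool(double), Front> Back;
static_assert(std::is_same<Back::argument_type, std::vector<double>>::value,
              "the functor takes the front block's argument, not double");
```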

Factories

The basic block factories provide the interface through which blocks are constructed. The factories become particularly important when several blocks are concatenated into a pipeline, due to the complexity of the newly formed type.

In general, you will never need to define a Factory specialization for a new BasicBlock. This is all done automatically: the factories use variadic argument packs and simply forward these to the appropriate constructor for the block.
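The forwarding mechanism can be sketched as follows. ToyFactory and ToyScale are hypothetical stand-ins, not the EnsembleSVM Factory or Scale classes; the sketch only assumes, as the text states, that arguments are passed through to the block's constructor and that the result is held in a std::unique_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

typedef std::vector<double> Vector;

// A toy block whose constructor takes per-element scale coefficients.
class ToyScale {
public:
    explicit ToyScale(Vector coefficients)
        : coefficients_(std::move(coefficients)) {}
    Vector operator()(Vector x) const {
        assert(x.size() == coefficients_.size());
        for (std::size_t i = 0; i < x.size(); ++i) x[i] *= coefficients_[i];
        return x;
    }
private:
    Vector coefficients_;
};

// A generic factory: any argument pack is perfectly forwarded to the
// constructor of Block, so no per-block specialization is required.
template <typename Block>
struct ToyFactory {
    template <typename... Args>
    std::unique_ptr<Block> operator()(Args&&... args) const {
        return std::unique_ptr<Block>(new Block(std::forward<Args>(args)...));
    }
};
```

Because the factory is generic over Block and Args, adding a new block with a different constructor signature requires no change to the factory.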

The main reason factories are used is to shield the user from the complex type of a basic block. For example, suppose we want to concatenate Scale<std::vector<double>(std::vector<double>)> and Sum<double(std::vector<double>)> to obtain a weighted sum. This is implemented as follows:

typedef std::vector<double> Vector;
Vector coefficients = ...; // scale coefficients of appropriate size
Factory<Scale<Vector(Vector)>> factory_scale;
auto scale = factory_scale(coefficients);
Factory<Sum<double(Vector)>> factory_sum;
auto sum = factory_sum(std::move(scale));

The factories implicitly match the dimensions when blocks are concatenated. For example, it would be illegal to concatenate a block with 5 outputs to a block with 4 inputs. Additionally, thanks to C++11's auto keyword, you need not worry about the type of the blocks.

To illustrate how quickly this can get complicated, we will write the types out in full:

decltype(scale)
std::unique_ptr<Scale<Vector(Vector)>>
decltype(sum)
std::unique_ptr<Sum<double(Vector),Scale<Vector(Vector)>>> 

As you can see, the type becomes increasingly complex when pipelines get longer.

A Note on Inheritance

BasicBlock is derived from Pipeline. Note that the template parameters may differ between BasicBlock and Pipeline. Between BasicBlock and your own block class sits a CRTP class that takes care of a whole series of internal wiring, most importantly the actual concatenation.

The inheritance is as follows (omitting template parameters): Pipeline -> BasicBlock -> BB_CRTP -> YourClass

As a user you need not worry about this. A convenience macro Typedefs(BlockName) is provided which gives you a bunch of useful typedefs, namely:

  1. BlockName::BaseClass: the BasicBlock type this is derived from.

  2. BlockName::CRTPClass: the direct base of the class. This is the CRTP class. You need this in constructors.

  3. BlockName::Input: the input type for this particular basic block (not the total functor).

  4. BlockName::DerivedClass: this class.

  5. BlockName::PipeBase: the Pipeline type this is derived from.

  6. BlockName::result_type: the result type of this basic block and the functor if this block is the final block.

  7. BlockName::argument_type: the argument type of the functor. This is equal to BlockName::Input if this block is the start of a pipeline. Otherwise it might not be.

Whenever you declare a new elementary function (i.e. a basic block), make sure to invoke this macro in a section of the class with public visibility. Refer to pipeline/blocks.hpp for a number of examples.
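The inheritance chain described in this section can be sketched with toy classes. This is an assumption-laden illustration of the CRTP idea only: template parameters are omitted as in the chain above, the Toy* names and the apply member are hypothetical, and the real BB_CRTP performs more internal wiring than shown here.

```cpp
#include <cassert>

// Stand-in for Pipeline: the abstract functor interface.
struct ToyPipeline {
    virtual ~ToyPipeline() {}
    virtual double operator()(double x) const = 0;
};

// Stand-in for BasicBlock.
struct ToyBasicBlock : ToyPipeline {};

// Stand-in for BB_CRTP: it knows the derived type at compile time and
// dispatches to it without the derived class repeating the wiring.
template <typename Derived>
struct ToyCRTP : ToyBasicBlock {
    double operator()(double x) const override {
        return static_cast<const Derived*>(this)->apply(x);
    }
};

// A user-defined block only supplies its elementary function.
struct Doubler : ToyCRTP<Doubler> {
    double apply(double x) const { return 2.0 * x; }
};
```

A Doubler can then be used through a reference to ToyPipeline, which mirrors how a finished block is used as a Pipeline.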

BasicBlock parameters versus Pipeline parameters

Recall that a basic block has the following template form:

template<typename return_type, typename argument_type, typename Internal>
BasicBlock<return_type(argument_type),Internal>

return_type and argument_type denote those of the elementary function modelled by the basic block. When basic blocks are concatenated, a new type is formed. Pipeline has the following form:

template<typename return_type, typename argument_type>
Pipeline<return_type(argument_type)>

Here, return_type and argument_type denote the respective types of the functor. These parameters do not always match those in internal basic blocks, particularly not in the last one of a pipeline. When making a concatenation of B fun1(A a) and C fun2(B b) the resulting functor has signature C concat(A a). The final basic block here (modelling fun2) has return_type == C and argument_type == B but is derived from Pipeline<C(A)>.
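The relation between the final block and its Pipeline base can be verified at compile time with a toy model. The sketch assumes A = std::string, B = std::size_t and C = bool; ToyPipeline, Length and IsLong are hypothetical names and do not reproduce the EnsembleSVM class definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

template <typename Signature> class ToyPipeline;

// Pipeline<return_type(argument_type)>: the signature of the whole functor.
template <typename R, typename A>
class ToyPipeline<R(A)> {
public:
    typedef R result_type;
    typedef A argument_type;
    virtual ~ToyPipeline() {}
    virtual R operator()(A a) const = 0;
};

// fun1: B fun1(A a), the front block.
class Length : public ToyPipeline<std::size_t(std::string)> {
public:
    std::size_t operator()(std::string a) const override { return a.size(); }
};

// fun2: C fun2(B b), the final block. Its own input is B (std::size_t),
// but it derives from ToyPipeline<C(A)> because the functor takes an A.
template <typename Front>
class IsLong : public ToyPipeline<bool(typename Front::argument_type)> {
public:
    typedef std::size_t Input;
    explicit IsLong(Front front) : front_(std::move(front)) {}
    bool operator()(typename Front::argument_type a) const override {
        return front_(std::move(a)) > 3;
    }
private:
    Front front_;
};

static_assert(std::is_base_of<ToyPipeline<bool(std::string)>,
                              IsLong<Length>>::value,
              "the final block derives from Pipeline<C(A)>");
static_assert(std::is_same<IsLong<Length>::Input, std::size_t>::value,
              "the final block's own input is still B");
```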