BasicBlock
A BasicBlock models some elementary function in a data analysis workflow. EnsembleSVM constructs sophisticated pipelines as a sequence of BasicBlocks. The BasicBlock class is derived from Pipeline, so the same philosophy applies here.
These blocks perform element-wise operations on the input, which may be a vector or scalar.
- Scale: f(x) = sx, with s configurable per input.
- Offset: f(x) = x+o, with o configurable per input.
- Logistic function: f(x) = 1/(1+exp(-x)).
- Threshold: f(x) = x > t ? a : b, with a, b and t configurable per input.
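The four element-wise operations listed above can be sketched as plain free functions. This is only an illustration of the arithmetic, not the library's block classes: the names `scale_elements`, `offset_elements`, `logistic_elements` and `threshold_elements` are hypothetical, and per-input configuration is assumed to mean one parameter per vector element.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vector;

// Scale: f(x) = s * x, with one coefficient s[i] per input element.
Vector scale_elements(const Vector& x, const Vector& s) {
    assert(x.size() == s.size());
    Vector y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = s[i] * x[i];
    return y;
}

// Offset: f(x) = x + o, with one offset o[i] per input element.
Vector offset_elements(const Vector& x, const Vector& o) {
    assert(x.size() == o.size());
    Vector y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] + o[i];
    return y;
}

// Logistic function: f(x) = 1 / (1 + exp(-x)), no parameters.
Vector logistic_elements(const Vector& x) {
    Vector y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = 1.0 / (1.0 + std::exp(-x[i]));
    return y;
}

// Threshold: f(x) = x > t ? a : b, with t[i], a[i] and b[i] per input element.
Vector threshold_elements(const Vector& x, const Vector& t,
                          const Vector& a, const Vector& b) {
    assert(x.size() == t.size() && x.size() == a.size() && x.size() == b.size());
    Vector y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] > t[i] ? a[i] : b[i];
    return y;
}
```

Each function maps a vector to a vector of the same length, which is what makes these operations freely chainable before an aggregation step.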
These blocks accept vectors and yield a scalar (typically std::vector<double> to double).
- Average (with configurable denominator).
- Median.
- Sum.
- SVM: aggregation using a full-fledged SVMModel. This allows highly nonlinear aggregation.
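The simpler aggregations above reduce a vector to a single number. The following sketch shows the arithmetic only, under stated assumptions: the function names are hypothetical, the average takes its denominator as an explicit argument to mirror the "configurable denominator", and the median of an even-length vector is taken as the mean of the two middle values (a common convention that this documentation does not confirm).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

typedef std::vector<double> Vector;

// Sum: adds up all elements.
double aggregate_sum(const Vector& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}

// Average: sum divided by a caller-chosen denominator,
// which need not equal the number of elements.
double aggregate_average(const Vector& x, double denominator) {
    assert(denominator != 0.0);
    return aggregate_sum(x) / denominator;
}

// Median: takes a copy so the caller's vector is not reordered.
// Assumption: even-length input yields the mean of the two middle values.
double aggregate_median(Vector x) {
    assert(!x.empty());
    std::sort(x.begin(), x.end());
    const std::size_t n = x.size();
    return n % 2 == 1 ? x[n / 2] : 0.5 * (x[n / 2 - 1] + x[n / 2]);
}
```

All three have the shape `double(std::vector<double>)`, which is the signature an aggregation block contributes when it ends a pipeline.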
Basic blocks can offer a variety of functionality, including element-wise operations on vectors, scalar operations and aggregation operations such as computing the mean. It is important to be aware that when basic blocks are concatenated, the signature of the new functor may be different from both original basic blocks.
This is easy to understand when you remember that a basic block is a function. Specifically, a basic block is a templated function for which both the return type and the argument type are template parameters. We could, for instance, concatenate something along these lines:
double Bar(std::vector<double> bla);
bool Foo(double bla);
std::vector<double> quux=...;
Foo(Bar(quux));
The concatenated function has result type bool and argument type std::vector<double>, which is clearly different from both Foo and Bar.
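To make the change of signature concrete, the sketch below composes two function pointers and checks the resulting types at compile time. It is not the library's mechanism: `Concat` and `concat` are hypothetical helpers, and the bodies given to `Bar` and `Foo` are placeholders chosen only so the example compiles and runs.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Placeholder bodies (assumptions): Bar sums its input, Foo tests for positivity.
double Bar(std::vector<double> bla) {
    double total = 0.0;
    for (std::size_t i = 0; i < bla.size(); ++i) total += bla[i];
    return total;
}
bool Foo(double bla) { return bla > 0.0; }

// Hypothetical composition of outer(inner(x)): the result type comes from
// the outer function, the argument type from the inner function.
template <typename R, typename M, typename A>
class Concat {
public:
    typedef R result_type;
    typedef A argument_type;
    Concat(R (*outer)(M), M (*inner)(A)) : outer_(outer), inner_(inner) {}
    R operator()(A a) const { return outer_(inner_(std::move(a))); }
private:
    R (*outer_)(M);
    M (*inner_)(A);
};

template <typename R, typename M, typename A>
Concat<R, M, A> concat(R (*outer)(M), M (*inner)(A)) {
    return Concat<R, M, A>(outer, inner);
}

// The composed functor matches neither Foo nor Bar.
typedef decltype(concat(Foo, Bar)) FooAfterBar;
static_assert(std::is_same<FooAfterBar::result_type, bool>::value,
              "result type comes from Foo");
static_assert(std::is_same<FooAfterBar::argument_type, std::vector<double> >::value,
              "argument type comes from Bar");
```

Note that every distinct pair of functions produces a distinct `Concat` instantiation, which is the "new type for every concatenation" cost in miniature.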
The fact that the signature of the functor depends on how blocks are concatenated is the main hurdle in the pipeline framework. It requires extensive template metaprogramming, specifically a new type for every concatenation. To make this manageable for users, we provide factories that abstract away the underlying template mechanism.
The BasicBlock class is templated, specifically as follows:
BasicBlock<return_type(argument_type),Internal>
return_type denotes the functor's return type, as expected. What may be less expected is that argument_type is not necessarily the functor's argument type. It is the type of this basic block's input. The functor's actual argument type depends on Internal. Internal can be one of two things:
- Another basic block, with all the same rules. This is similar to the decorator pattern (but templated, because types may change).
- decltype(nullptr): if this basic block is the front of a pipeline.
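The two cases for Internal can be mimicked with a small toy template. This is a sketch under assumptions, not the library's BasicBlock: `ToyBlock` is a hypothetical class, it stores a plain function pointer as its elementary function, and defaulting Internal to `decltype(nullptr)` is a choice made for this example.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

typedef std::vector<double> Vector;

template <typename Signature, typename Internal = decltype(nullptr)>
class ToyBlock;

// Front of a pipeline: Internal is decltype(nullptr), so the functor's
// argument type equals this block's own input type.
template <typename R, typename A>
class ToyBlock<R(A), decltype(nullptr)> {
public:
    typedef R result_type;
    typedef A argument_type;
    typedef A Input;
    explicit ToyBlock(R (*f)(A)) : f_(f) {}
    R operator()(A a) const { return f_(std::move(a)); }
private:
    R (*f_)(A);
};

// Wrapping another block: the functor's argument type is taken from the
// wrapped block, while Input stays this block's own input type.
template <typename R, typename A, typename Internal>
class ToyBlock<R(A), Internal> {
public:
    typedef R result_type;
    typedef typename Internal::argument_type argument_type;
    typedef A Input;
    ToyBlock(R (*f)(A), Internal inner) : f_(f), inner_(std::move(inner)) {}
    R operator()(argument_type a) const { return f_(inner_(std::move(a))); }
private:
    R (*f_)(A);
    Internal inner_;
};

// Placeholder elementary functions for the demonstration.
double total(Vector v) {
    double t = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) t += v[i];
    return t;
}
bool is_positive(double x) { return x > 0.0; }

typedef ToyBlock<double(Vector)> Front;      // Internal = decltype(nullptr)
typedef ToyBlock<bool(double), Front> Back;  // wraps Front

static_assert(std::is_same<Back::Input, double>::value,
              "the block's own input is double");
static_assert(std::is_same<Back::argument_type, Vector>::value,
              "the functor's argument is the front block's input");
```

In the wrapping case the block's own input type and the functor's argument type diverge, which is exactly the situation the text above warns about.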
The basic block factories provide the interface through which blocks are constructed. The factories become particularly important when several blocks are concatenated into a pipeline, due to the complexity of the newly formed type.
In general, you will never need to define a Factory specialization for a new BasicBlock. This is all done automatically: the factories use variadic argument packs and simply forward these to the appropriate constructor for the block.
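The forwarding idea can be sketched in a few lines of C++11. This is an illustration of the technique, not the library's Factory: `ToyFactory`, `ToyScale` and `ToyThreshold` are hypothetical names, and returning a `std::unique_ptr` mirrors the types shown elsewhere on this page.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical generic factory: one template serves every block type,
// because the argument pack is forwarded untouched to the constructor.
template <typename Block>
struct ToyFactory {
    template <typename... Args>
    std::unique_ptr<Block> operator()(Args&&... args) const {
        return std::unique_ptr<Block>(new Block(std::forward<Args>(args)...));
    }
};

// Two toy blocks with different constructor signatures.
struct ToyScale {
    explicit ToyScale(std::vector<double> c) : coefficients(std::move(c)) {}
    std::vector<double> coefficients;
};

struct ToyThreshold {
    ToyThreshold(double t_, double a_, double b_) : t(t_), a(a_), b(b_) {}
    double t, a, b;
};
```

Because overload resolution picks the constructor from the forwarded arguments, adding a new block type requires no change to the factory template.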
The main reason factories are used is to shield the user from the complex type of a basic block. For example, suppose we want to concatenate Scale<std::vector<double>(std::vector<double>)> and Sum<double(std::vector<double>)> to obtain a weighted sum. This is implemented as follows:
typedef std::vector<double> Vector;
Vector coefficients = ...; // scale coefficients of appropriate size
Factory<Scale<Vector(Vector)>> factory_scale;
auto scale = factory_scale(coefficients);
Factory<Sum<double(Vector)>> factory_sum;
auto sum = factory_sum(std::move(scale));
The factories implicitly match the dimensions when blocks are concatenated. For example, it would be illegal to concatenate a block with 5 outputs to a block with 4 inputs. Additionally, thanks to C++11's auto keyword, you need not worry about the type of the blocks.
To illustrate how quickly this can get complicated, we will write the types out in full:
decltype(scale): std::unique_ptr<Scale<Vector(Vector)>>
decltype(sum): std::unique_ptr<Sum<double(Vector),Scale<Vector(Vector)>>>
As you can see, the type becomes increasingly complex when pipelines get longer.
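The dimension rule mentioned earlier (a block's number of outputs must equal the next block's number of inputs) can be shown in isolation. How the factories actually detect or report a mismatch is not stated here, so this sketch makes its own assumptions: `ToyDims` and `concatenate_dims` are hypothetical, and throwing `std::invalid_argument` is just one plausible way to reject an illegal concatenation.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical record of how many values a block consumes and produces.
struct ToyDims {
    std::size_t inputs;
    std::size_t outputs;
};

// Dimensions of the pipeline "first, then second".
// Assumption: a mismatch is reported by throwing std::invalid_argument.
inline ToyDims concatenate_dims(const ToyDims& first, const ToyDims& second) {
    if (first.outputs != second.inputs) {
        throw std::invalid_argument("outputs of first block do not match inputs of second");
    }
    ToyDims result = { first.inputs, second.outputs };
    return result;
}
```

With this rule, a 5-to-5 scaling block followed by a 5-to-1 sum is accepted and yields a 5-to-1 pipeline, whereas following it with a block that expects 4 inputs is rejected.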
BasicBlock is derived from Pipeline. Note that the template parameters may differ between BasicBlock and Pipeline. Between BasicBlock and your own block class sits a CRTP class that takes care of a whole series of internal wiring, most importantly the actual concatenation.
The inheritance is as follows (omitting template parameters):
Pipeline -> BasicBlock -> BB_CRTP -> YourClass
As a user you need not worry about this. A convenience macro Typedefs(BlockName) is provided which gives you a bunch of useful typedefs, namely:
- BlockName::BaseClass: the BasicBlock type this is derived from.
- BlockName::CRTPClass: the direct base of the class. This is the CRTP class. You need this in constructors.
- BlockName::Input: the input type for this particular basic block (not the total functor).
- BlockName::DerivedClass: this class.
- BlockName::PipeBase: the Pipeline type this is derived from.
- BlockName::result_type: the result type of this basic block and the functor if this block is the final block.
- BlockName::argument_type: the argument type of the functor. This is equal to BlockName::Input if this block is the start of a pipeline. Otherwise it might not be.
Make sure to always invoke this macro whenever you declare a new elementary function (e.g. a basic block) in an area with public visibility. Refer to pipeline/blocks.hpp for a number of examples.
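The inheritance chain and the typedefs listed above can be imitated with a toy hierarchy. Everything here is hypothetical: the `Toy*` classes are not the library's, the example covers only a block at the front of a pipeline (no Internal parameter), and the typedefs are inherited from the CRTP base rather than produced by a macro, since what Typedefs(BlockName) expands to is not shown on this page.

```cpp
#include <type_traits>
#include <utility>

template <typename Signature> class ToyPipeline;
template <typename R, typename A>
class ToyPipeline<R(A)> {                 // plays the role of Pipeline
public:
    typedef R result_type;
    typedef A argument_type;
    virtual ~ToyPipeline() {}
    virtual R operator()(A a) const = 0;
};

template <typename Signature> class ToyBasicBlock;
template <typename R, typename A>
class ToyBasicBlock<R(A)> : public ToyPipeline<R(A)> {  // plays BasicBlock
public:
    typedef ToyPipeline<R(A)> PipeBase;
    typedef A Input;
};

template <typename Derived, typename Signature>
class ToyCRTP : public ToyBasicBlock<Signature> {       // plays BB_CRTP
public:
    typedef ToyBasicBlock<Signature> BaseClass;
    typedef ToyCRTP<Derived, Signature> CRTPClass;
    typedef Derived DerivedClass;
    typedef typename BaseClass::PipeBase PipeBase;
    typedef typename BaseClass::Input Input;
    typedef typename PipeBase::result_type result_type;
    typedef typename PipeBase::argument_type argument_type;

    // The CRTP base does the wiring: it dispatches statically to the
    // derived class's apply(), so user classes stay small.
    result_type operator()(argument_type x) const override {
        return static_cast<const Derived*>(this)->apply(std::move(x));
    }
};

class Negate : public ToyCRTP<Negate, double(double)> { // plays YourClass
public:
    double apply(double x) const { return -x; }
};

static_assert(std::is_same<Negate::CRTPClass, ToyCRTP<Negate, double(double)> >::value,
              "CRTPClass is the direct base");
static_assert(std::is_same<Negate::PipeBase, ToyPipeline<double(double)> >::value,
              "PipeBase is the pipeline type at the root");
```

The user-written class only supplies `apply`; the typedefs and the call operator come from the layers above it, which is the division of labour the text describes.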
Recall that a basic block has the following template form:
template<typename return_type, typename argument_type, typename Internal>
BasicBlock<return_type(argument_type),Internal>
return_type and argument_type denote those of the elementary function modelled by the basic block. When basic blocks are concatenated, a new type is formed. Pipeline has the following form:
template<typename return_type, typename argument_type>
Pipeline<return_type(argument_type)>
Here, return_type and argument_type denote the respective types of the functor. These parameters do not always match those of the internal basic blocks, particularly not those of the last block in a pipeline. When concatenating B fun1(A a) and C fun2(B b), the resulting functor has signature C concat(A a). The final basic block here (modelling fun2) has return_type == C and argument_type == B but is derived from Pipeline<C(A)>.