nslay/bleak

A C++ implementation of neural networks as directed acyclic graphs.
Disclaimer

This project serves both as an intellectual exercise to understand the very low-level details of deep learning and as a sandbox to test crazy ideas that might be harder to test in more mainstream toolkits! For serious work, you should probably look at a mainstream toolkit like TensorFlow or PyTorch.

Introduction

This project started as a framework to implement Random Hinge Forest, which is detailed in this arXiv draft

https://arxiv.org/abs/1802.03882

For benchmark experiments in this repository, Random Hinge Forest serves both as a standalone learning machine and as a non-linearity for consecutive layers. So you will not find a conventional activation function in this neural network toolkit (at least as of this revision, but it wouldn't be hard to add!).

NOTE: You can find a PyTorch port of RandomHingeForest here:

https://github.com/nslay/HingeTreeForTorch

Tested Environments

Bleak has been developed and/or tested in the following environments

  • Windows 10, Visual Studio 2017, OpenBLAS 0.3.6, CUDA 10.2, cuDNN 7.6.5, LMDB 0.9.70, ITK 4.13
    • Uses Windows Subsystem for Linux for experiments.
    • GeForce GTX 980
  • FreeBSD 12.1-STABLE, clang-10.0.0, OpenBLAS 0.3.9, LMDB 0.9.70, ITK 4.13
    • No GPU support on this Unix-like operating system. I don't have a spare computer to test on Linux!

Compiling from Source

To build bleak, you will need the following dependencies

  • A C++14 compiler (GCC, Clang or Visual Studio 2017 or later)
  • cmake 3.10 or later (ccmake recommended on Unix-like systems)

First clone this repository and its submodules

git clone https://github.com/nslay/bleak
cd bleak
git submodule init
git submodule update

Create a separate empty folder (call it build) and run CMake from inside it.

Unix-like Systems

mkdir build
cd build
ccmake /path/to/bleak

Press 'c' to configure, select the desired build options and modules (pressing 'c' again after any changes), and finally press 'g' to generate the Makefiles to build bleak.

NOTE: Bleak should build and run on Unix-like systems (I occasionally compile and run it on FreeBSD). That said, the experiment shell scripts were written for Windows Subsystem for Linux, so some script modification is likely needed to run the experiments on actual Unix-like systems.

Windows

Run cmake-gui and set the source code and build folders, for example C:/Work/Source/bleak and C:/Work/Build/bleak, respectively.

Press "Configure", select the desired build options and modules (press "Configure" for any changes) and then finally press "Generate". You can also press "Open Project" to launch Visual Studio automatically.

NOTE: Make sure to select the "Release" build mode in Visual Studio.

Some General Options

  • bleakUseOpenMP -- Try to enable OpenMP support in the compiler (if available).
  • bleakUseCUDA -- Try to enable CUDA support (if available).
  • bleakBLASType -- "slowblas" (default, built-in to bleak and very slow!) or "openblas" (OpenBLAS).
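
These options can also be set non-interactively on the cmake command line. For example, on a Unix-like system (the source path is a placeholder):

cmake -DbleakUseOpenMP=ON -DbleakUseCUDA=ON -DbleakBLASType=openblas /path/to/bleak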

Modules

  • bleakCommon -- A required module that is essentially the glue of all of bleak (Graph, Vertex, Array, BLAS wrappers, parsers, databases, etc...), some optimizers (SGD, AdaGrad, Adam), and some basic Vertices (InnerProduct, BatchNormalization, SoftmaxLoss, etc...).
  • bleakImage -- Gemm-based convolution and pooling.
  • bleakTrees -- Random hinge forest, ferns, convolutional hinge trees and ferns, feature selection and annealing.
  • bleakITK -- ITK 1D/2D/3D image loader Vertex (supports PNG/JPEG, DICOM, MetaIO, Nifti, etc...). Requires ITK 4+.
  • bleakCudnn -- cuDNN-based convolution and pooling. Requires cuDNN.

Graphs and Vertices

In bleak, neural network computation is implemented as a directed graph. Vertices implement the forward/backward operations and have names, properties, and named inputs and outputs. This enables searching for vertices by name, assigning values to named properties, and querying inputs and outputs by name. Edges store the input/output tensors and their gradients. Vertices uniquely own the Edges for their outputs and are assigned Edges for their inputs. Graphs in bleak can be constructed/modified in C++ or read from a .sad file.

Basic Graph Syntax

A .sad file follows this general format. Sections denoted with [] are optional.

  1. [Variable Declarations]
  2. [Subgraph Declarations]
  3. Vertex Declarations
  4. [Connection Declarations]

Whitespace is ignored, and all declarations are terminated with a semicolon (;), except for includes. A file can be included at any point with an include statement. For example

include "Config.sad"

This included file is treated as if its content were copied and pasted in place of the include. The included file by itself need not be a valid graph.
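
For example, a file like Config.sad might contain only variable declarations (a fragment; the values here are illustrative):

# Config.sad -- not a valid graph on its own
batchSize = 16;
dataRoot = "/path/to/data";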

Comments are preceded by the octothorpe symbol (#). For example

# This is a comment.

They may occur anywhere outside of a string value.

Variable Declarations

Variables are declared as a key value pair. For example

batchSize = 16;
learningRateMultiplier=1.0;
imageList = "alcoholicTrainList.txt";

And they may be overwritten by subsequent declarations. For example

include "Config.sad"
batchSize=32; # Override config file

Variables in .sad files support a small collection of basic types:

  • integer
  • float
  • boolean (true/false)
  • string ("value")
  • integer vector ([8, 3, 256, 256])
  • float vector ([0.5, 1.0])
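
For example, one declaration of each type (the variable names are illustrative):

batchSize = 16;                  # integer
learningRateMultiplier = 1.0;    # float
shuffle = true;                  # boolean
imageList = "trainList.txt";     # string
size = [ 8, 3, 256, 256 ];       # integer vector
multipliers = [ 0.5, 1.0 ];      # float vector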

Expressions with Variables

Variables are referenced with a '$' prefix, much like shell variables, and may be used in simple mathematical expressions if they are float or integer types. The available mathematical operators include +, -, *, /, % (modulo), and ^ or ** (both exponentiation). Resulting types follow the behavior of the C/C++ programming languages. For example, 1/2 results in 0 while 1.0/2 results in 0.5. The addition operator (+) may also be used to concatenate strings. Here are some examples

# This expression results in an integer (features3Width is an integer)
pool1Width = ($features3Width - 2)/2 + 1; 

# This concatenates two strings
imageList=$dataRoot + "/SMNI_CMI_TRAIN/alcoholicTrainList.txt"; 

# Variables and expressions can even be used inside of vectors
size = [ $numTrees, 2^$treeDepth - 1 ]; 
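
The integer/float division behavior mentioned above deserves emphasis (these variable names are made up):

half = 1/2;    # Integer division: results in 0
half2 = 1.0/2; # Float division: results in 0.5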

There are currently no built-in functions like min/max/exp or any syntax to reference vector components.

Subgraph Declarations

Subgraphs are declared immediately after variables (if any). They recursively define graphs that follow the structure mentioned above, with some additional mechanisms to facilitate communicating properties and setting up connections. This topic is covered in detail in the Subgraphs section, after vertex declarations and connection declarations are introduced.

Vertex Declarations

After variables and subgraphs are declared (if any), vertices are declared. Vertices have a type name, named properties and a unique name that refers to that instance of the vertex. They are declared as follows

VertexType {
  propertyName=propertyValue;
  propertyName2=propertyValue2;
  # And so forth...
} uniqueVertexName;

If a vertex requires no properties, one may simply declare

VertexType uniqueVertexName; 

Vertex types are either provided by modules (compiled into bleak) or are instances of subgraphs (discussed in Subgraphs). Some examples of vertices will be described later.

Vertex properties are used to communicate runtime settings to the Vertex. This may be information about the size of a convolution kernel or the stride or dilation of a convolution operation. Importantly, Vertex properties are not variables: they cannot be referenced in expressions, and unexpected properties (ones the Vertex does not define) cannot be declared. Variables and expressions may be used in Vertex properties (which is the whole intention of variables!). For example

numTrees=100;
treeDepth = 7;
applyWeightDecay = false;

Parameters {
  size = [ $numTrees, 2^$treeDepth - 1 ];
  learnable=true;
  initType="uniform";
  applyWeightDecay=$applyWeightDecay;
  b = 3;
  a = -$b; # ERROR: Properties are not variables.
  giraffe = "Not a property"; # ERROR: giraffe is not a Parameters property.
} thresholds;

Vertex properties afford a bit of flexibility in value types. Many types of values are implicitly convertible. For example

Parameters {
  size = 10; # Integer convertible to one component integer vector [ 10 ].
  learnable = 1; # Integer convertible to boolean.
  a="-3.0"; # String representation of a float is convertible to a float.
  b=[ 3 ]; # One component integer vector is convertible to a float.
} tensor;

Any type is convertible to a string and any string is (possibly) convertible to any type. Other implicit conversions are provided below.

  • integer -> float
  • integer -> boolean
  • integer -> integer vector
  • integer -> float vector
  • float -> boolean
  • float -> float vector
  • boolean -> integer
  • boolean -> float
  • boolean -> integer vector
  • boolean -> float vector
  • integer vector -> integer (only if the vector has 1 component)
  • integer vector -> float (only if the vector has 1 component)
  • float vector -> float (only if the vector has 1 component)

How vertices are compiled into bleak and given named properties and named inputs/outputs will be discussed in section Implementing your own Vertex in C++.

Connection Declarations

After all vertices have been declared, they can be connected using their unique names and named inputs or outputs. A connection takes one of two possible forms

VertexType1 source;
VertexType2 target;

source.outputName -> target.inputName;
target.inputName <- source.outputName;

Like properties, named inputs and outputs are compiled into bleak. This detail will be discussed in section Implementing your own Vertex in C++.

Subgraphs

The declarative nature of this .sad graph syntax can be cumbersome, especially since neural networks tend to have repeated structure (e.g. many layers of convolution). Subgraphs attempt to reduce the pain of defining neural network architectures by enabling an author to define a standalone, reusable component. A subgraph recursively defines a graph with the same structure and syntax as described in all sections following section Basic Graph Syntax. They are wrapped in a subgraph directive of the form

subgraph NameOfSubgraph {
  # Graph as described in all sections leading up to this example!
};

Where "NameOfSubgraph" behaves like a type of vertex that can be declared. The variables section of a graph defines the properties of a subgraph. External connections to the subgraph can be communicated through the this keyword which refers to the instance of the subgraph itself. To better understand why this is immensely helpful, imagine the InnerProduct operation (i.e. fully connected layer). The InnerProduct includes learnable weights and bias which are used to calculate W*X + T where X is the $batchSize set of $numInputs-dimensional vectors, W is the $numOutputs set of weights, and T is the $numOutputs set of biases. So a subgraph incorpating all of these elements might look like the following

subgraph SGInnerProduct {
  # Properties with default values
  numInputs=10;
  numOutputs=100;
  
  Parameters {
    size = [ $numOutputs, $numInputs ];
    initType="gaussian";
    learnable = true;
    mu=0.0;
    sigma=1.0;
  } weights;
  
  Parameters {
    size = [ $numOutputs ];
    learnable=true;
  } bias;
  
  InnerProduct innerProduct;
  
  weights.outData -> innerProduct.inWeights;
  bias.outData -> innerProduct.inBias;
  
  # Set up external connections
  this.inData -> innerProduct.inData;
  innerProduct.outData -> this.outData;
};

NOTE: This is a simplified explanation of InnerProduct in bleak. It can handle more than 2D tensors!

Notice that the input and output names can be arbitrarily chosen by the subgraph author through this.name. Now, I can use SGInnerProduct as a kind of vertex type. For example, I might define a logistic regressor training graph for the iris data set as follows

batchSize=16;
numFeatures = 4;
numClasses = 3;

# We can hide the subgraph declaration in another file!
include "SGInnerProduct.sad"

# Incrementally read a prepared CSV file and wrap around
CsvReader {
  batchSize=$batchSize;
  csvFileName="train.csv";
  labelColumn=4;
  shuffle=true;
} csv;

SGInnerProduct {
  numInputs=$numFeatures;
  numOutputs=$numClasses;
} inner;

SoftmaxLoss loss;

csv.outData -> inner.inData;
inner.outData -> loss.inData;
csv.outLabels -> loss.inLabels;

While this is a simple example, it should be clear that subgraphs can considerably reduce the burden of defining graphs with repeated structures. An author need not explicitly declare Parameters for every single operation.
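
For instance, two SGInnerProduct layers can be stacked with no additional Parameters declarations (a fragment; hiddenSize is a made-up variable and the data source feeding inner1 is omitted):

numFeatures=4;
hiddenSize=8;
numClasses=3;

include "SGInnerProduct.sad"

SGInnerProduct {
  numInputs=$numFeatures;
  numOutputs=$hiddenSize;
} inner1;

SGInnerProduct {
  numInputs=$hiddenSize;
  numOutputs=$numClasses;
} inner2;

inner1.outData -> inner2.inData;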

One other nicety of subgraphs is that they can be used to embed a neural network architecture into training, validation, testing and production graphs without modifying the original architecture. An author need only write the architecture as a subgraph in its own standalone .sad file. Then each task-specific graph can include and use the architecture without modification. The simple iris model might instead be defined as

subgraph SGModel {
  numFeatures=4;
  numClasses=3;
  
  include "SGInnerProduct.sad"
  
  SGInnerProduct {
    numInputs=$numFeatures;
    numOutputs=$numClasses;
  } inner;
  
  this.inData -> inner.inData;
  inner.outData -> this.outData;
};

Then the training and production graphs might look like the following

Train.sad

# These might be better in a config file (Config.sad?)
batchSize=16;
numFeatures=4;
numClasses=3;

include "SGModel.sad"

# Incrementally read a prepared CSV file and wrap around
CsvReader {
  batchSize=$batchSize;
  csvFileName="train.csv";
  labelColumn=4;
  shuffle=true;
} csv;

SGModel {
  numFeatures=$numFeatures;
  numClasses=$numClasses;
} graph;

SoftmaxLoss loss;

csv.outData -> graph.inData;
graph.outData -> loss.inData;
csv.outLabels -> loss.inLabels;

Production.sad

numFeatures=4;
numClasses=3;

include "SGModel.sad"

# Input placeholder for C++ code... batch size = 1
Input { 
  size = [ 1, $numFeatures ];
} input;

SGModel {
  numFeatures=$numFeatures;
  numClasses=$numClasses;
} graph;

Softmax output;

input.outData -> graph.inData;
graph.outData -> output.inData;

Private Variables

Since the variables of subgraphs work like vertex properties, these variables must be resolved in advance to determine their types. This can lead to some confusing behavior, best illustrated by this example

subgraph SGModel {
  inputWidth = 100;
  kernelWidth = 5;
  stride = 1;
  dilate = 1;
  padding = 0;
  outputWidth = ($inputWidth - $kernelWidth - ($kernelWidth - 1)*($dilate - 1) + 2*$padding)/$stride + 1;
  
  # The rest of the subgraph below...
};

SGModel {
  inputWidth=128;
  stride=2;
  kernelWidth=3;
  padding = 1;
  # What do you suppose outputWidth equals? It's still 96 even though the author intended it to be 64!
} graph;

# The rest of the graph below

The variable outputWidth in SGModel is immediately resolved to its default value of 96 regardless of what the author intended! It's important to note that only the variable declarations of a subgraph are initially resolved. Only after a subgraph is declared as a vertex and assigned its properties are all other expressions with variables resolved in that instance of the subgraph. Worse yet, the variable outputWidth is exposed as a property, allowing an author to mistakenly assign it an incorrect value! For example

SGModel {
  inputWidth=128;
  stride=2;
  kernelWidth=3;
  padding = 1;
  outputWidth=128; # This would be true if stride=1... this is incorrect!
} graph;

To solve both problems, bleak supports private variables that are excluded from vertex properties (so their type need not be known) while also delaying their resolution until after the vertex declaration and property assignments. A private variable is declared with the private keyword. We can fix the first example by making outputWidth private

subgraph SGModel {
  inputWidth = 100;
  kernelWidth = 5;
  stride = 1;
  dilate = 1;
  padding = 0;
  private outputWidth = ($inputWidth - $kernelWidth - ($kernelWidth - 1)*($dilate - 1) + 2*$padding)/$stride + 1;
  
  # The rest of the subgraph below...
};

SGModel {
  inputWidth=128;
  stride=2;
  kernelWidth=3;
  padding = 1;
  # What do you suppose outputWidth equals? Now it's 64!
  # outputWidth=128; # ERROR: Not a property.
} graph;

# The rest of the graph below

And this gives the intended behavior. However, there are some rules governing the use of private variables, given below along with a short example.

  1. Global variables cannot reference private variables in expressions. Private variables may, however, reference global variables.
  2. Private variables cannot be redeclared as global (without private qualifier) nor can global variables be redeclared as private.
  3. While normal variables are global to all the contained subgraphs, private variables are only local to their immediate graph.
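
A short sketch of rules 1 and 2 (the variable names are made up):

# Inside a subgraph's variable declarations...
width = 128;                     # A normal variable
private area = $width * $width;  # OK: private variables may reference normal variables
#badWidth = $area;               # ERROR: normal variables cannot reference private variables
#private width = 64;             # ERROR: width cannot be redeclared as private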

Graph Evaluation

When a graph is constructed and initialized, a plan is created for the order of evaluation of the vertices. The plan is simply a topologically sorted list of the vertices so that all vertices can run in order with data dependencies implicitly satisfied. A forward pass in the network is simply a for-loop over the plan (invoking Forward()). A backward pass in the network is a for-loop over the plan in reverse (invoking Backward()). Conceptually, the plan evaluates root vertices first and ends with leaf vertices.
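
For example, for the iris training graph shown earlier, one valid plan (sketched here in comments) would be:

# Roots first: csv and the Parameters vertices (weights, bias) inside inner,
# then innerProduct, and finally loss.
# Forward() is invoked in that order; Backward() is invoked in reverse.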

Root and Leaf Vertices

A Vertex in a bleak Graph is considered a root vertex if it has no input edges with a source Vertex. For example, a root vertex may simply have no inputs, it may have no connected inputs, or custom code may have provided an input edge with no source vertex. A leaf Vertex is any vertex whose output edges are not consumed by a target Vertex that itself produces output (and a graph may have more than one such vertex). For example, a leaf vertex may simply have no outputs, it may have an output edge that is unconnected to any target Vertex, or its target vertices may all have no outputs themselves. Vertices with no outputs are often used as operational monitors of sorts (e.g. reporting ROC/AUC, accuracy, averages, etc...). However, loss function Vertices, which usually connect to such monitor Vertices, need to be treated as leaf vertices too!

Optimizers and Graphs

When optimizing a Graph, the Optimizer is initially tasked with identifying two types of Edges

  • Learnable Edges
  • Loss function Edges

NOTE: As a reminder, an Edge stores the input/output tensors and its corresponding gradient.

An Edge is considered learnable if it is both the output of a root vertex and has a non-empty gradient tensor. The output edge of Parameters is usually a learnable edge (though this may be optionally suppressed). An Edge is considered a loss function edge if it is the output of a leaf Vertex and holds a single-element tensor (i.e. a scalar real value).

The Optimizer can query the Graph for root and leaf vertices and determine which Edges contain the learnable model parameters and which contain loss function outputs (there may be more than one of each). Importantly, multiple dangling loss function edges are implicitly treated as if they were summed together. The Optimizer can then use a Forward pass to calculate the loss function, and a subsequent Backward pass to calculate the gradients (stored in the learnable edges) used to update the learnable parameters.

IMPORTANT: Loss function vertices are responsible for seeding their output gradient with 1 when they are a leaf Vertex.

Bleak C++ API

TODO

Implementing your own Vertex in C++

TODO

Creating a new Module

TODO

Common Vertices

TODO
