
Neural Network

Networks are compositional models that are represented in this framework as a collection of inter-connected layers. A network is defined from the bottom to the top: from the input data to the loss function. Data and derivatives flow through the complete network via the forward and backward passes. The complexity is fully encapsulated within each layer, and the framework communicates with all modules through predefined interfaces. Implementing and comparing new algorithms and architectures is therefore straightforward.

The following examples detail a few ways in which the neural network infrastructure can be used.

Implementation

The present implementation can be broken down into the following components.

  • Layer
  • Connections
  • Activation Functions
  • Init Rules
  • Optimizer

The components and their implementations are discussed in the following sections.

Layer

Layers are the bricks used to model all neural networks. Layers convolve filters, perform pooling, apply nonlinearities such as the softmax transformation, the sigmoid, and other element-wise transformations, normalize, load and save data, and more. Each layer is itself a small neural network; combined with other layers through connections, a user can model various networks, limited only by the user's imagination.

Each layer type defines two critical methods: FeedForward() and FeedBackward().

  • FeedForward(): Takes an input activation and computes the corresponding output activation of the layer. In general, the input and output activations are tensors; please refer to each layer's specification for further information.
  • FeedBackward(): Performs a backpropagation step through the layer with respect to the given input activation. The method assumes that FeedForward() has been called before with the same activation. A layer with parameters computes its gradient and stores it internally.

In general, a layer has two crucial responsibilities for the operation of the network: a forward pass that takes the inputs and produces the outputs, and a backward pass that takes the gradient with respect to the output and computes the gradients with respect to the parameters and to the inputs, which are in turn back-propagated to earlier layers.

Implementing or customizing an existing layer requires minimal effort

Implementing a custom layer requires minimal effort by the user: define or adjust the FeedForward() and FeedBackward() methods of the layer, and it is ready to be used.

Therefore, implementing e.g. an identity layer is straightforward:

class IdentityLayer
{
 public:
  IdentityLayer() { /* Nothing to do here */ }

  template<typename DataType>
  void FeedForward(const DataType& inputActivation,
                   DataType& outputActivation)
  {
    // The identity function simply copies the input activation.
    IdentityFunction::fn(inputActivation, outputActivation);
  }

  template<typename DataType>
  void FeedBackward(const DataType& inputActivation,
                    const DataType& error,
                    DataType& delta)
  {
    // Scale the backpropagated error by the derivative of the
    // activation function (element-wise).
    DataType derivative;
    IdentityFunction::deriv(inputActivation, derivative);
    delta = error % derivative;
  }
};
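
A quick, hypothetical usage sketch of the layer above (the Armadillo vectors and their size are assumptions made for illustration):

IdentityLayer layer;

// Forward pass: for the identity layer the output activation simply
// equals the input activation.
arma::colvec input = arma::randu<arma::colvec>(5);
arma::colvec output;
layer.FeedForward(input, output);

// Backward pass: with f'(x) = 1 the backpropagated delta equals the
// incoming error.
arma::colvec error = arma::randu<arma::colvec>(5);
arma::colvec delta;
layer.FeedBackward(input, error, delta);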

Connections

A connection links two layers; to be exact, it links the output of the first layer to the input of the second layer. The connection can transform the information on its way to the second layer, e.g. by connecting only the diagonal elements or by multiplying with a constant factor. In addition, the connection also transmits the calculated errors backwards to the first layer. In the end, the network is just a set of layers connected in a computation graph (a directed acyclic graph), in which all connections supply the FeedForward() and FeedBackward() functions.

Therefore, implementing e.g. a full connection is straightforward:

class FullConnection
{
 public:
  FullConnection() { /* Nothing to do here */ }

  template<typename DataType>
  void FeedForward(const DataType& input)
  {
    // Transform the input with the weight matrix and accumulate the
    // result into the input activation of the connected output layer.
    outputLayer.InputActivation() += (weights * input);
  }

  template<typename DataType>
  void FeedBackward(const DataType& error)
  {
    // Transmit the calculated error backwards to the first layer by
    // passing it through the transposed weight matrix.
    delta = weights.t() * error;
  }
};

Using and tuning the implemented connections is straightforward:

// Create a connection that connects every neuron from the input
// layer with every neuron in the hidden layer.
FullConnection<> connection(inputLayer, hiddenLayer);

// Create a max pooling connection that connects every element
// in the input layer with every element in the hidden layer.
PoolingConnection<MaxPooling> poolingConnection(inputLayer, hiddenLayer);
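
As described above, during the forward pass the connection transforms the incoming activation and accumulates it into the next layer's input activation. A minimal, hypothetical sketch of a single forward step through the full connection created above (the 10-dimensional Armadillo sample is an assumption made for illustration):

// Push a single sample through the connection; the connection
// multiplies it with its weight matrix and adds the result to the
// hidden layer's input activation.
arma::colvec sample = arma::randu<arma::colvec>(10);
connection.FeedForward(sample);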

Activation functions

The activation function is used to transform the activation level of a unit (neuron) into an output signal. There are a number of common activation functions in use. Every activation function implements the same interface, so switching to another activation function is straightforward:

// Computes the logistic function.
double x = LogisticFunction::fn(0.5);

// Computes the first derivatives of the logistic function.
double deriv = LogisticFunction::deriv(0.2);

// Computes the inverse of the logistic function.
double inv = LogisticFunction::inv(0.9);

// Create a layer that uses the rectified linear unit (ReLU) function with 10 neurons.
NeuronLayer<RectifierFunction> reluLayer(10);

// Create a layer that uses the identity function with 4 x 4 neurons.
NeuronLayer<IdentityFunction> identityLayer(4, 4);

mlpack provides a number of existing activation functions which can be used in place of the default logistic function. These include:

IdentityFunction

Applies the identity function element-wise to the input, thus outputting a tensor of the same dimension.

The identity function is defined as:

  • f(x) = x
  • f'(x) = 1

RectifierFunction

Applies the rectified linear unit (ReLU) function element-wise to the input, thus outputting a tensor of the same dimension.

The rectified linear unit function is defined as:

  • f(x) = \max(0, x)
  • f'(x) = \left\{ \begin{array}{lr} 1 & : x > 0 \\ 0 & : x \le 0 \end{array} \right.

SoftsignFunction

Applies the softsign function element-wise to the input, thus outputting a tensor of the same dimension.

The softsign function is defined as:

  • f(x) = \frac{x}{1 + |x|}
  • f'(x) = \frac{1}{(1 + |x|)^2}
  • f^{-1}(y) = \left\{ \begin{array}{lr} \frac{y}{1 - y} & : y \ge 0 \\ \frac{y}{1 + y} & : y < 0 \end{array} \right.

TanhFunction

Applies the hyperbolic tangent (tanh) function element-wise to the input, thus outputting a tensor of the same dimension.

The hyperbolic tangent function is defined as:

  • f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
  • f'(x) = 1 - \tanh^2(x)
  • f^{-1}(y) = \operatorname{arctanh}(y)
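
Implementing a custom activation function only requires providing the same static interface used by the functions above. As a minimal sketch, here is a softplus function f(x) = log(1 + e^x); it is not one of the functions listed above, and the class name, the Armadillo overloads, and the convention of taking the input x (rather than the output f(x)) in deriv() are assumptions made for illustration:

#include <armadillo>
#include <cmath>

class SoftplusFunction
{
 public:
  // f(x) = log(1 + e^x), applied to a single scalar.
  static double fn(const double x)
  {
    return std::log(1.0 + std::exp(x));
  }

  // Element-wise application of f(x) to a vector or matrix.
  template<typename InputType, typename OutputType>
  static void fn(const InputType& x, OutputType& y)
  {
    y = arma::log(1.0 + arma::exp(x));
  }

  // f'(x) = 1 / (1 + e^{-x}), written here in terms of the input x.
  static double deriv(const double x)
  {
    return 1.0 / (1.0 + std::exp(-x));
  }

  // Element-wise derivative.
  template<typename InputType, typename OutputType>
  static void deriv(const InputType& x, OutputType& y)
  {
    y = arma::exp(x) / (1.0 + arma::exp(x));
  }

  // f^{-1}(y) = log(e^y - 1), the inverse of the softplus function.
  static double inv(const double y)
  {
    return std::log(std::exp(y) - 1.0);
  }
};

With such a class in place, it can be used like the functions above, e.g. NeuronLayer<SoftplusFunction> softplusLayer(10);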

Optimizer

The optimizer addresses the general optimization problem of loss minimization. The responsibility of learning the parameters is shared between the optimizer and the parameters stored within each layer.

The optimizer:

  • iteratively optimizes by calling FeedForward() / FeedBackward() and updating the parameters
  • (periodically) evaluates the test networks
  • snapshots the network state throughout the optimization to track the optimization process

Each optimizer defines a single function, used to model the overall optimization process:

class Optimizer
{
 public:
  Optimizer() { /* Nothing to do here */ }

  // Update the given weights using the computed gradient and the
  // overall network error.
  template<typename DataType>
  void UpdateWeights(DataType& weights,
                     const DataType& gradient,
                     const double error)
  { ... }
};
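
As a minimal sketch of a concrete optimizer, plain steepest descent with a fixed step size could implement the interface above as follows; the class name and the learningRate parameter are assumptions made for illustration:

class SteepestDescent
{
 public:
  SteepestDescent(const double learningRate = 0.01) :
      learningRate(learningRate)
  { /* Nothing to do here */ }

  template<typename DataType>
  void UpdateWeights(DataType& weights,
                     const DataType& gradient,
                     const double /* error */)
  {
    // Move the weights a small step against the gradient.
    weights -= learningRate * gradient;
  }

 private:
  // The step size used for each update.
  double learningRate;
};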