clnn

OpenCL backend for Torch nn neural networks library.

Installation

Please see distro-cl for installation instructions.

What works

Parameterized Modules

  • nn.Linear
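
For example, a minimal sketch of the usage pattern (the layer sizes are illustrative; the :cl() conversions and the ClTensor type come from cltorch):

    require 'cltorch'
    require 'clnn'

    local linear = nn.Linear(128, 64):cl()               -- move weights to the OpenCL device
    local input = torch.Tensor(32, 128):uniform():cl()   -- batch of 32 inputs, copied to the device
    local output = linear:forward(input)                  -- 32 x 64 ClTensor
    linear:backward(input, output:clone():fill(1))        -- gradOutput of ones, just for illustration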

Basic Tensor methods

These mostly 'just work', since they are based on underlying tensor methods already implemented in cltorch. Tested with:

  • nn.Narrow
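
For instance, nn.Narrow forwards to the underlying narrow operation, so a ClTensor input works directly (sizes here are illustrative):

    require 'cltorch'
    require 'clnn'

    local narrow = nn.Narrow(2, 3, 4)                    -- take 4 columns, starting at column 3
    local input = torch.Tensor(5, 10):uniform():cl()
    local output = narrow:forward(input)                 -- 5 x 4 ClTensor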

Miscellaneous modules

  • nn.Identity
  • nn.Dropout

Convolution layers

  • nn.SpatialConvolutionMM
  • nn.SpatialMaxPooling (including ceil mode)
  • nn.SpatialAveragePooling
  • nn.TemporalConvolution2 This module is specific to clnn, though it also works on CPU and CUDA, not just OpenCL. It is API-compatible with nn.TemporalConvolution, and faster than it on both CUDA and OpenCL.
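
A sketch of a small convolutional stack on the OpenCL device (the sizes and kernel shapes are illustrative only):

    require 'cltorch'
    require 'clnn'

    local net = nn.Sequential()
    net:add(nn.SpatialConvolutionMM(3, 16, 5, 5))        -- 3 input planes -> 16 feature maps, 5x5 kernels
    net:add(nn.SpatialMaxPooling(2, 2, 2, 2):ceil())     -- 2x2 pooling, stride 2, ceil mode
    net:cl()

    local input = torch.Tensor(8, 3, 32, 32):uniform():cl()   -- batch of 8 3x32x32 images
    local output = net:forward(input)                          -- 8 x 16 x 14 x 14 ClTensor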

Transfer function layers

  • nn.Tanh
  • nn.Sigmoid
  • nn.ReLU
  • nn.ELU
  • nn.Exp
  • nn.Sqrt
  • nn.Square
  • nn.Abs
  • nn.LogSigmoid
  • nn.HardTanh
  • nn.LogSoftMax
  • nn.SoftMax (including spatial mode)
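
For example, nn.SoftMax accepts both the usual batched 2-d input and 4-d spatial input; in spatial mode, each spatial location is normalized across the feature planes (a sketch, assuming the same convention as cunn):

    require 'cltorch'
    require 'clnn'

    local sm = nn.SoftMax():cl()
    local probs = sm:forward(torch.Tensor(4, 10):uniform():cl())        -- each of the 4 rows sums to 1
    local spatial = sm:forward(torch.Tensor(2, 5, 8, 8):uniform():cl()) -- sums to 1 over the 5 planes, per location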

Table layers

These 'just work', since they are based on underlying torch operations, which are already implemented in cltorch. Tested with:

  • nn.CMulTable
  • nn.CAddTable
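
For example, nn.CAddTable simply sums its ClTensor inputs elementwise:

    require 'cltorch'
    require 'clnn'

    local add = nn.CAddTable()
    local a = torch.Tensor(3, 4):fill(1):cl()
    local b = torch.Tensor(3, 4):fill(2):cl()
    local sum = add:forward({a, b})                      -- 3 x 4 ClTensor, all elements equal to 3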

Criterions

  • nn.MSECriterion
  • nn.ClassNLLCriterion
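
A sketch of a criterion on the OpenCL device, using nn.MSECriterion (the sizes are illustrative):

    require 'cltorch'
    require 'clnn'

    local net = nn.Linear(10, 3):cl()
    local crit = nn.MSECriterion():cl()

    local input  = torch.Tensor(4, 10):uniform():cl()
    local target = torch.Tensor(4, 3):uniform():cl()

    local output = net:forward(input)
    local loss = crit:forward(output, target)
    net:backward(input, crit:backward(output, target))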

Containers

Containers 'just work', since they just call standard operations on the contained modules. Tested with:

  • nn.Sequential
  • nngraph
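
For example, calling :cl() on the container converts every contained module in one go (the architecture below is illustrative):

    require 'cltorch'
    require 'clnn'

    local net = nn.Sequential()
    net:add(nn.Linear(100, 50))
    net:add(nn.ReLU())
    net:add(nn.Dropout(0.5))
    net:add(nn.Linear(50, 10))
    net:add(nn.LogSoftMax())
    net:cl()                                             -- converts all contained modules

    local output = net:forward(torch.Tensor(16, 100):uniform():cl())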

Trainers

In theory, trainers 'just work', since they just call standard torch methods on the network. The following are good first choices:

  • nn.StochasticGradient
  • optim.lbfgs
  • optim.adam
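
For example, an optim-style training step needs only the flattened parameters from :getParameters(), which are ClTensors once the network has been converted. This is a sketch: the learning rate and sizes are illustrative, and passing class-index targets as a ClTensor to ClassNLLCriterion is an assumption here.

    require 'cltorch'
    require 'clnn'
    require 'optim'

    local net = nn.Sequential():add(nn.Linear(10, 2)):add(nn.LogSoftMax()):cl()
    local crit = nn.ClassNLLCriterion():cl()
    local params, gradParams = net:getParameters()       -- flattened ClTensors

    local input  = torch.Tensor(8, 10):uniform():cl()
    local target = torch.Tensor(8):random(1, 2):cl()     -- class indices 1..2 (assumed ClTensor targets)

    local function feval(p)                              -- optim always calls this with p == params here
      gradParams:zero()
      local output = net:forward(input)
      local loss = crit:forward(output, target)
      net:backward(input, crit:backward(output, target))
      return loss, gradParams
    end

    local config = {learningRate = 1e-3}
    for i = 1, 10 do
      optim.adam(feval, params, config)
    end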

Timings

Soumith benchmark layers

Please see https://github.com/soumith/convnet-benchmarks#imagenet-winners-benchmarking

  • On a Titan X, OpenCL torch is about 3 times slower than CUDA torch
    • e.g. for VGG, cutorch takes 1100ms, and cltorch takes 3400ms

Example networks

Porting guidelines

Porting guidelines for project maintainers are available in porting-guidelines.md.

Recent changes

  • 2nd May:
    • Re-applied:
      • 26th March:
        • added TemporalConvolution2: same API and usage as TemporalConvolution, but faster on GPUs
  • 31st April:
    • Re-applied:
      • 10th March:
        • @pawni (Nick Pawlowski) added SpatialUpSamplingNearest. Thank you Nick
      • 20th February:
        • @gloine (Jaehyung Lee) added support for non-batched input to ClassNLLCriterion. Thank you Jaehyung
  • 30th April:
    • rolled back to as-of 21st February, prior to lots of THNN changes in upstream Torch
    • additionally, installation procedure is now to use a specific torch distro, for stability
  • 1st Feb:
    • merged/ported THNN phase 3. Any weird build issues, please update both nn and clnn.
  • 2nd January, 2016:
    • merged/ported THNN architecture across, and the implementation of Abs, so the unit-tests pass again now
  • 15th December:
  • 29th November:
    • added ELU
  • 25th September:
  • 23rd September:
    • ported latest cunn implementation of SpatialMaxPooling across, i.e. approximately Sergey's Deterministic max-pooling PR
      • this includes :ceil() implementation
  • 22nd September:
    • added non-batch implementation of LogSoftMax (previously only handled batched input)
    • added SoftMax, for both batched and non-batched
  • 20th September:
    • added non-batch implementation for SpatialMaxPooling (previously only handled batched input), for contiguous pools

Older changes