cudnn.torch

Torch7 FFI bindings for NVIDIA cuDNN (R5) kernels!

Modules are API-compatible with their nn equivalents and are fully unit-tested against the nn implementations. Conversion between nn and cudnn is available through the cudnn.convert function.

Installation

Install cuDNN (version R5 EA)
Have at least CUDA 7.0
Have libcudnn.so in your library path ($LD_LIBRARY_PATH) (install cuDNN from https://developer.nvidia.com/cuDNN )
Instead of the previous step, you can copy the library files into /usr/local/cuda/lib64/ or into the corresponding folders of your CUDA directory

Modules

-- All inputs have to be 3D or 4D (batch mode), except ReLU, Tanh, Sigmoid, and BatchNormalization
cudnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW = 1], [dH = 1], [padW = 0], [padH = 0], [groups = 1])
cudnn.SpatialMaxPooling(kW, kH, dW, dH, padW, padH)
cudnn.SpatialAveragePooling(kW, kH, dW, dH, padW, padH)

-- The pointwise functions take an additional optional argument. If inplace = true, they operate in-place without using any extra memory for themselves
cudnn.ReLU(inplace[=false])
cudnn.ClippedReLU(ceiling, inplace[=false])
cudnn.Tanh(inplace[=false])
cudnn.Sigmoid(inplace[=false])

-- SoftMax can be run in fast mode or accurate mode. Default is accurate mode.
cudnn.SoftMax(fastMode [= false])          -- SoftMax across each image (just like nn.SoftMax)
cudnn.LogSoftMax()                         -- LogSoftMax across each image (just like nn.LogSoftMax)
cudnn.SpatialSoftMax(fastMode [= false])   -- SoftMax across feature-maps (per spatial location)
cudnn.SpatialLogSoftMax()                  -- LogSoftMax across feature-maps (per spatial location)
cudnn.VolumetricSoftMax(fastMode [= false])   -- SoftMax across feature-maps (per spatial location)
cudnn.VolumetricLogSoftMax()                  -- LogSoftMax across feature-maps (per spatial location)

cudnn.SpatialCrossEntropyCriterion()       -- A spatial version of LogSoftMax + ClassNLLCriterion in one shot
cudnn.VolumetricCrossEntropyCriterion()       -- A volumetric version of LogSoftMax + ClassNLLCriterion in one shot

-- Batch Normalization
cudnn.BatchNormalization(nFeature, eps, momentum, affine) -- same arguments as https://github.com/torch/nn/blob/master/doc/simple.md#nn.BatchNormalization
cudnn.SpatialBatchNormalization(nFeature, eps, momentum, affine)
cudnn.VolumetricBatchNormalization(nFeature, eps, momentum, affine)


-- Volumetric inputs (4D or 5D batched mode)
cudnn.VolumetricConvolution(nInputPlane, nOutputPlane, kT, kW, kH, dT, dW, dH, padT, padW, padH)
cudnn.VolumetricMaxPooling(kT, kW, kH, dT, dW, dH, padT, padW, padH)
cudnn.VolumetricAveragePooling(kT, kW, kH, dT, dW, dH, padT, padW, padH)

-- Recurrent Modules

-- All inputs have to be 3D. Accepts input of seqLength x batch x inputDim, or batch x seqLength x inputDim if batchFirst is set to true.
cudnn.RNNReLU(inputDim, outputDim, numberOfLayers, [batchFirst = false])
cudnn.RNNTanh(inputDim, outputDim, numberOfLayers, [batchFirst = false])
cudnn.LSTM(inputDim, outputDim, numberOfLayers, [batchFirst = false])
cudnn.GRU(inputDim, outputDim, numberOfLayers, [batchFirst = false])
cudnn.BLSTM(inputDim, outputDim, numberOfLayers, [batchFirst = false])
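
As a rough illustration of the input shapes described above, here is a minimal sketch that pushes a batch through a few of the listed modules. The layer sizes, batch size, and variable names are arbitrary choices made for this example, and it assumes a CUDA-capable GPU with cunn and cudnn installed.

```lua
require 'cunn'
require 'cudnn'

-- Small convolutional stack built from the cudnn modules listed above.
-- All sizes below are illustrative assumptions, not recommended settings.
local net = nn.Sequential()
net:add(cudnn.SpatialConvolution(3, 16, 5, 5, 1, 1, 2, 2)) -- 5x5 kernel, stride 1, pad 2
net:add(cudnn.ReLU(true))                                   -- in-place ReLU
net:add(cudnn.SpatialMaxPooling(2, 2, 2, 2))                -- halves height and width
net = net:cuda()

-- 4D batch-mode input: batch x channels x height x width
local images = torch.CudaTensor(8, 3, 32, 32):uniform()
local features = net:forward(images)
print(features:size()) -- sizes: 8, 16, 16, 16

-- Recurrent modules take seqLength x batch x inputDim by default
local lstm = cudnn.LSTM(10, 20, 2)                     -- inputDim, outputDim, numberOfLayers
local sequence = torch.CudaTensor(5, 4, 10):uniform()  -- 5 steps, batch of 4
local states = lstm:forward(sequence)
print(states:size())
```

With padding 2 and stride 1 the 5x5 convolution preserves the 32x32 spatial size, so only the pooling layer changes height and width.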

Modes

There are two globally available modes useful for tuning performance:

require 'cudnn'
cudnn.benchmark = true -- uses the built-in cudnn auto-tuner to find the fastest convolution algorithms.
                       -- If this is set to false, uses built-in heuristics that might not always be the fastest.

By default, cudnn.benchmark is set to false. Setting it to true will improve performance, at the expense of using more memory. The input shape should be the same for each batch; otherwise the auto-tuner will re-run for each batch, causing a huge slowdown.

cudnn.fastest = true -- this is like the :fastest() mode for the Convolution modules,
                     -- simply picks the fastest convolution algorithm, rather than tuning for workspace size

By default, cudnn.fastest is set to false. Set it to true if memory is not an issue and you want the fastest performance.

cudnn.verbose = true -- this prints out some more verbose information useful for debugging

By default, cudnn.verbose is set to false.

Conversion between `cudnn` and `nn`

Conversion is done by the cudnn.convert function, which takes a network and a backend as arguments and walks the network's modules recursively, substituting each with its equivalent in the target backend. No memory is copied; only the metatables are swapped. If you don't want to convert all modules, pass a function as the third argument to cudnn.convert. It is called at each step with the module currently being converted and is meant to exclude modules: if it returns true, the module is left untouched; otherwise it is subject to conversion.

Note that you cannot run a backward pass with cuDNN when your model contains batch normalization layers and is in evaluate mode.

net = nn.Sequential()
net:add(nn.SpatialConvolution(3,96,11,11,3,3))
net:add(nn.ReLU())
cudnn.convert(net, cudnn)
print(net)

net = nn.Sequential()
net:add(nn.SpatialConvolution(3,96,11,11,3,3))
net:add(nn.ReLU())
cudnn.convert(net, cudnn, function(module)
   return torch.type(module):find('ReLU')
end)
print(net)

will result in:

nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): cudnn.SpatialConvolution(3 -> 96, 11x11, 3,3)
  (2): cudnn.ReLU
}
nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): cudnn.SpatialConvolution(3 -> 96, 11x11, 3,3)
  (2): nn.ReLU
}
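
Since cudnn.convert takes the target backend as its second argument, the same call should also move a network back to nn. The following is a sketch under that assumption; the extra pooling layer, the keepPooling predicate, and the variable names are made up for this example.

```lua
require 'nn'
require 'cudnn'

local net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 96, 11, 11, 3, 3))
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(3, 3, 2, 2))

-- Hypothetical predicate: returning true leaves the module untouched,
-- so pooling layers stay on whatever backend they currently use.
local function keepPooling(module)
   return torch.type(module):find('Pooling') ~= nil
end

cudnn.convert(net, cudnn, keepPooling)  -- nn -> cudnn, pooling excluded
print(torch.type(net:get(1)))  -- cudnn.SpatialConvolution
print(torch.type(net:get(3)))  -- nn.SpatialMaxPooling

cudnn.convert(net, nn)         -- cudnn -> nn
print(torch.type(net:get(1)))  -- nn.SpatialConvolution
```

Comparing against nil makes the predicate return a plain boolean rather than the match index that string.find produces.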

Older versions

For cuDNN R1, check out the branch R1
For cuDNN R2, check out the branch R2
For cuDNN R3, check out the branch R3
For cuDNN R4, check out the branch R4
