unfoldNd: N-dimensional unfold in PyTorch

This package uses a numerical trick to perform the operations of torch.nn.functional.unfold and torch.nn.Unfold, also known as im2col. It extends them to higher-dimensional inputs that PyTorch currently does not support.

From the PyTorch docs:

Currently, only 4-D input tensors (batched image-like tensors) are supported.

unfoldNd implements the operation for 3d and 5d inputs and shows good performance.

—

News:

[2022-11-09 Wed]: Support for input unfolding for transpose convolutions (im2col) with 3d/4d/5d inputs.
[2021-05-02 Sun]: unfoldNd now also generalizes the fold operation (col2im) to 3d/4d/5d inputs.

Installation

pip install --user unfoldNd

Usage

Simple example

This package offers the following main functionality:

unfoldNd.unfoldNd: Like torch.nn.functional.unfold, but supports 3d, 4d, and 5d inputs.
unfoldNd.UnfoldNd: Like torch.nn.Unfold, but supports 3d, 4d, and 5d inputs.

Additional functionality (exotic)

It turns out that the multi-dimensional generalization of torch.nn.functional.unfold can also be used to generalize torch.nn.functional.fold (see the simple example). This is exposed through

unfoldNd.foldNd: Like torch.nn.functional.fold, but supports 3d, 4d, and 5d inputs.
unfoldNd.FoldNd: Like torch.nn.Fold, but supports 3d, 4d, and 5d inputs.

Keep in mind that, while tested, this feature is not benchmarked. However, reasonable performance can be expected, as it relies on N-dimensional unfold (benchmarked) and torch.scatter_add.
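To make the connection between fold, unfold, and scatter_add more concrete, here is a minimal 2d sketch that uses only PyTorch. It unfolds a tensor of pixel positions to learn where each column entry belongs, then accumulates the entries with scatter_add_. The function name fold_via_scatter_add is hypothetical, and this is an illustration of the idea under those assumptions, not necessarily how unfoldNd.foldNd is implemented.

```python
import torch
import torch.nn.functional as F


def fold_via_scatter_add(columns, output_size, kernel_size, padding=0, stride=1):
    """Sum columns of shape (N, C * K_h * K_w, L) back into (N, C, H, W)."""
    batch_size = columns.shape[0]
    num_channels = columns.shape[1] // (kernel_size[0] * kernel_size[1])
    padded_h = output_size[0] + 2 * padding
    padded_w = output_size[1] + 2 * padding
    device = columns.device

    # Unfold a tensor of flat positions: entry [0, j, l] tells which pixel of
    # the padded output the j-th kernel element of patch l belongs to.
    # (float32 represents these integers exactly below 2**24 pixels.)
    positions = torch.arange(padded_h * padded_w, dtype=torch.float32, device=device)
    positions = positions.reshape(1, 1, padded_h, padded_w)
    index = F.unfold(positions, kernel_size, stride=stride).long()
    index = index.reshape(1, 1, -1).expand(batch_size, num_channels, -1)

    # Accumulate every column entry at its pixel; overlapping patches add up.
    folded = torch.zeros(
        batch_size, num_channels, padded_h * padded_w,
        dtype=columns.dtype, device=device,
    )
    folded.scatter_add_(2, index, columns.reshape(batch_size, num_channels, -1))
    folded = folded.reshape(batch_size, num_channels, padded_h, padded_w)

    # Drop the padding border.
    return folded[:, :, padding : padded_h - padding, padding : padded_w - padding]


torch.manual_seed(0)
columns = F.unfold(torch.rand(2, 3, 7, 8), (2, 3), padding=1)
folded = fold_via_scatter_add(columns, (7, 8), (2, 3), padding=1)
reference = F.fold(columns, (7, 8), (2, 3), padding=1)
folds_match = torch.allclose(folded, reference)
print(folds_match)
```

Because the index tensor is itself produced by an unfold, the same construction carries over to other input dimensions once an N-dimensional unfold is available.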

—

Like input unfolding for convolutions, one can apply the same concept to the input of a transpose convolution. There is no comparable functionality for this in PyTorch as it is very exotic.

The following example explains input unfolding for transpose convolutions by demonstrating the connection to transpose convolution as matrix multiplication.

Simple example

This functionality is exposed through

unfoldNd.unfoldTransposeNd: Like unfoldNd.unfoldNd, but for unfolding inputs of transpose convolutions.
unfoldNd.UnfoldTransposeNd: Like unfoldNd.UnfoldNd, but for unfolding inputs of transpose convolutions.

Performance

TL;DR: If you are willing to sacrifice a bit of RAM, you can get decent speedups with unfoldNd over torch.nn.Unfold in both the forward and backward operations.

—

There is a continuous benchmark comparing the forward pass (and forward-backward pass) run time and peak memory here. The settings are:

“example”: Configuration used in the example.
“allcnnc-conv{1,2,3,4,6,7,8}”: Convolution layers from the All-CNNC on CIFAR-100 with batch size 256, borrowed from DeepOBS. Layers 5 and 9 have been removed because they are identical to others in terms of input/output shapes and hyperparameters.
This is a reasonably large setting where one may want to compute the unfolded input, e.g. for the KFAC approximation.

Hardware details

The machine running the benchmark has 32 GB of RAM and the following components:

cpu: Intel® Core™ i7-8700K CPU @ 3.70GHz × 12
cuda: GeForce RTX 2080 Ti (11GB)

Results

Forward pass: unfoldNd is faster than torch.nn.Unfold in all benchmarks except one. The latest commit run time is compared here on GPU, and here on CPU.
Forward-backward pass: unfoldNd is faster than torch.nn.Unfold in all benchmarks. The latest commit run time is compared here on GPU, and here on CPU.
Higher peak memory: The one-hot convolution approach used by unfoldNd consistently reaches higher peak memory (see here). The gap to torch.nn.Unfold is larger than the storage needed for the one-hot kernel alone; the underlying convolution probably requires additional memory (not confirmed).

Background

Convolutions can be expressed as a matrix-matrix multiplication between two objects: a matrix view of the kernel and the unfolded input. The latter results from stacking all elements of the input that overlap with the kernel in one convolution step into a matrix, with one column per kernel position. This perspective is sometimes helpful because it allows treating convolutions similarly to linear layers.
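The matrix view can be checked numerically for the 2d case that PyTorch already supports. The sketch below flattens the kernel into a matrix, multiplies it with the unfolded input, and compares the result with torch.nn.functional.conv2d. The shapes are arbitrary illustration values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

inputs = torch.rand(2, 3, 10, 12)  # (N, C_in, H, W)
weight = torch.rand(4, 3, 5, 5)    # (C_out, C_in, K_h, K_w)

# Unfolded input: one column per kernel position, shape (N, C_in * K_h * K_w, L).
unfolded = F.unfold(inputs, kernel_size=(5, 5))

# Matrix view of the kernel, shape (C_out, C_in * K_h * K_w).
weight_matrix = weight.flatten(start_dim=1)

# The matrix product broadcasts over the batch and yields (N, C_out, L).
out_matrix = weight_matrix @ unfolded

# Without padding and with stride 1, the output is (10 - 5 + 1) x (12 - 5 + 1).
out_via_matmul = out_matrix.reshape(2, 4, 6, 8)

out_via_conv = F.conv2d(inputs, weight)
max_error = (out_via_matmul - out_via_conv).abs().max().item()
print(max_error)
```

The two results should agree up to floating-point rounding, which is what makes it possible to reuse tools written for linear layers.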

The trick

The input elements that overlap with the kernel can be extracted by convolving with one-hot kernels of the same dimensions as the original kernel, using group convolutions so that each input channel is processed separately.
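A minimal sketch of this idea, using only PyTorch, could look as follows. The helper name unfold_via_one_hot is hypothetical, and the code is meant to illustrate the one-hot trick rather than to reproduce the package's implementation. Each one-hot kernel copies exactly one element of the current patch, and groups=C keeps the channels separate.

```python
import math

import torch
import torch.nn.functional as F


def unfold_via_one_hot(x, kernel_size, dilation=1, padding=0, stride=1):
    """Unfold x of shape (N, C, *spatial) with 1, 2, or 3 spatial dimensions."""
    n_spatial = x.dim() - 2
    conv = {1: F.conv1d, 2: F.conv2d, 3: F.conv3d}[n_spatial]
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size,) * n_spatial

    num_channels = x.shape[1]
    k = math.prod(kernel_size)

    # Row j of the identity, reshaped to the kernel shape, is a kernel that is
    # one at kernel position j and zero elsewhere.
    one_hot = torch.eye(k, dtype=x.dtype, device=x.device)
    one_hot = one_hot.reshape(k, 1, *kernel_size)

    # Repeat the k one-hot kernels for every channel: (C * k, 1, *kernel_size).
    weight = one_hot.repeat(num_channels, *([1] * (n_spatial + 1)))

    # The group convolution yields (N, C * k, *output_spatial).
    patches = conv(
        x, weight,
        stride=stride, padding=padding, dilation=dilation, groups=num_channels,
    )
    # Flatten the output positions into the last axis: (N, C * k, L).
    return patches.flatten(start_dim=2)


torch.manual_seed(0)

# 4d check against PyTorch's own unfold.
x_4d = torch.rand(2, 3, 8, 9)
ours_4d = unfold_via_one_hot(x_4d, (3, 2), padding=1, stride=2)
torch_4d = F.unfold(x_4d, (3, 2), padding=1, stride=2)
same_as_torch = torch.allclose(ours_4d, torch_4d)

# 5d input, which torch.nn.functional.unfold does not support.
x_5d = torch.rand(2, 3, 5, 6, 7)
ours_5d = unfold_via_one_hot(x_5d, 2)
print(same_as_torch, tuple(ours_5d.shape))
```

The one-hot weight has C * prod(kernel_size) output channels, which accounts for part of the extra memory compared with a dedicated unfold routine.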

Applications

This is an incomplete list of applications where the unfolded input may be useful:

It has been used for developing second-order optimization methods in deep learning by approximating the Fisher with Kronecker factors. See A Kronecker-factored approximate Fisher matrix for convolution layers.
I’ve used the similarity between linear and convolutional layers to implement some automatic differentiation operations for the latter in BackPACK.

Known issues

Encountered a problem? Open an issue here.

f-dangel/unfoldNd
