Benchmarking kernel matrix vector products, inversions and attention layers

Computations with kernel matrices are a key bottleneck in many applied fields, from numerical physics to machine learning. This website compares acceleration methods for these problems in an objective way.

Specifically, we are interested in three types of computations:

1. Kernel matrix products. Let us consider:

  • A kernel function k(x,y) defined for any pair of points in dimension D, e.g. a Gaussian kernel.
  • N target points x_1,..., x_N in dimension D, encoded as an (N,D) array.
  • M source points y_1,..., y_M in dimension D, encoded as an (M,D) array.
  • M source signals b_1,..., b_M in dimension E, encoded as an (M,E) array. E=1 in most applications.

Then, we compute the (N,E) array of target signals a_1,..., a_N with, for all i between 1 and N:

a_i \gets \sum_{j=1}^\text{M} k(x_i,y_j)\,b_j .

We understand this computation as the product between the (N,M) kernel matrix K_{i,j} = k(x_i, y_j) and the (M,E) matrix of source signals. Depending on the context, this operation is known as kernel density estimation, an N-body computation or a point/kernel/spline convolution. Special cases also include the non-uniform Discrete Fourier Transform and operators that are relevant to the Boundary Element Method.
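
As a reference point, the brute-force computation fits in a few lines of NumPy. The sketch below is purely illustrative (a Gaussian kernel with a hypothetical bandwidth sigma), not the benchmarked code:

import numpy as np

def gaussian_kernel_product(x, y, b, sigma=1.0):
    """Brute-force kernel product: a[i] = sum_j k(x_i, y_j) * b[j].

    x: (N,D) target points, y: (M,D) source points, b: (M,E) source signals.
    Returns the (N,E) array of target signals.
    """
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # (N,M) squared distances.
    K = np.exp(-sq_dists / (2 * sigma**2))                     # Gaussian kernel matrix, (N,M).
    return K @ b                                               # (N,M) @ (M,E) -> (N,E).

rng = np.random.default_rng(0)
x, y, b = rng.normal(size=(1000, 3)), rng.normal(size=(500, 3)), rng.normal(size=(500, 1))
a = gaussian_kernel_product(x, y, b)  # (1000, 1) target signals.

Note that this materializes the full (N,M) kernel matrix: the O(NM) time and memory footprint is precisely what the accelerated methods in this benchmark try to avoid.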

2. Attention layers. With the same notation, row-normalized kernel products or "attention" layers correspond to the computation of the (N,E) array of target values a_1,..., a_N with, for all i between 1 and N:

a_i \gets \frac{ \sum_{j=1}^\text{M} k(x_i,y_j)\,b_j }{\sum_{j=1}^\text{M} k(x_i,y_j)}.

This operation is fundamental in transformer architectures for, e.g., natural language processing. In this context, we often speak of query vectors x_i, key vectors y_j and value vectors b_j with an exponential kernel:

k(x_i, y_j) = \exp\big(\langle x_i, y_j \rangle_{\mathbb{R}^\text{D}} \big).
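
In NumPy, this row-normalized product reads as follows. This is a minimal sketch: the max-subtraction is a standard numerical-stability trick that cancels out between the numerator and the denominator, leaving the result unchanged:

import numpy as np

def attention(x, y, b):
    """Attention: a[i] = sum_j exp(<x_i,y_j>) b[j] / sum_j exp(<x_i,y_j>).

    x: (N,D) queries, y: (M,D) keys, b: (M,E) values.
    """
    logits = x @ y.T                               # (N,M) dot products <x_i, y_j>.
    logits -= logits.max(axis=1, keepdims=True)    # Stabilize the exponentials.
    K = np.exp(logits)                             # Exponential kernel matrix, (N,M).
    return (K @ b) / K.sum(axis=1, keepdims=True)  # Row-normalized product, (N,E).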

3. Kernel matrix solver. Finally, if N=M and the output a_1,..., a_N of the kernel matrix product is known, we are interested in the (M,E) array b that solves the matrix equation "K_{i,j} b_j = a_i" (with summation over j). In other words, we intend to compute "b = K^{-1} a" such that:

\sum_{j=1}^\text{M} k(x_i,y_j)\,b_j = a_i.

This operation is a key bottleneck for algorithms based on Gaussian processes, Kriging, spline interpolation, kernel regression and the Boundary Element Method.
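
For small problems, a dense direct solver is the natural baseline. The sketch below is illustrative only; the small ridge term is our own regularization choice for numerical stability, not a benchmark setting:

import numpy as np

def gaussian_kernel_solve(x, a, sigma=1.0, ridge=1e-6):
    """Solve K b = a, where K[i,j] = exp(-|x_i - x_j|^2 / (2 sigma^2)).

    x: (M,D) points, a: (M,E) known target signals. Returns the (M,E) array b.
    """
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / (2 * sigma**2))
    # K + ridge*I is symmetric positive definite, so a direct solve is safe.
    # This costs O(M^3): the cost that iterative and multiscale solvers cut down.
    return np.linalg.solve(K + ridge * np.eye(len(x)), a)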

Scope

This project contains tools to benchmark various implementations of these operations in a wide range of settings:

  • We use both standard CPU hardware and massively parallel GPU accelerators.
  • We work in spaces of varying dimension (D=1 to D=1,000) and geometry (Euclidean, curved, discrete, etc.).
  • We study a wide range of kernels k(x_i, y_j) that may be oscillating, exhibit singularities, etc.
  • We benchmark both exact and approximate methods.

Our main purpose is to establish a clear reference on the state-of-the-art for kernel computations. We hope that this work will promote cross-pollination between related fields.

Please note that this ongoing benchmark is open to all contributions. We have pre-generated data sets with relevant evaluation metrics and provide a Docker container for each algorithm. We also rely on a test suite to make sure that every algorithm works.

Evaluated implementations and methods

  • KeOps: on-the-fly bruteforce computations on CPU and GPU.
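
For instance, with the pykeops NumPy bindings, a Gaussian kernel product can be written with symbolic LazyTensors so that the (N,M) matrix is never stored in memory. This short sketch follows the patterns of the KeOps documentation and is not our benchmark wrapper:

import numpy as np
from pykeops.numpy import LazyTensor

def keops_gaussian_product(x, y, b, sigma=1.0):
    """On-the-fly Gaussian kernel product with KeOps."""
    x_i = LazyTensor(x[:, None, :])         # Symbolic (N,1,D) target points.
    y_j = LazyTensor(y[None, :, :])         # Symbolic (1,M,D) source points.
    D_ij = ((x_i - y_j) ** 2).sum(-1)       # Symbolic (N,M) squared distances.
    K_ij = (-D_ij / (2 * sigma**2)).exp()   # Symbolic Gaussian kernel matrix.
    return K_ij @ b                         # Reduction over j -> dense (N,E) array.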

Data sets

We provide a varied collection of test cases for the methods above. All data sets are pre-split into source/target points and come with ground truth data. We store the inputs and expected output in HDF5 format, following the conventions outlined in datasets.py:

Dataset        Dimensions  Train size  Test size  Kernel       Distance   Download
MNIST          784         60,000      10,000     Gaussian     Euclidean  HDF5 (217MB)
Fashion-MNIST  784         60,000      10,000     Gaussian     Euclidean  HDF5 (217MB)
GloVe          25          1,183,514   10,000     Exponential  Angular    HDF5 (121MB)
GloVe          50          1,183,514   10,000     Exponential  Angular    HDF5 (235MB)
GloVe          100         1,183,514   10,000     Exponential  Angular    HDF5 (463MB)
GloVe          200         1,183,514   10,000     Exponential  Angular    HDF5 (918MB)
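
Each file can be opened with h5py. A minimal inspection sketch follows; the commented key names are hypothetical, and the authoritative conventions are the ones in datasets.py:

import h5py

with h5py.File("mnist-784-gaussian.hdf5", "r") as f:
    print(list(f.keys()))   # List the stored arrays first.
    print(dict(f.attrs))    # File-level metadata, if present.
    # Hypothetical key names -- check datasets.py for the real conventions:
    # y, x = f["train"][...], f["test"][...]   # source / target points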

Results

Full interactive plots can be found on our website. The main performance graphs are summarized below:

[Plot: Scaling up kernel products on the sphere]

Run the benchmarks

Install

The only dependencies are Python (tested with 3.6) and Docker. To install them on a fresh Ubuntu instance:

  1. Update the list of packages with sudo apt update.
  2. Install the Python package manager with sudo apt install python3-pip.
  3. Install Docker with sudo apt install docker.io.
  4. Add the current user to the Docker group. Assuming that you are "ubuntu", this is done with sudo usermod -a -G docker ubuntu.
  5. Refresh the Docker group with newgrp docker.

Then:

  1. Clone the repo with git clone https://github.com/kernel-matrix-benchmarks/kernel-matrix-benchmarks.git.
  2. Enter the main directory with cd kernel-matrix-benchmarks.
  3. Run pip3 install -r requirements.txt.
  4. Run python3 install.py to build all the libraries inside Docker containers (this can take a while, like 10-30 minutes).

Running

  1. Run python3 run.py (this can take an extremely long time, potentially days). Note that with Docker, the root user owns all the output files.
  2. Run sudo python3 plot.py or sudo python3 create_website.py to plot results.

You can customize the algorithms and datasets if you want to:

  • Check that algos.yaml contains the parameter settings that you want to test.
  • To run experiments on a single dataset, e.g. inverse-distance kernel products on the 3D sphere, invoke python run.py --local --dataset product-sphere-D3-E1-M1000-N1000-inverse-distance. See python run.py --help for more information on possible settings. Note that experiments can take a long time.
  • To process the results, either use python plot.py --dataset product-sphere-D3-E1-M1000-N1000-inverse-distance or python create_website.py. An example call: python create_website.py --latex --scatter --outputdir website/.
  • Look at the create_website_local.sh and create_website_AWS.sh scripts for other examples.

On the Amazon cloud

To reproduce these results in the cloud:

  1. Create an account on AWS EC2.
  2. Log in to the AWS CloudShell.
  3. Use the "Actions" button in the upper-right corner of the window to upload the specification file kmb-instance.json to your CloudShell session. You may find comments and alternative options in kmb-instance-full.js.
  4. Create a new instance (Ubuntu 20.04) with the following AWS CloudShell commands:
aws ec2 create-security-group \
  --group-name KmbSsh \
  --description "SSH access for Kernel Matrix Benchmarks"
aws ec2 authorize-security-group-ingress \
  --group-name KmbSsh \
  --protocol tcp \
  --port 22 \
  --cidr 0.0.0.0/0
aws ec2 request-spot-instances \
  --type "persistent" \
  --instance-interruption-behavior "stop" \
  --launch-specification file://kmb-instance.json \
  --tag-specification 'ResourceType=spot-instances-request,Tags=[{Key=Task,Value=KmbCPU}]'
  5. On startup, the instance will automatically clone this repository and run the create_website_AWS.sh script.
  6. Log in to the cloud instance via ssh, with
ssh -i "kernel-matrix-benchmarks.pem" ubuntu@ec2-1-234-567-890.compute-1.amazonaws.com

where kernel-matrix-benchmarks.pem is your private key file and ec2-1-234-567-890.compute-1.amazonaws.com is the public DNS name of your instance.

  7. You can monitor progress with:
  • tmux a to get access to the terminal running create_website_AWS.sh.
  • less -R kernel-matrix-benchmarks/kmb.log to read the log file.
  8. Once all benchmarks have been run, the full results will be located in your-instance:/home/ubuntu/kernel-matrix-benchmarks/website.zip. Download the archive to your local machine with:
scp -i "kernel-matrix-benchmarks.pem" ubuntu@ec2-1-234-567-890.compute-1.amazonaws.com:/home/ubuntu/kernel-matrix-benchmarks/website.zip website.zip
  9. Finally, unzip the file website.zip and open website/index.html to inspect your results.

Please note that the AWS Console allows you to keep track of your running instances, available storage volumes, current requests for Spot instances and billing information. Once you are done with your instance, don't forget to cancel your Spot request, terminate your instances and destroy your storage volumes.

Including your algorithm

We welcome all contributions to this benchmarking platform. To add support for a new algorithm, please:

  1. Write a small Python wrapper for your algorithm in kernel_matrix_benchmarks/algorithms; a skeleton is sketched after this list. The full API is detailed in base.py and you may look at our bruteforce implementation for reference. Feel free to read the other wrappers to see how to interface Python with a Julia or a C++ library, load CUDA drivers, etc. If you're curious, please note that the actual "benchmarking function" is located in runner.py.
  2. Add a Dockerfile for your library in install/. Please call it Dockerfile.nameofyourlibrary.
  3. Document your method in algos.yaml. You may specify different lists of arguments for different tasks and datasets. The full syntax is illustrated at the beginning of algos.yaml.
  4. Please note that we apply black linting to all commits.
  5. Submit a pull request on our repository.
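
For orientation, here is a hypothetical wrapper skeleton. The method names below (fit, query) are illustrative guesses only; the authoritative interface is the one defined in base.py:

# kernel_matrix_benchmarks/algorithms/my_library.py
# Hypothetical skeleton: the real method names and signatures are in base.py.
import numpy as np

class MyLibrary:
    def __init__(self, kernel, dimension):
        # Parameters declared in algos.yaml are passed to the constructor.
        self.kernel = kernel
        self.dimension = dimension

    def fit(self, source_points):
        # Pre-computation step, benchmarked separately from the queries:
        # build whatever acceleration structure your library needs.
        self.y = np.ascontiguousarray(source_points)

    def query(self, target_points, source_signals):
        # Timed query step: return the (N,E) array of target signals.
        raise NotImplementedError("Call your library here.")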

While debugging on your local machine, you may find the create_website.py script helpful. We will make sure that your code runs smoothly and re-render our website on the AWS EC2 cloud before merging your contributions. Thanks!


Principles

Open science:

  • Everyone is welcome to submit pull requests with tweaks and changes to how each library is being used.
  • In particular: if you are the author of one of these libraries and you think that the benchmark can be improved, please consider submitting a pull request.
  • This is meant to be an ongoing project that represents the current state-of-the-art.
  • We make everything easy to replicate, including the installation of the libraries and the data pre-processing.

Diverse use cases:

  • We showcase challenging datasets from all fields that rely on kernel computations.
  • We benchmark both pre-computation and query times.
  • We support both CPU and GPU implementations.
  • We try many different values of parameters for each library and ignore the points that are not on the precision-performance frontier.

Authors

We rely heavily on the template of the ANN-benchmarks website, built by Erik Bernhardsson with significant contributions from Martin Aumüller and Alexander Faithfull.

Related Publications

We will document our framework and results with a publication in due time. Meanwhile, the following paper details the design principles behind the ANN-benchmarks framework:

  • M. Aumüller, E. Bernhardsson, A. Faithfull: ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms. Information Systems 87 (2020). https://doi.org/10.1016/j.is.2019.02.006

Related Projects