Benchmarking kernel matrix vector products, inversions and attention layers

Computations with kernel matrices are a key bottleneck in many applied fields, from numerical physics to machine learning. This website compares acceleration methods for these problems in an objective way.

Specifically, we are interested in three types of computations:

1. Kernel matrix products. Let us consider:

  • A kernel function k(x,y) defined for any pair of points in dimension D, e.g. a Gaussian kernel.
  • N target points x_1,..., x_N in dimension D, encoded as an (N,D) array.
  • M source points y_1,..., y_M in dimension D, encoded as an (M,D) array.
  • M source signals b_1,..., b_M in dimension E, encoded as an (M,E) array. E=1 in most applications.

Then, we compute the (N,E) array of target signals a_1,..., a_N with, for all i between 1 and N:

a_i \gets \sum_{j=1}^\text{M} k(x_i,y_j)\,b_j .

We understand this computation as the product between the (N,M) kernel matrix K_{i,j} = k(x_i, y_j) and the (M,E) matrix of source signals. Depending on the context, this operation is known as kernel density estimation, an N-body computation or a point/kernel/spline convolution. Special cases also include the non-uniform Discrete Fourier Transform and operators that are relevant to the Boundary Element Method.
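
As a reference point, the brute-force computation fits in a few lines of NumPy. The sketch below is purely illustrative (a Gaussian kernel with a hypothetical bandwidth sigma), not the benchmarked code:

import numpy as np

def gaussian_kernel_product(x, y, b, sigma=1.0):
    """Brute-force kernel product: a[i] = sum_j k(x_i, y_j) * b[j].

    x: (N,D) target points, y: (M,D) source points, b: (M,E) source signals.
    Returns the (N,E) array of target signals.
    """
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # (N,M) squared distances.
    K = np.exp(-sq_dists / (2 * sigma**2))                     # Gaussian kernel matrix, (N,M).
    return K @ b                                               # (N,M) @ (M,E) -> (N,E).

rng = np.random.default_rng(0)
x, y, b = rng.normal(size=(1000, 3)), rng.normal(size=(500, 3)), rng.normal(size=(500, 1))
a = gaussian_kernel_product(x, y, b)  # (1000, 1) target signals.

Note that this materializes the full (N,M) kernel matrix: the O(NM) time and memory footprint is precisely what the accelerated methods in this benchmark try to avoid.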

2. Attention layers. With the same notation, row-normalized kernel products or "attention" layers correspond to the computation of the (N,E) array of target values a_1,..., a_N with, for all i between 1 and N:

a_i \gets \frac{ \sum_{j=1}^\text{M} k(x_i,y_j)\,b_j }{\sum_{j=1}^\text{M} k(x_i,y_j)}.

This operation is fundamental in transformer architectures for, e.g., natural language processing. In this context, we often speak of query vectors x_i, key vectors y_j and value vectors b_j with an exponential kernel:

k(x_i, y_j) = \exp\big(\langle x_i, y_j \rangle_{\mathbb{R}^\text{D}} \big).
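
In NumPy, this row-normalized product reads as follows. This is a minimal sketch: the max-subtraction is a standard numerical-stability trick that cancels out between the numerator and the denominator, leaving the result unchanged:

import numpy as np

def attention(x, y, b):
    """Attention: a[i] = sum_j exp(<x_i,y_j>) b[j] / sum_j exp(<x_i,y_j>).

    x: (N,D) queries, y: (M,D) keys, b: (M,E) values.
    """
    logits = x @ y.T                               # (N,M) dot products <x_i, y_j>.
    logits -= logits.max(axis=1, keepdims=True)    # Stabilize the exponentials.
    K = np.exp(logits)                             # Exponential kernel matrix, (N,M).
    return (K @ b) / K.sum(axis=1, keepdims=True)  # Row-normalized product, (N,E).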

3. Kernel matrix solver. Finally, if N=M and the output a_1,..., a_N of the kernel matrix product is known, we are interested in the (M,E) array b that solves the matrix equation "K_{i,j} b_j = a_i" (with summation over j). In other words, we intend to compute "b = K^{-1} a" such that:

\sum_{j=1}^\text{M} k(x_i,y_j)\,b_j = a_i.

This operation is a key bottleneck for algorithms based on Gaussian processes, Kriging, spline interpolation, kernel regression and the Boundary Element Method.
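
For small problems, a dense direct solver is the natural baseline. The sketch below is illustrative only; the small ridge term is our own regularization choice for numerical stability, not a benchmark setting:

import numpy as np

def gaussian_kernel_solve(x, a, sigma=1.0, ridge=1e-6):
    """Solve K b = a, where K[i,j] = exp(-|x_i - x_j|^2 / (2 sigma^2)).

    x: (M,D) points, a: (M,E) known target signals. Returns the (M,E) array b.
    """
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / (2 * sigma**2))
    # K + ridge*I is symmetric positive definite, so a direct solve is safe.
    # This costs O(M^3): the cost that iterative and multiscale solvers cut down.
    return np.linalg.solve(K + ridge * np.eye(len(x)), a)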

Scope

This project contains tools to benchmark various implementations of these operations in a wide range of settings:

  • We use both standard CPU hardware and massively parallel GPU accelerators.
  • We work in spaces of varying dimension (D=1 to D=1,000) and geometry (Euclidean, curved, discrete, etc.).
  • We study a wide range of kernels k(x_i, y_j) that may be oscillating, exhibit singularities, etc.
  • We benchmark both exact and approximate methods.

Our main purpose is to establish a clear reference on the state-of-the-art for kernel computations. We hope that this work will promote cross-pollination between related fields.

Please note that this ongoing benchmark is open to all contributions. We have pre-generated data sets with relevant evaluation metrics and provide a Docker container for each algorithm. We also rely on a test suite to make sure that every algorithm works.

Evaluated implementations and methods

  • KeOps: on-the-fly bruteforce computations on CPU and GPU.
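
For instance, with the pykeops NumPy bindings, a Gaussian kernel product can be written with symbolic LazyTensors so that the (N,M) matrix is never stored in memory. This short sketch follows the patterns of the KeOps documentation and is not our benchmark wrapper:

import numpy as np
from pykeops.numpy import LazyTensor

def keops_gaussian_product(x, y, b, sigma=1.0):
    """On-the-fly Gaussian kernel product with KeOps."""
    x_i = LazyTensor(x[:, None, :])         # Symbolic (N,1,D) target points.
    y_j = LazyTensor(y[None, :, :])         # Symbolic (1,M,D) source points.
    D_ij = ((x_i - y_j) ** 2).sum(-1)       # Symbolic (N,M) squared distances.
    K_ij = (-D_ij / (2 * sigma**2)).exp()   # Symbolic Gaussian kernel matrix.
    return K_ij @ b                         # Reduction over j -> dense (N,E) array.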

Data sets

We provide a varied collection of test cases for the methods above. All data sets are pre-split into source/target points and come with ground truth data. We store the inputs and expected output in HDF5 format, following the conventions outlined in datasets.py:

Dataset        Dimensions  Train size  Test size  Kernel       Distance   Download
MNIST          784         60,000      10,000     Gaussian     Euclidean  HDF5 (217MB)
Fashion-MNIST  784         60,000      10,000     Gaussian     Euclidean  HDF5 (217MB)
GloVe          25          1,183,514   10,000     Exponential  Angular    HDF5 (121MB)
GloVe          50          1,183,514   10,000     Exponential  Angular    HDF5 (235MB)
GloVe          100         1,183,514   10,000     Exponential  Angular    HDF5 (463MB)
GloVe          200         1,183,514   10,000     Exponential  Angular    HDF5 (918MB)
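
Each file can be opened with h5py. A minimal inspection sketch follows; the commented key names are hypothetical, and the authoritative conventions are the ones in datasets.py:

import h5py

with h5py.File("mnist-784-gaussian.hdf5", "r") as f:
    print(list(f.keys()))   # List the stored arrays first.
    print(dict(f.attrs))    # File-level metadata, if present.
    # Hypothetical key names -- check datasets.py for the real conventions:
    # y, x = f["train"][...], f["test"][...]   # source / target points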

Results

Full interactive plots can be found on our website. The main performance graphs are summarized below:

[Plot: Scaling up kernel products on the sphere]

Run the benchmarks

Install

The only dependencies are Python (tested with 3.6) and Docker. To install them on a fresh Ubuntu instance:

  1. Update the list of packages with sudo apt update.
  2. Install the Python package manager with sudo apt install python3-pip.
  3. Install Docker with sudo apt install docker.io.
  4. Add the current user to the Docker group. Assuming that you are "ubuntu", this is done with sudo usermod -a -G docker ubuntu.
  5. Refresh the Docker group with newgrp docker.

Then:

  1. Clone the repo with git clone https://github.com/kernel-matrix-benchmarks/kernel-matrix-benchmarks.git.
  2. Enter the main directory with cd kernel-matrix-benchmarks.
  3. Run pip3 install -r requirements.txt.
  4. Run python3 install.py to build all the libraries inside Docker containers (this can take a while, like 10-30 minutes).

Running

  1. Run python3 run.py (this can take an extremely long time, potentially days). Note that with Docker, the root user owns all the output files.
  2. Run sudo python3 plot.py or sudo python3 create_website.py to plot results.

You can customize the algorithms and datasets if you want to:

  • Check that algos.yaml contains the parameter settings that you want to test.
  • To run experiments on a single dataset, e.g. inverse-distance kernel products on the 3D sphere, invoke python run.py --local --dataset product-sphere-D3-E1-M1000-N1000-inverse-distance. See python run.py --help for more information on possible settings. Note that experiments can take a long time.
  • To process the results, either use python plot.py --dataset product-sphere-D3-E1-M1000-N1000-inverse-distance or python create_website.py. An example call: python create_website.py --latex --scatter --outputdir website/.
  • Look at the create_website_local.sh and create_website_AWS.sh scripts for other examples.

On the Amazon cloud

To reproduce these results in the cloud:

  1. Create an account on AWS EC2.
  2. Log in to the AWS CloudShell.
  3. Use the "Actions" button in the upper-right corner of the window to upload the specification file kmb-instance.json to your CloudShell session. You may find comments and alternative options in kmb-instance-full.js.
  4. Create a new instance (Ubuntu 20.04) with the following AWS CloudShell commands:
aws ec2 create-security-group \
  --group-name KmbSsh \
  --description "SSH access for Kernel Matrix Benchmarks"
aws ec2 authorize-security-group-ingress \
  --group-name KmbSsh \
  --protocol tcp \
  --port 22 \
  --cidr 0.0.0.0/0
aws ec2 request-spot-instances \
  --type "persistent" \
  --instance-interruption-behavior "stop" \
  --launch-specification file://kmb-instance.json \
  --tag-specification 'ResourceType=spot-instances-request,Tags=[{Key=Task,Value=KmbCPU}]'
  5. On startup, the instance will automatically clone this repository and run the create_website_AWS.sh script.
  6. Log in to the cloud instance via ssh, with
ssh -i "kernel-matrix-benchmarks.pem" ubuntu@ec2-1-234-567-890.compute-1.amazonaws.com

where kernel-matrix-benchmarks.pem is your private key file and ec2-1-234-567-890.compute-1.amazonaws.com is the public DNS name of your instance.

  7. You can monitor progress with:
  • tmux a to get access to the terminal running create_website_AWS.sh.
  • less -R kernel-matrix-benchmarks/kmb.log to read the log file.
  8. Once all benchmarks have been run, the full results will be located in your-instance:/home/ubuntu/kernel-matrix-benchmarks/website.zip. Download the archive to your local machine with:
scp -i "kernel-matrix-benchmarks.pem" ubuntu@ec2-1-234-567-890.compute-1.amazonaws.com:/home/ubuntu/kernel-matrix-benchmarks/website.zip website.zip
  9. Finally, unzip the file website.zip and open website/index.html to inspect your results.

Please note that the AWS Console allows you to keep track of your running instances, available storage volumes, current requests for Spot instances and billing information. Once you are done with your instance, don't forget to cancel your Spot request, terminate your instances and destroy your storage volumes.

Including your algorithm

We welcome all contributions to this benchmarking platform. To add support for a new algorithm, please:

  1. Write a small Python wrapper for your algorithm in kernel_matrix_benchmarks/algorithms; a skeleton is sketched after this list. The full API is detailed in base.py and you may look at our bruteforce implementation for reference. Feel free to read the other wrappers to see how to interface Python with a Julia or a C++ library, load CUDA drivers, etc. If you're curious, please note that the actual "benchmarking function" is located in runner.py.
  2. Add a Dockerfile for your library in install/. Please call it Dockerfile.nameofyourlibrary.
  3. Document your method in algos.yaml. You may specify different lists of arguments for different tasks and datasets. The full syntax is illustrated at the beginning of algos.yaml.
  4. Please note that we apply black linting to all commits.
  5. Submit a pull request on our repository.
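
For orientation, here is a hypothetical wrapper skeleton. The method names below (fit, query) are illustrative guesses only; the authoritative interface is the one defined in base.py:

# kernel_matrix_benchmarks/algorithms/my_library.py
# Hypothetical skeleton: the real method names and signatures are in base.py.
import numpy as np

class MyLibrary:
    def __init__(self, kernel, dimension):
        # Parameters declared in algos.yaml are passed to the constructor.
        self.kernel = kernel
        self.dimension = dimension

    def fit(self, source_points):
        # Pre-computation step, benchmarked separately from the queries:
        # build whatever acceleration structure your library needs.
        self.y = np.ascontiguousarray(source_points)

    def query(self, target_points, source_signals):
        # Timed query step: return the (N,E) array of target signals.
        raise NotImplementedError("Call your library here.")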

While debugging on your local machine, you may find the create_website.py script helpful. We will make sure that your code runs smoothly and re-render our website on the AWS EC2 cloud before merging your contributions. Thanks!


Principles

Open science:

  • Everyone is welcome to submit pull requests with tweaks and changes to how each library is being used.
  • In particular: if you are the author of one of these libraries and you think that the benchmark can be improved, please consider submitting a pull request.
  • This is meant to be an ongoing project that represents the current state-of-the-art.
  • We make everything easy to replicate, including the installation of the libraries and the data pre-processing.

Diverse use cases:

  • We showcase challenging datasets from all fields that rely on kernel computations.
  • We benchmark both pre-computation and query times.
  • We support both CPU and GPU implementations.
  • We try many different values of parameters for each library and ignore the points that are not on the precision-performance frontier.

Authors

We rely heavily on the template of the ANN-benchmarks website, built by Erik Bernhardsson with significant contributions from Martin Aumüller and Alexander Faithfull.

Related Publications

We will document our framework and results with a publication in due time. Meanwhile, the following paper details the design principles behind the ANN-benchmarks framework:

  • M. Aumüller, E. Bernhardsson, A. Faithfull: ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms. Information Systems 87 (2020). https://doi.org/10.1016/j.is.2019.02.006

Related Projects