Computations with kernel matrices are a key bottleneck in many applied fields, from numerical physics to machine learning. This website compares acceleration methods for these problems in an objective way.
Specifically, we are interested in three types of computations:
1. Kernel matrix products. Let us consider:
   - A kernel function k(x, y) defined for any pair of points in dimension D, e.g. a Gaussian kernel.
   - N target points x_1, ..., x_N in dimension D, encoded as an (N, D) array.
   - M source points y_1, ..., y_M in dimension D, encoded as an (M, D) array.
   - M source signals b_1, ..., b_M in dimension E, encoded as an (M, E) array. E = 1 in most applications.

   Then, we compute the (N, E) array of target signals a_1, ..., a_N with, for all i between 1 and N:

       a_i = sum_{j=1}^M k(x_i, y_j) b_j .

   We understand this computation as the product between the (N, M) kernel matrix K_{i,j} = k(x_i, y_j) and the (M, E) matrix of source signals. Depending on the context, this operation is known as kernel density estimation, an N-body computation or a point/kernel/spline convolution. Special cases also include the non-uniform Discrete Fourier Transform and operators that are relevant to the Boundary Element Method.
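As a reference point, this product can be sketched in a few lines of NumPy with a brute-force O(NM) implementation; the Gaussian kernel and the bandwidth `sigma` below are illustrative choices, not the only ones supported by the benchmark:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Squared Euclidean distances between all pairs of points: (N, M) array.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * sigma**2))

def kernel_product(x, y, b, sigma=1.0):
    # (N, M) kernel matrix times (M, E) source signals -> (N, E) target signals.
    K = gaussian_kernel(x, y, sigma)
    return K @ b

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # N = 5 target points in dimension D = 3
y = rng.normal(size=(4, 3))   # M = 4 source points
b = rng.normal(size=(4, 1))   # M = 4 source signals, E = 1
a = kernel_product(x, y, b)   # (5, 1) array of target signals
```

The broadcasted distance computation materializes an (N, M, D) array, so this exact approach only scales to small point clouds; the methods compared on this website avoid that cost.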
2. Attention layers. With the same notations, row-normalized kernel products or "attention" layers correspond to the computation of the (N, E) array of target values a_1, ..., a_N with, for all i between 1 and N:

       a_i = ( sum_{j=1}^M k(x_i, y_j) b_j ) / ( sum_{j=1}^M k(x_i, y_j) ) .

   This operation is fundamental in transformer architectures for e.g. natural language processing. In this context, we often talk about query vectors x_i, key vectors y_j and value vectors b_j with an exponential kernel:

       k(x_i, y_j) = exp( < x_i, y_j > ) .
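A minimal NumPy sketch of this row-normalized product, using the exponential kernel above (with the usual max-subtraction trick for numerical stability):

```python
import numpy as np

def attention(queries, keys, values):
    # Exponential kernel k(x_i, y_j) = exp(<x_i, y_j>):
    logits = queries @ keys.T                    # (N, M) dot products
    logits -= logits.max(axis=1, keepdims=True)  # stability; cancels out below
    K = np.exp(logits)                           # (N, M) kernel matrix
    # Row-normalized kernel product: each output row is a convex
    # combination of the value vectors.
    return (K / K.sum(axis=1, keepdims=True)) @ values

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))   # N = 5 query vectors x_i
k = rng.normal(size=(4, 8))   # M = 4 key vectors y_j
v = rng.normal(size=(4, 2))   # M = 4 value vectors b_j, E = 2
a = attention(q, k, v)        # (5, 2) array of target values
```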
3. Kernel matrix solvers. Finally, if N = M and the output of the kernel matrix product a_1, ..., a_N is known, we are interested in the (M, E) array b that solves the matrix equation K b = a, i.e. such that, for all i between 1 and N:

       sum_{j=1}^M k(x_i, y_j) b_j = a_i .

   In other words, we intend to compute b = K^{-1} a.
This operation is a key bottleneck for algorithms based on Gaussian processes, Kriging, spline interpolation, kernel regression and the Boundary Element Method.
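For a small, well-conditioned problem, this inversion can be sketched with a dense NumPy solve; the Gaussian kernel is again an illustrative choice (its kernel matrix is symmetric positive definite for distinct points, so the linear system has a unique solution):

```python
import numpy as np

def kernel_solve(x, a, sigma=1.0):
    # With N = M and sources = targets, assemble the (M, M) Gaussian
    # kernel matrix and solve K b = a for the (M, E) source signals b.
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / (2 * sigma**2))
    return np.linalg.solve(K, a)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))   # M = N = 6 points in dimension D = 2
a = rng.normal(size=(6, 1))   # known target signals
b = kernel_solve(x, a)        # source signals such that K @ b == a
```

Real workloads replace the O(M^3) dense solve with iterative or hierarchical methods, which is precisely what this benchmark measures.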
This project contains tools to benchmark various implementations of these operations in a wide range of settings:
- We use both standard CPU hardware and massively parallel GPU accelerators.
- We work in spaces of varying dimension (D=1 to D=1,000) and geometry (Euclidean, curved, discrete, etc.).
- We study a wide range of kernels k(xi, yj) that may be oscillating, exhibit singularities, etc.
- We benchmark both exact and approximate methods.
Our main purpose is to establish a clear reference on the state-of-the-art for kernel computations. We hope that this work will promote cross-pollination between related fields.
Please note that this ongoing benchmark is open to all contributions. We have pre-generated data sets with relevant evaluation metrics and provide a Docker container for each algorithm. We also rely on a test suite to make sure that every algorithm works.
Among the implemented algorithms:
- KeOps: on-the-fly brute-force computations on CPU and GPU.
We provide a varied collection of test cases for the methods above. All data sets are pre-split into source/target points and come with ground truth data. We store the inputs and expected output in HDF5 format, following the conventions outlined in datasets.py:
| Dataset | Dimensions | Train size | Test size | Kernel | Distance | Download |
|---|---|---|---|---|---|---|
| MNIST | 784 | 60,000 | 10,000 | Gaussian | Euclidean | HDF5 (217MB) |
| Fashion-MNIST | 784 | 60,000 | 10,000 | Gaussian | Euclidean | HDF5 (217MB) |
| GloVe | 25 | 1,183,514 | 10,000 | Exponential | Angular | HDF5 (121MB) |
| GloVe | 50 | 1,183,514 | 10,000 | Exponential | Angular | HDF5 (235MB) |
| GloVe | 100 | 1,183,514 | 10,000 | Exponential | Angular | HDF5 (463MB) |
| GloVe | 200 | 1,183,514 | 10,000 | Exponential | Angular | HDF5 (918MB) |
Full interactive plots can be found on our website. The main performance graphs are summarized below:
...
The only dependencies are Python (tested with 3.6) and Docker. To install them on a fresh Ubuntu instance:

- Update the list of packages with `sudo apt update`.
- Install the Python package manager with `sudo apt install python3-pip`.
- Install Docker with `sudo apt install docker.io`.
- Add the current user to the Docker group. Assuming that you are "ubuntu", this is done with `sudo usermod -a -G docker ubuntu`.
- Refresh the Docker group with `newgrp docker`.
Then:
- Clone the repo with `git clone https://github.com/kernel-matrix-benchmarks/kernel-matrix-benchmarks.git`.
- Enter the main directory with `cd kernel-matrix-benchmarks`.
- Run `pip3 install -r requirements.txt`.
- Run `python3 install.py` to build all the libraries inside Docker containers (this can take a while, typically 10-30 minutes).
- Run `python3 run.py` (this can take an extremely long time, potentially days). Note that with Docker, the root user owns all the output files.
- Run `sudo python3 plot.py` or `sudo python3 create_website.py` to plot results.
You can customize the algorithms and datasets if you want to:

- Check that `algos.yaml` contains the parameter settings that you want to test.
- To run experiments on a single dataset, invoke e.g. `python run.py --local --dataset product-sphere-D3-E1-M1000-N1000-inverse-distance`. See `python run.py --help` for more information on possible settings. Note that experiments can take a long time.
- To process the results, either use `python plot.py --dataset product-sphere-D3-E1-M1000-N1000-inverse-distance` or `python create_website.py`. An example call: `python create_website.py --latex --scatter --outputdir website/`.
- Look at the `create_website_local.sh` and `create_website_AWS.sh` scripts for other examples.
To reproduce these results in the cloud:
- Create an account on AWS EC2.
- Log in to the AWS CloudShell.
- Use the "Actions" button in the upper-right corner of the window to upload the specification file kmb-instance.json in your CloudShell session. You may find comments and alternative options in kmb-instance-full.js.
- Create a new instance (Ubuntu 20.04) with the following AWS CloudShell commands:
aws ec2 create-security-group \
--group-name KmbSsh \
--description "SSH access for Kernel Matrix Benchmarks"
aws ec2 authorize-security-group-ingress \
--group-name KmbSsh \
--protocol tcp \
--port 22 \
--cidr 0.0.0.0/0
aws ec2 request-spot-instances \
--type "persistent" \
--instance-interruption-behavior "stop" \
--launch-specification file://kmb-instance.json \
--tag-specification 'ResourceType=spot-instances-request,Tags=[{Key=Task,Value=KmbCPU}]'
- On startup, the instance will automatically clone this repository and run the create_website_AWS.sh script.
- Log in to the cloud instance via ssh, with `ssh -i "kernel-matrix-benchmarks.pem" ubuntu@ec2-1-234-567-890.compute-1.amazonaws.com`, where `kernel-matrix-benchmarks.pem` is your encryption key and `ec2-1-234-567-890.compute-1.amazonaws.com` is the public address of your instance.
- You can monitor progress with:
  - `tmux a` to get access to the terminal running `create_website_AWS.sh`;
  - `less -R kernel-matrix-benchmarks/kmb.log` to read the log file.
- Once all benchmarks have been run, the full results will be located in `your-instance:/home/ubuntu/kernel-matrix-benchmarks/website.zip`. Download the archive to your local machine with:
  `scp -i "kernel-matrix-benchmarks.pem" ubuntu@ec2-1-234-567-890.compute-1.amazonaws.com:/home/ubuntu/kernel-matrix-benchmarks/website.zip website.zip`
- Finally, unzip the file `website.zip` and open `website/index.html` to inspect your results.
Please note that the AWS Console allows you to keep track of your running instances, available storage volumes, current requests for Spot instances and billing information. Once you are done with your instance, don't forget to cancel your Spot request, terminate your instances and destroy your storage volumes.
We welcome all contributions to this benchmarking platform. To add support for a new algorithm, please:
- Write a small Python wrapper for your algorithm in kernel_matrix_benchmarks/algorithms. The full API is detailed in base.py and you may look at our bruteforce implementation for reference. Feel free to read the other wrappers to see how to interface Python with a Julia or a C++ library, load CUDA drivers, etc. If you're curious, please note that the actual "benchmarking function" is located in runner.py.
- Add a Dockerfile for your library in `install/`. Please call it `Dockerfile.nameofyourlibrary`.
- Document your method in `algos.yaml`. You may specify different lists of arguments for different tasks and datasets. The full syntax is illustrated at the beginning of `algos.yaml`.
- Please note that we apply the black code formatter to all commits.
- Submit a pull request on our repository.
While debugging on your local machine, you may find the create_website script helpful. We will make sure that your code runs smoothly and re-render our website on the AWS EC2 cloud before merging your contributions. Thanks!
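To give a feel for the shape of such a wrapper, here is a hypothetical sketch of a brute-force NumPy algorithm. The class and method names (`MyBruteForce`, `fit`, `query`) and the constructor parameters are illustrative assumptions only: the authoritative interface is the one defined in base.py, and the existing bruteforce.py wrapper should be your actual reference.

```python
import numpy as np

class MyBruteForce:
    """Toy NumPy solver for the kernel matrix product task.

    Hypothetical sketch: the real wrapper API is defined in base.py.
    """

    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def fit(self, source_points, source_signals):
        # Pre-computation step: here we merely store the sources, but a
        # real method might build a tree, a low-rank factorization, etc.
        self.y = np.asarray(source_points)
        self.b = np.asarray(source_signals)

    def query(self, target_points):
        # Query step: assemble the (N, M) Gaussian kernel matrix on the
        # fly and return the (N, E) array of target signals.
        x = np.asarray(target_points)
        sq_dists = ((x[:, None, :] - self.y[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq_dists / (2 * self.sigma**2))
        return K @ self.b
```

The split between a `fit`-like pre-computation step and a `query`-like evaluation step matters because the benchmark reports pre-computation and query times separately.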
The repository is organized as follows:

- install.py: builds the Docker images from install/.
- algos.yaml: lists all supported methods and parameter values.
- kernel_matrix_benchmarks/:
- definitions.py: parser for algos.yaml.
- datasets.py: supported datasets.
- main.py: runs all supported experiments on a given dataset.
- runner.py: runs a specific experiment and saves results in a HDF5 file.
- algorithms/:
- base.py: common interface for the methods included in the benchmark.
- bruteforce.py: reference implementation with NumPy.
- plotting/:
- metrics.py: supported performance metrics.
- plot_variants.py: interesting pairs of metrics for the detailed webpages.
- utils.py: computes the performance metrics and Pareto fronts.
- plot.py: renders png images.
- create_website.py: renders the website using the Jinja templates from templates/.
Open science:
- Everyone is welcome to submit pull requests with tweaks and changes to how each library is being used.
- In particular: if you are the author of one of these libraries and you think that the benchmark can be improved, please consider submitting a pull request.
- This is meant to be an ongoing project and represent the current state-of-the-art.
- We make everything easy to replicate, including the installation of the libraries and the data pre-processing.
Diverse use cases:
- We showcase challenging datasets from all fields that rely on kernel computations.
- We benchmark both pre-computation and query times.
- We support both CPU and GPU implementations.
- We try many different values of parameters for each library and ignore the points that are not on the precision-performance frontier.
- Jean Feydy, HeKA team, INRIA Paris.
- ...
We rely heavily on the template of the ANN-benchmarks website, built by Erik Bernhardsson with significant contributions from Martin Aumüller and Alexander Faithfull.
We will document our framework and results with a publication in due time. Meanwhile, the following paper details design principles behind the ANN-benchmark framework:
- M. Aumüller, E. Bernhardsson, A. Faithfull: ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms. Information Systems 2019. DOI: 10.1016/j.is.2019.02.006
- ann-benchmarks is the reference website for approximate nearest neighbor search.
- big-ann-benchmarks is a benchmarking effort for billion-scale approximate nearest neighbor search as part of the NeurIPS'21 Competition track.