Random Features for Polynomial Kernels and Dot Product Kernels

This repository contains PyTorch implementations of various random feature maps for polynomial and general dot product kernels. In particular, we provide implementations using complex random projections that can improve the kernel approximation significantly. For dot product kernels, we provide an algorithm that minimizes the variance of the random feature approximation and can also be used in conjunction with the Gaussian kernel.

The basic building block of random features for dot product kernels is the Polynomial Sketch, which approximates the polynomial kernel of degree p. Such a random projection can be seen as a linear random projection applied to the p-times self-tensored input features. For p=1, these sketches reduce to standard linear random projections.
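
The identity behind this view is that the inner product of p-times self-tensored vectors equals the degree-p polynomial kernel. The following short sketch (not part of the repository) verifies this numerically:

import torch

torch.manual_seed(0)
p = 3                                        # polynomial degree
x, y = torch.randn(4), torch.randn(4)

def self_tensor(v, p):
    # flatten the p-fold tensor product v (x) v (x) ... (x) v into a vector
    out = v
    for _ in range(p - 1):
        out = torch.outer(out, v).reshape(-1)
    return out

lhs = self_tensor(x, p) @ self_tensor(y, p)  # inner product of tensored features
rhs = (x @ y) ** p                           # polynomial kernel (x^T y)^p
print(torch.allclose(lhs, rhs))              # True up to floating point error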

For more information and a description of the novelties of this work, consult the associated technical reports:

@article{wacker2022a,
  title={Improved Random Features for Dot Product Kernels},
  author={Wacker, Jonas and Kanagawa, Motonobu and Filippone, Maurizio},
  journal={arXiv preprint arXiv:2201.08712},
  year={2022}
}
@article{wacker2022b,
  title={Complex-to-Real Random Features for Polynomial Kernels},
  author={Wacker, Jonas and Ohana, Ruben and Filippone, Maurizio},
  journal={arXiv preprint arXiv:2202.02031},
  year={2022}
}
@article{wacker2022c,
  title={Local Random Feature Approximations of the Gaussian Kernel},
  author={Wacker, Jonas and Filippone, Maurizio},
  journal={arXiv preprint arXiv:2204.05667},
  year={2022}
}

Please cite these works if you find them useful.

In addition, this repository implements and extends works from the following related papers:

Requirements

We recommend:

  • Python 3.6 or higher
  • PyTorch 1.8 or higher
  • For GPU support: CUDA 9.0 or higher

Several complex tensor operations used in our code were introduced in PyTorch 1.8. With an earlier version, you cannot use the complex projections in our code.
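
A simple guard against this (a convenience check of ours, not part of the repository) is:

import torch

# Complex projections rely on the complex tensor support added in PyTorch 1.8.
major, minor = (int(v) for v in torch.__version__.split('.')[:2])
assert (major, minor) >= (1, 8), \
    'PyTorch >= 1.8 is required for complex_weights=True / complex_real=True'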

A note on using GPU-accelerated SRHT sketches

PyTorch does not natively support the Fast Walsh-Hadamard Transform. This repository contains an implementation, including a CUDA kernel, in util/hadamard_cuda. This kernel is needed for GPU-accelerated TensorSRHT sketches. If you want to use this implementation, you need to specify your CUDA path inside util/fwht/__init__.py for its compilation:

if torch.cuda.is_available():
    sources.extend([filedir + 'fwht_kernel.cu'])
    flags.extend(['-DIS_CUDA_AVAILABLE'])
    if os.environ.get('CUDA_HOME', '-1') == '-1':
        warnings.warn('CUDA_HOME variable not set. Setting CUDA_HOME=/usr/local/cuda-9.0...',)
        os.environ['CUDA_HOME'] = '/usr/local/cuda-9.0'

We used CUDA version 9.0 in our experiments, but other versions should work too.
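
Alternatively, you can set CUDA_HOME in your environment before util/fwht is imported so that the fallback above is never triggered. The path below is only an example and needs to be adjusted to your installation:

import os

# Point CUDA_HOME at your local CUDA toolkit before importing util.fwht so that
# the extension is compiled against the correct installation.
os.environ.setdefault('CUDA_HOME', '/usr/local/cuda-11.3')  # example path, adjust as needed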

Getting started

Polynomial Sketches

A Polynomial Sketch is a random feature approximation of the polynomial kernel. It can be used on its own or as a module inside a larger deep learning framework. It is initialized as follows:

from random_features.polynomial_sketch import PolynomialSketch
feature_encoder = PolynomialSketch(
    input_dimension, # data input dimension (power of 2 for srht projection_type)
    projection_dimension, # output dimension of the random sketch
    degree=degree, # degree of the polynomial kernel
    bias=bias, # bias parameter of the polynomial kernel
    lengthscale=lengthscale, # inverse scale of the data (like lengthscale for Gaussian kernel)
    projection_type='gaussian'/'rademacher'/'srht'/'countsketch_scatter',
    complex_weights=False/True, # whether to use complex random projections (without complex_real outputs are complex-valued)
    complex_real=False/True, # whether to use complex-to-real sketches (outputs are real-valued)
    hierarchical=False/True, # whether to use hierarchical sketching as proposed in <https://arxiv.org/abs/1909.01410>
    device='cpu'/'cuda', # whether to use CPU or GPU
)

feature_encoder.resample() # initialize random feature sample
feature_encoder.forward(input_data) # project input data

projection_type has a strong impact on the approximation quality and computation speed. projection_type='srht' uses the Subsampled Randomized Hadamard Transform, which makes use of structured matrix products. These are faster to compute (especially on the GPU) and give lower variances than Rademacher and Gaussian sketches most of the time. countsketch_scatter uses CountSketch as the base projection, which yields TensorSketch (https://chbrown.github.io/kdd-2013-usb/kdd/p239.pdf).

Depending on the scaling of your data, high-degree sketches can have very large variances. Setting complex_weights=True usually improves the approximation significantly in this case. Hierarchical sketches (https://arxiv.org/abs/1909.01410) are not the focus of this work, but can be helpful too. Complex random features can be computed about as fast as real ones, in particular when using projection_type='srht'. However, the downstream task is usually more expensive when working with complex data. Complex-to-Real (CtR) sketches (complex_real=True) return real-valued random features instead and thus alleviate this problem.
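
Putting the pieces together, the approximate kernel Gram matrix is obtained from the inner products of the random features. The following end-to-end sketch is a usage example of ours rather than repository code (it assumes forward returns one row of features per data point); with bias=1 and lengthscale=1 we compare against (x^T y + 1)^3, but the exact kernel parameterization in terms of bias and lengthscale is defined in random_features/polynomial_sketch.py:

import torch
from random_features.polynomial_sketch import PolynomialSketch

torch.manual_seed(0)
data = torch.randn(128, 64)  # input dimension 64 is a power of 2, as required for 'srht'

feature_encoder = PolynomialSketch(
    64,                        # data input dimension
    512,                       # output dimension of the random sketch
    degree=3,
    bias=1.,
    lengthscale=1.,
    projection_type='rademacher',
    complex_weights=False,
    device='cpu',
)
feature_encoder.resample()
features = feature_encoder.forward(data)  # shape: (128, 512)

# Approximate Gram matrix from the random features
# (use the conjugate transpose instead when the features are complex-valued).
approx_gram = features @ features.t()

# Reference polynomial kernel of degree 3 with bias 1 and lengthscale 1.
exact_gram = (data @ data.t() + 1.) ** 3
print((approx_gram - exact_gram).abs().mean())  # small average approximation error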

Spherical Random Features (SRF)

We also provide an implementation of Spherical Random Features for polynomial kernels. They can be used similarly to the code above, but the random feature distribution needs to be optimized first. Please have a look at the code at the bottom of random_features/spherical.py to understand the procedure for obtaining this distribution and the resulting random feature map. Once obtained, the distribution is saved under saved_models (some parameterizations are already contained in this repository).

Approximating the Gaussian kernel using Polynomial Sketches

The Gaussian kernel can be approximated well through a randomized Maclaurin expansion with Polynomial Sketches, provided that the data is zero-centered and a proper lengthscale is used.

Note: This method is not implemented for Complex-to-Real (CtR) sketches yet.
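
The reason this works is that the Gaussian kernel factorizes into data-dependent prefactors and an exponential dot product kernel, whose Maclaurin series is a weighted sum of polynomial kernels. The sketch below (illustrative only, using the standard parameterization var * exp(-||x - y||^2 / (2 * lengthscale^2))) verifies the truncated expansion numerically; in the repository, each polynomial term is in turn approximated by a Polynomial Sketch:

import math
import torch

torch.manual_seed(0)
lengthscale, var, approx_degree = 1., 1., 10
x, y = 0.5 * torch.randn(8), 0.5 * torch.randn(8)

# Exact Gaussian kernel: var * exp(-||x - y||^2 / (2 l^2))
exact = var * torch.exp(-(x - y).pow(2).sum() / (2 * lengthscale ** 2))

# Factorization: exp(-||x||^2 / (2 l^2)) * exp(-||y||^2 / (2 l^2)) * exp(x^T y / l^2),
# where the last factor is expanded as a Maclaurin series of polynomial kernels.
prefactor = torch.exp(-(x.pow(2).sum() + y.pow(2).sum()) / (2 * lengthscale ** 2))
dot = x @ y / lengthscale ** 2
maclaurin = sum(dot ** n / math.factorial(n) for n in range(approx_degree + 1))
approx = var * prefactor * maclaurin

print(exact.item(), approx.item())  # close for a sufficiently large approx_degree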

The approximator is initialized as follows:

from random_features.gaussian_approximator import GaussianApproximator
feature_encoder = GaussianApproximator(
    input_dimension, # data input dimension (power of 2 for srht projection_type)
    projection_dimension, # output dimension of the random sketch
    approx_degree=10, # maximum degree of the Maclaurin approximation
    lengthscale=lengthscale, # lengthscale of the Gaussian kernel
    var=kernel_var, # variance of the Gaussian kernel
    method='maclaurin', # approximation method (other possibilities: rff/poly_sketch)
    projection_type='gaussian'/'rademacher'/'srht'/'countsketch_scatter',
    complex_weights=False/True, # whether to use complex random projections (without complex_real outputs are complex-valued)
    hierarchical=False/True, # whether to use hierarchical sketching as proposed in <https://arxiv.org/abs/1909.01410>
    device='cpu'/'cuda', # whether to use CPU or GPU
)

# find optimal sampling distribution when using optimized maclaurin
feature_encoder.initialize_sampling_distribution(input_data_sample, min_sampling_degree=2)
feature_encoder.resample() # initialize random feature sample
feature_encoder.forward(input_data) # project input data

Here, we chose the optimized Maclaurin method (method='maclaurin'). initialize_sampling_distribution finds the optimal random feature distribution between degrees min_sampling_degree=2 and approx_degree=10. method='rff' (random Fourier features) and method='poly_sketch' do not require this step. method='maclaurin_p' gives the Maclaurin approximation according to (Kar and Karnick, 2012), which is less accurate than the optimized Maclaurin method but requires no preprocessing.
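
An end-to-end sketch comparing the resulting features against the exact Gaussian kernel might look as follows. This is our own usage example, not repository code: the data, lengthscale, and the comparison against var * exp(-||x - y||^2 / (2 * lengthscale^2)) are illustrative choices, and forward is assumed to return one row of features per data point:

import torch
from random_features.gaussian_approximator import GaussianApproximator

torch.manual_seed(0)
lengthscale, kernel_var = 2., 1.
data = torch.randn(256, 32)
data = data - data.mean(dim=0)  # zero-center the data, as recommended above

feature_encoder = GaussianApproximator(
    32, 1024,
    approx_degree=10,
    lengthscale=lengthscale,
    var=kernel_var,
    method='maclaurin',
    projection_type='rademacher',
    complex_weights=False,
    device='cpu',
)
feature_encoder.initialize_sampling_distribution(data[:100], min_sampling_degree=2)
feature_encoder.resample()
features = feature_encoder.forward(data)

# Compare against the exact Gaussian kernel var * exp(-||x - y||^2 / (2 l^2)).
sq_dists = torch.cdist(data, data).pow(2)
exact_gram = kernel_var * torch.exp(-sq_dists / (2 * lengthscale ** 2))
approx_gram = features @ features.t()  # conjugate transpose for complex features
print((approx_gram - exact_gram).abs().mean())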

Localized random features for Gaussian kernels with short length scales

Approximating the Gaussian kernel on high-frequency data, typically modelled using short length scales, is very challenging for random feature methods. The local_approximations folder contains implementations in which the above Maclaurin approximations are centered around individual test points, which improves results considerably in such scenarios. These implementations follow the work of Wacker and Filippone (2022c) and serve to reproduce the experiments in the original paper.

Approximating other Dot Product Kernels using the Maclaurin method

The GaussianApproximator module above is a thin wrapper around the Maclaurin module contained in random_features.maclaurin. When approximating a general dot product kernel, e.g., the exponential dot product kernel, we do not need this wrapper and can use the Maclaurin module directly. Follow the code inside GaussianApproximator to see how the Maclaurin module is used.
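
To illustrate what such a Maclaurin construction does, the hand-rolled sketch below combines Polynomial Sketches with the Maclaurin coefficients of the exponential dot product kernel exp(x^T y) = sum_n (x^T y)^n / n!. It is a simplified stand-in for the repository's Maclaurin module, which additionally optimizes how the random features are distributed over the degrees:

import math
import torch
from random_features.polynomial_sketch import PolynomialSketch

torch.manual_seed(0)
d, features_per_degree, max_degree = 32, 256, 6
data = 0.3 * torch.randn(100, d)

# Approximate each Maclaurin term (x^T y)^n / n! with a separate Polynomial Sketch
# and concatenate the scaled feature maps.
blocks = [torch.ones(len(data), 1)]  # degree-0 term: the constant 1
for n in range(1, max_degree + 1):
    sketch = PolynomialSketch(
        d, features_per_degree, degree=n, bias=0., lengthscale=1.,
        projection_type='rademacher', complex_weights=False, device='cpu',
    )
    sketch.resample()
    blocks.append(sketch.forward(data) / math.sqrt(math.factorial(n)))

features = torch.cat(blocks, dim=1)
approx_gram = features @ features.t()
exact_gram = torch.exp(data @ data.t())  # exponential dot product kernel
print((approx_gram - exact_gram).abs().mean())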

Reproducing the Gaussian Process Classification/Regression experiments

If you would like to reproduce the results from the experiments in our papers, download the desired datasets from the UCI machine learning repository. The code_rna dataset is available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html. Then load the associated CSV files and save them using PyTorch. The dataset paths are specified in config/datasets and the datasets to be used in the experiments are listed in config/active_datasets.json.
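
How exactly the converted files need to be structured is determined by the data loading code in this repository, so the conversion sketch below is only a starting point (the CSV layout, label column, file names, and the (X, y) tuple format are assumptions on our part):

import numpy as np
import torch

# Illustrative conversion only: adjust the CSV layout, the label column and the
# output path to whatever the data loaders in this repository expect.
raw = np.loadtxt('data/csv/some_uci_dataset.csv', delimiter=',', skiprows=1)
X = torch.from_numpy(raw[:, :-1]).float()  # features
y = torch.from_numpy(raw[:, -1]).float()   # labels, assumed to be in the last column
torch.save((X, y), 'data/some_uci_dataset.pth')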

Then run the following command:

python run_rf_gp_experiments.py --rf_parameter_file [rf_parameter_config] --datasets_file config/active_datasets.json --use_gpu

--use_gpu is optional. [rf_parameter_config] needs to be replaced by one of the files located in config/rf_parameters depending on which kernel should be approximated. These files can be easily adapted according to your needs.

The output logs of the experiments are saved in the csv and logs folders.

The bar plots can be created from the csv files using the Jupyter notebook notebooks/bar-plot-visualization.ipynb.
