
EigenPro

EigenPro [1-3] is a fast, scalable, GPU-enabled solver for training kernel machines. It applies a projected stochastic gradient method with dual preconditioning to achieve major speed-ups. It currently uses a PyTorch backend.
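At a glance, EigenPro trains a kernel model f(x) = Σ_j α_j k(x, z_j) over a set of centers z_j by stochastic gradient descent, using a preconditioner built from the top eigenpairs of the kernel matrix so that much larger step sizes remain stable. The sketch below illustrates this idea in plain PyTorch; it is a simplified illustration with made-up sizes and heuristics, not the package's actual API (see examples/ for real usage):

```python
import torch

def laplacian_kernel(X, Z, bandwidth=20.0):
    # Pairwise k(x, z) = exp(-||x - z||_2 / bandwidth)
    return torch.exp(-torch.cdist(X, Z) / bandwidth)

# Toy problem with illustrative sizes: n samples, d features, p centers.
torch.manual_seed(0)
n, d, p, q = 1000, 10, 200, 20
X, y = torch.randn(n, d), torch.randn(n, 1)
Z = X[:p]                      # centers taken from the data
alpha = torch.zeros(p, 1)      # model: f(x) = k(x, Z) @ alpha

# Preconditioner from the top-q eigenpairs of the center kernel matrix:
# damp the top eigendirections, which otherwise force tiny step sizes.
evals, evecs = torch.linalg.eigh(laplacian_kernel(Z, Z) / p)
evals, evecs = evals.flip(0)[:q], evecs.flip(1)[:, :q]
damp = evecs * (1.0 - evals[-1] / evals)          # p x q

def precondition(g):
    return g - damp @ (evecs.T @ g)

# Heuristic step size: inverse of the top eigenvalue of the preconditioned
# Hessian, estimated by power iteration (EigenPro derives this scale
# analytically from the kernel spectrum).
K = laplacian_kernel(X, Z)
H = K.T @ K / n
v = torch.randn(p, 1)
for _ in range(50):
    v = precondition(H @ v)
    v = v / v.norm()
lr = 1.0 / (v * precondition(H @ v)).sum().item()

for step in range(200):                           # preconditioned SGD
    idx = torch.randint(0, n, (64,))
    Kb = K[idx]                                   # batch x p
    grad = Kb.T @ (Kb @ alpha - y[idx]) / 64      # stochastic gradient
    alpha -= lr * precondition(grad)
```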

Highlights

  • Fast: EigenPro is the fastest kernel-machine solver at large scale (see the benchmarks below).
  • Plug-and-play: In most cases it learns a high-quality model with little hyper-parameter tuning.
  • Scalable: The training time of one epoch is nearly linear in both model size and data size. It is the first kernel method to achieve this scalability without compromising test performance.

Coming Soon

  • Multi-GPU and model parallelism: We are adding support for training across multiple GPUs and for model parallelism.

Usage

Installation

pip install git+ssh://git@github.com/EigenPro/EigenPro.git@main

Run Example

Linux:

bash examples/run_fmnist.sh

Windows:

examples\run_fmnist.bat

Jupyter Notebook: examples/notebook.ipynb

See files under examples/ for more details.

Empirical Results

In the experiments below, P denotes the number of centers (i.e., the model size) and d the ambient dimension. All experiments use a Laplacian kernel with bandwidth 20.0.
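Concretely, the Laplacian kernel with bandwidth L is k(x, z) = exp(-‖x − z‖₂ / L); a minimal PyTorch version with the bandwidth used in these experiments:

```python
import torch

def laplacian_kernel(X, Z, bandwidth=20.0):
    # Pairwise k(x, z) = exp(-||x - z||_2 / bandwidth)
    return torch.exp(-torch.cdist(X, Z) / bandwidth)
```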

1. CIFAR5M Extracted Features on a single GPU

We used features extracted with the pretrained MobileNetV2 network available in the timm library. We benchmarked two versions of EigenPro against FALKON [4-6] on the full 5 million samples of CIFAR5M (d = 1280) for one epoch. All experiments ran on a single A100 GPU. The maximum RAM we had access to was 1.2 TB, which was not sufficient for FALKON with 1M centers.
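As a sketch of the feature-extraction step (the exact timm model name below is an assumption; MobileNetV2's pooled features have dimension 1280, matching d above):

```python
import timm
import torch

# Assumption: a MobileNetV2 variant from timm ("mobilenetv2_100" is
# illustrative); num_classes=0 makes the model return pooled
# penultimate features, of dimension 1280 for this architecture.
model = timm.create_model("mobilenetv2_100", pretrained=True, num_classes=0).eval()

with torch.no_grad():
    images = torch.randn(8, 3, 224, 224)    # stand-in for a CIFAR5M batch
    features = model(images)                # shape: (8, 1280)
```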

[Figure: CIFAR5M benchmark results]

2. LibriSpeech Extracted Features on a single GPU

We benchmarked two versions of EigenPro against FALKON on 10 million samples (d = 1024) for one epoch. All experiments ran on a single V100 GPU. The maximum RAM available was 300 GB, which was not sufficient for FALKON with more than 128K centers. The features were extracted with an acoustic model (the VGG+BLSTM architecture in [7]) to align the lengths of audio and text.

[Figure: LibriSpeech benchmark results]


References

  1. Amirhesam Abedsoltan, Mikhail Belkin, Parthe Pandit, "Toward Large Kernel Models," Proceedings of the 40th International Conference on Machine Learning (ICML 2023), JMLR.org. Link
  2. Siyuan Ma, Mikhail Belkin, "Kernel machines that adapt to GPUs for effective large batch training," Proceedings of the 2nd SysML Conference, 2019. Link
  3. Siyuan Ma, Mikhail Belkin, "Diving into the shallows: a computational perspective on large-scale shallow learning," Advances in Neural Information Processing Systems 30 (NeurIPS 2017). Link
  4. Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi, "Kernel methods through the roof: handling billions of points efficiently," Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Link
  5. Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco, "FALKON: An optimal large scale kernel method," Advances in Neural Information Processing Systems 30 (NeurIPS 2017). Link
  6. Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi, "Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses," Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Link
  7. Like Hui, Mikhail Belkin, "Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks," International Conference on Learning Representations (ICLR 2021). Link

Cite us
