
KSig

GPU-accelerated computation of popular time series kernels

A scikit-learn compatible Python package that provides GPU-accelerated implementations of the most powerful and popular time series kernels and features using CuPy.

The time series kernels included in this package are the Signature Kernel (ksig.kernels.SignatureKernel) and the Signature-PDE Kernel (ksig.kernels.SignaturePDEKernel).

Available time series features are:

  • Vanilla Path Signatures, which compute iterated integrals of a sequence lifted to a path by piecewise linear interpolation;
  • Low-Rank Signature Features, which compute signature features using a low-rank algorithm that iteratively approximates outer products, see Algorithm 5;
  • Random Fourier Signature Features, which use Random Fourier Features and random projection approaches for tensors, see Algorithms 2 and 3;
  • Random Warping Series, which compute Dynamic Time Warping alignments between input sequences and random time series, see Algorithm 1.
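For intuition, the Random Fourier Features building block at the static-kernel level can be sketched in a few lines of NumPy. This is an illustrative sketch of the classic Rahimi–Recht construction for the RBF kernel, not the ksig implementation; all names below are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 3, 5000          # input dimension, number of random features
lengthscale = 1.0

# Sample spectral frequencies and phases for the RBF kernel
# k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
W = rng.normal(scale=1.0 / lengthscale, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff(x):
    """Random Fourier Feature map; inner products approximate the RBF kernel."""
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale ** 2))
approx = rff(x) @ rff(y)  # close to `exact` for large D
```

The tensorized projections used by RFSF apply the same idea level-by-level to the tensors appearing in the signature.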

Introduction

The signature kernel is one of the most powerful similarity measures for sequences, which lifts a kernel on a given domain to a kernel for sequences in that domain with strong theoretical guarantees:

  • It is a universal nonlinearity for time series, meaning it is flexible enough to approximate any continuous function on compact sets of sequences.
  • It is invariant to a natural transformation of time series called reparametrization (in the discrete setting often called time warping), but it can be made sensitive to it by including the time parametrization as an additional channel.
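To break the reparametrization invariance as described above, one can append a time channel to each sequence before feeding it to the kernel. A minimal NumPy sketch (the helper name is ours, not part of ksig):

```python
import numpy as np

def add_time_channel(X):
    """Append a normalized time coordinate as an extra channel, making
    downstream signature features sensitive to parametrization."""
    n_seq, l_seq, _ = X.shape
    t = np.linspace(0.0, 1.0, l_seq)                        # shared time grid
    t = np.broadcast_to(t[None, :, None], (n_seq, l_seq, 1))
    return np.concatenate([X, t], axis=-1)

X = np.random.randn(10, 50, 5)
X_time = add_time_channel(X)  # shape (10, 50, 6)
```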

The signature kernel between sequences can be computed using an instance of the ksig.kernels.SignatureKernel class (see also ksig.kernels.SignaturePDEKernel) by lifting a static kernel (in this case, an RBF kernel for vector-valued data from the ksig.static.kernels.RBFKernel class) to a kernel for sequences:

import numpy as np
import ksig

# Number of signature levels to use.
n_levels = 5 

# Use the RBF kernel for vector-valued data as static (base) kernel.
static_kernel = ksig.static.kernels.RBFKernel() 

# Instantiate the signature kernel, which takes as input the static kernel.
sig_kernel = ksig.kernels.SignatureKernel(n_levels, static_kernel=static_kernel)

# Generate 10 sequences of length 50 with 5 channels.
n_seq, l_seq, n_feat = 10, 50, 5 
X = np.random.randn(n_seq, l_seq, n_feat)

# Sequence kernels take as input an array of sequences of ndim == 3,
# and work as a callable for computing the kernel matrix. 
K_XX = sig_kernel(X)  # K_XX has shape (10, 10).

# The diagonal kernel entries can also be computed.
K_X = sig_kernel(X, diag=True)  # K_X has shape (10,).

# Generate another array of 8 sequences of length 20 and 5 features.
n_seq2, l_seq2 = 8, 20
Y = np.random.randn(n_seq2, l_seq2, n_feat)

# Compute the kernel matrix between arrays X and Y.
K_XY = sig_kernel(X, Y)  # K_XY has shape (10, 8)
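To make the lifting concrete, the truncated signature kernel admits a simple (if slow) dynamic-programming approximation over increments of the static kernel's Gram matrix. The function below is our naive reference sketch of that recursion, not the package's optimized routine; it uses a linear static kernel for simplicity:

```python
import numpy as np

def naive_signature_kernel(x, y, n_levels):
    """Truncated signature kernel of sequences x (L1, d) and y (L2, d)
    with a linear static kernel, via dynamic programming over increments."""
    G = x @ y.T                                        # static Gram matrix
    # kappa[i, j] = <x_{i+1} - x_i, y_{j+1} - y_j>: increment inner products.
    kappa = G[1:, 1:] - G[1:, :-1] - G[:-1, 1:] + G[:-1, :-1]
    R = kappa.copy()                                   # level-1 summands
    result = 1.0 + R.sum()                             # levels 0 and 1
    for _ in range(2, n_levels + 1):
        # Sum of R over strictly smaller index pairs (i' < i, j' < j).
        S = np.cumsum(np.cumsum(R, axis=0), axis=1)
        shifted = np.zeros_like(R)
        shifted[1:, 1:] = S[:-1, :-1]
        R = kappa * shifted
        result += R.sum()
    return result

x = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0, 0.0], [2.0, 1.0]])
# At truncation level 1 this reduces to 1 + <x_end - x_0, y_end - y_0>.
print(naive_signature_kernel(x, y, 1))  # -> 4.0
```

Evaluating this for every pair of sequences is what gives the quadratic cost in both the number and the length of sequences that the feature-based methods below avoid.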

Installation

We recommend setting up a fresh conda environment with Python 3.9 or 3.10 using

conda create -n ksig python=3.10 
conda activate ksig

In order to build CuPy when installing the package, the CUDA_PATH environment variable must be set properly. The default CUDA installation path on Linux is /usr/local/cuda-${CUDA_VERSION}/, while on Windows it is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v${CUDA_VERSION}\, where ${CUDA_VERSION} is for example 12.3. On Linux, this can be done by (substituting your ${CUDA_VERSION}):

export CUDA_PATH=/usr/local/cuda-${CUDA_VERSION}/

Once this is done, the package can be installed by

pip install git+https://github.com/tgcsaba/ksig.git

Alternatively, if you are still unable to build CuPy, you may be able to install a precompiled wheel for your CUDA version (see the CuPy installation guide). Once this is done, you may try installing the ksig package again using the previous command.

Finally, to make use of the models provided under ksig.models, i.e. PrecomputedKernelSVC and PrecomputedFeatureLinSVC, you also need to install the cuml library, which can only be installed manually from the RAPIDS repository. There, scroll down to the section titled "Install RAPIDS", select the pip install method, your CUDA version, and your Python version, then select "Choose Specific Packages", unselect every package besides cuml, and copy-paste the given command into the terminal where the conda environment is activated.

More Examples

Scaling to large datasets

A computational bottleneck of the full-rank signature kernel is its joint quadratic complexity in the number of training examples $N$ and the length of the sequences $L$, i.e. $O(N^2 \cdot L^2)$. Feature-based formulations of the signature kernel get around this issue by representing the kernel with a finite-dimensional feature set.

One such example is Random Fourier Signature Features, which provides an unbiased approximation to the signature kernel that converges in probability $(1/M)$-subexponentially fast, where $M \in \mathbb{Z}_+$ is the truncation level.

This can be constructed using the ksig.kernels.SignatureFeatures class with two ingredients:

  • setting the static_features argument to an instance of the ksig.static.features.RandomFourierFeatures class,
  • and the projection argument to an instance of ksig.projections.TensorizedRandomProjection (for RFSF-TRP) or to ksig.projections.DiagonalProjection (for RFSF-DP).
import numpy as np
import ksig

# Number of signature levels to use.
n_levels = 5 

# Use 100 components in RFF and projection.
n_components = 100

# Instantiate RFF feature map.
static_feat = ksig.static.features.RandomFourierFeatures(n_components=n_components)
# Instantiate tensor random projections.
proj = ksig.projections.TensorizedRandomProjection(n_components=n_components)

# The RFSF-TRP feature map and kernel. Additionally to working as a callable for
# computing a kernel, it implements a fit and a transform method.
rfsf_trp_kernel = ksig.kernels.SignatureFeatures(
    n_levels=n_levels, static_features=static_feat, projection=proj)

# Generate 1000 sequences of length 200 with 100 features.
n_seq, l_seq, n_feat = 1000, 200, 100
X = np.random.randn(n_seq, l_seq, n_feat)

# Fit the kernel to the data.
rfsf_trp_kernel.fit(X)

# Compute the kernel matrix as before.
K_XX = rfsf_trp_kernel(X)  # K_XX has shape (1000, 1000).

# Generate another array of 800 sequences of length 250 and 100 features.
n_seq2, l_seq2 = 800, 250
Y = np.random.randn(n_seq2, l_seq2, n_feat)

# Compute the kernel matrix between X and Y.
# The kernel does not have to be fitted a second time.
K_XY = rfsf_trp_kernel(X, Y)  # K_XY has shape (1000, 800)

# Alternatively, we may compute features separately for X and Y. Under the hood,
# this is what the call method does, i.e. compute features and take their inner product.
P_X = rfsf_trp_kernel.transform(X)  # P_X has shape (1000, 501).
P_Y = rfsf_trp_kernel.transform(Y)  # P_Y has shape (800, 501).

# Check that the results match.
print(np.linalg.norm(K_XX - P_X @ P_X.T))
print(np.linalg.norm(K_XY - P_X @ P_Y.T))
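The practical payoff of explicit features such as P_X is that downstream kernel methods can be swapped for linear models, replacing the $O(N^2)$ Gram-matrix cost with a cost linear in $N$. The sketch below illustrates this with ridge regression in plain NumPy (synthetic features, nothing ksig-specific): when $K = P P^\top$, the kernel (dual) solve and the linear (primal) solve give the same predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 20
P = rng.normal(size=(N, D))    # explicit feature matrix, e.g. from .transform()
y = rng.normal(size=N)
lam = 0.1                      # ridge regularization strength

# Dual: kernel ridge regression with the induced kernel K = P P^T, an O(N^3) solve.
K = P @ P.T
alpha = np.linalg.solve(K + lam * np.eye(N), y)
pred_dual = K @ alpha

# Primal: linear ridge on the features, only an O(D^3) solve.
w = np.linalg.solve(P.T @ P + lam * np.eye(D), P.T @ y)
pred_primal = P @ w
```

By the push-through identity $P(P^\top P + \lambda I)^{-1} = (P P^\top + \lambda I)^{-1} P$, the two predictions coincide up to numerical error, which is why feature-based signature methods scale to the large datasets in the results below.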

Results

The experiments can be run using the code provided in the experiments directory. The results displayed here are the ones reported in the accompanying paper. We have made our best effort to aid reproducibility by using run-dependent seeds; however, we have also observed minor differences in the produced results across CUDA and CuPy versions.

Results on Multivariate UEA Datasets with $N \leq 1000$:

| Dataset | RFSF-DP | RFSF-TRP | KSig | KSigPDE | RWS | GAK | RBF | RFF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArticularyWordRecognition | 0.984 | 0.981 | 0.990 | 0.983 | 0.987 | 0.977 | 0.977 | 0.978 |
| AtrialFibrillation | 0.373 | 0.320 | 0.400 | 0.333 | 0.427 | 0.333 | 0.267 | 0.373 |
| BasicMotions | 1.000 | 1.000 | 1.000 | 1.000 | 0.995 | 1.000 | 0.975 | 0.860 |
| Cricket | 0.964 | 0.964 | 0.958 | 0.972 | 0.978 | 0.944 | 0.917 | 0.886 |
| DuckDuckGeese | 0.636 | 0.664 | 0.700 | 0.480 | 0.492 | 0.500 | 0.420 | 0.372 |
| ERing | 0.921 | 0.936 | 0.841 | 0.941 | 0.945 | 0.926 | 0.937 | 0.915 |
| EigenWorms | 0.817 | 0.837 | 0.809 | 0.794 | 0.623 | 0.511 | 0.496 | 0.443 |
| Epilepsy | 0.949 | 0.942 | 0.949 | 0.891 | 0.925 | 0.870 | 0.891 | 0.777 |
| EthanolConcentration | 0.457 | 0.439 | 0.479 | 0.460 | 0.284 | 0.361 | 0.346 | 0.325 |
| FingerMovements | 0.608 | 0.624 | 0.640 | 0.630 | 0.612 | 0.500 | 0.620 | 0.570 |
| HandMovementDirection | 0.573 | 0.568 | 0.595 | 0.527 | 0.403 | 0.595 | 0.541 | 0.454 |
| Handwriting | 0.434 | 0.400 | 0.479 | 0.409 | 0.591 | 0.481 | 0.307 | 0.249 |
| Heartbeat | 0.717 | 0.712 | 0.712 | 0.722 | 0.714 | 0.717 | 0.717 | 0.721 |
| JapaneseVowels | 0.978 | 0.978 | 0.986 | 0.986 | 0.955 | 0.981 | 0.981 | 0.979 |
| Libras | 0.898 | 0.928 | 0.922 | 0.894 | 0.837 | 0.767 | 0.800 | 0.800 |
| MotorImagery | 0.516 | 0.526 | 0.500 | 0.500 | 0.508 | 0.470 | 0.500 | 0.482 |
| NATOPS | 0.906 | 0.908 | 0.922 | 0.928 | 0.924 | 0.922 | 0.917 | 0.900 |
| PEMS-SF | 0.800 | 0.808 | 0.827 | 0.838 | 0.701 | 0.855 | 0.855 | 0.770 |
| RacketSports | 0.874 | 0.861 | 0.921 | 0.908 | 0.878 | 0.849 | 0.809 | 0.755 |
| SelfRegulationSCP1 | 0.868 | 0.856 | 0.904 | 0.904 | 0.829 | 0.915 | 0.898 | 0.885 |
| SelfRegulationSCP2 | 0.489 | 0.510 | 0.539 | 0.544 | 0.481 | 0.511 | 0.439 | 0.492 |
| StandWalkJump | 0.387 | 0.333 | 0.400 | 0.400 | 0.347 | 0.267 | 0.533 | 0.267 |
| UWaveGestureLibrary | 0.882 | 0.881 | 0.912 | 0.866 | 0.897 | 0.887 | 0.766 | 0.846 |
| Avg. acc. | 0.740 | 0.738 | 0.756 | 0.735 | 0.710 | 0.702 | 0.692 | 0.656 |
| Avg. rank | 3.652 | 3.739 | 2.348 | 2.957 | 4.043 | 4.217 | 4.913 | 5.957 |

Results on Multivariate UEA Datasets with $N > 1000$, fNIRS2MW ($N = 10^5$), and SITS1M ($N = 10^6$):

| Dataset | RFSF-DP | RFSF-TRP | RWS | RFF |
| --- | --- | --- | --- | --- |
| CharacterTrajectories | 0.990 | 0.990 | 0.991 | 0.989 |
| FaceDetection | 0.653 | 0.656 | 0.642 | 0.572 |
| InsectWingbeat | 0.436 | 0.459 | 0.227 | 0.341 |
| LSST | 0.589 | 0.624 | 0.631 | 0.423 |
| PenDigits | 0.983 | 0.982 | 0.989 | 0.980 |
| PhonemeSpectra | 0.204 | 0.204 | 0.205 | 0.083 |
| SITS1M | 0.745 | 0.740 | 0.610 | 0.718 |
| SpokenArabicDigits | 0.981 | 0.980 | 0.981 | 0.964 |
| fNIRS2MW | 0.659 | 0.658 | 0.621 | 0.642 |
| Avg. acc. | 0.693 | 0.699 | 0.655 | 0.635 |
| Avg. rank | 1.778 | 1.889 | 2.222 | 3.333 |

Documentation

Coming very soon...

Literature

A non-exhaustive list of some papers that involve signature kernels in one way or another (in chronological order):

Contact us if you would like your paper or project to appear here.

Contacts

For queries or feedback, you can reach us at csaba.toth@maths.ox.ac.uk and harald.oberhauser@maths.ox.ac.uk.
