FilippoMB/Time-series-classification-and-clustering-with-Reservoir-Computing

Framework overview

This library lets you quickly implement different architectures for time series data based on Reservoir Computing (RC), the family of approaches popularized in machine learning by Echo State Networks. It is primarily designed to perform classification and clustering of both univariate and multivariate time series. However, it can also be used to perform time series forecasting.

Classification

Several options are available to customize the RC model, by selecting different configurations for each module.

  1. The reservoir module specifies the reservoir configuration (e.g., bidirectional, leaky neurons, circle topology). Given a multivariate time series $\mathbf{X}$, it generates a sequence of reservoir states $\mathbf{H}$ of the same length.
  2. The dimensionality reduction module (optionally) applies a dimensionality reduction to the sequence of reservoir states $\mathbf{H}$, generating a new sequence $\mathbf{\bar H}$.
  3. The representation module generates a vector $\mathbf{r}_\mathbf{X}$ from the sequence of reservoir states, which represents the original time series $\mathbf{X}$ in vector form.
  4. The readout module is a classifier that maps the representation $\mathbf{r}_\mathbf{X}$ into the class label $\mathbf{y}$, associated with the time series $\mathbf{X}$.

This library also implements the reservoir model space, a very powerful representation $\mathbf{r}_\mathbf{X}$ for the time series. Details about the methodology can be found in the original paper.
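As a rough illustration of the reservoir model space idea, the sketch below fits a ridge regression that predicts each reservoir state from the previous one and uses the fitted coefficients as the representation of the series. It is a NumPy-only sketch and not the library's implementation: the helper name `reservoir_model_space`, the `ridge` value, and the random stand-in states are assumptions made for illustration.

```python
import numpy as np

def reservoir_model_space(H, ridge=1.0):
    """Fit h[t+1] ~ W h[t] + b with ridge regression and return [vec(W), b].

    H is a (T, D) sequence of (possibly reduced) reservoir states.
    Hypothetical helper for illustration, not part of the library.
    """
    past, future = H[:-1], H[1:]
    # Append a constant column so the intercept b is estimated jointly.
    A = np.hstack([past, np.ones((past.shape[0], 1))])
    # Closed-form ridge solution; the intercept is left unpenalized.
    penalty = ridge * np.eye(A.shape[1])
    penalty[-1, -1] = 0.0
    coef = np.linalg.solve(A.T @ A + penalty, A.T @ future)  # shape (D + 1, D)
    return coef.ravel()

rng = np.random.default_rng(0)
H = rng.standard_normal((50, 4))   # stand-in for T=50 states of size D=4
r_X = reservoir_model_space(H)
print(r_X.shape)  # (20,)
```

The representation has D * (D + 1) entries, which is one reason to reduce the state dimension (step 2) before computing it.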

Clustering

The representation $\mathbf{r}_\mathbf{X}$ obtained at step 3 can be used to perform time series clustering.

Forecasting

The sequences $\mathbf{H}$ and $\mathbf{\bar H}$ obtained at steps 1 and 2 can be directly used to forecast the future values of the time series.
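A minimal sketch of this idea, assuming a toy sine wave, a small random reservoir, and a ridge regression that maps the state at time t to the value `horizon` steps ahead. All sizes and constants are illustrative, and the code does not use the library's classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy univariate series and a small random reservoir (all sizes are illustrative).
x = np.sin(0.2 * np.arange(400))
n_units, horizon, n_drop = 50, 5, 20
W_in = 0.5 * rng.choice([-1.0, 1.0], size=n_units)
W = rng.standard_normal((n_units, n_units))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9

# Step 1: one reservoir state per time step.
H = np.zeros((len(x), n_units))
h = np.zeros(n_units)
for t, x_t in enumerate(x):
    h = np.tanh(W @ h + W_in * x_t)
    H[t] = h

# Pair the state at time t with the value at time t + horizon, dropping the transient.
states, targets = H[n_drop:-horizon], x[n_drop + horizon:]
A = np.hstack([states, np.ones((len(states), 1))])  # constant column for the bias

# Ridge regression fitted on the first 300 pairs and evaluated on the rest.
split = 300
gram = A[:split].T @ A[:split] + 1e-4 * np.eye(n_units + 1)
coef = np.linalg.solve(gram, A[:split].T @ targets[:split])
pred = A[split:] @ coef
mse = np.mean((pred - targets[split:]) ** 2)
print(f"test MSE: {mse:.2e}")
```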

Installation

The recommended installation is with pip:

pip install reservoir-computing

Alternatively, you can install the library from source:

git clone https://github.com/FilippoMB/Time-series-classification-and-clustering-with-Reservoir-Computing.git
cd Time-series-classification-and-clustering-with-Reservoir-Computing
pip install -e .

Quick start

The following scripts provide minimalistic examples that illustrate how to use the library for different tasks.

To run them, download the project and cd to the root folder:

git clone https://github.com/FilippoMB/Time-series-classification-and-clustering-with-Reservoir-Computing.git
cd Time-series-classification-and-clustering-with-Reservoir-Computing

Classification

python examples/classification_example.py

Clustering

python examples/clustering_example.py

Forecasting

python examples/forecasting_example.py

The following notebooks illustrate more advanced use-cases.

  • Perform dimensionality reduction, cluster analysis, and visualize the results: view or Open In Colab
  • Probabilistic forecasting with advanced regression models as readout: view or Open In Colab
  • Use advanced classifiers as readout: view or Open In Colab

Configure the RC-model

The main class RC_model, contained in modules.py, lets you specify, train, and test an RC-model. The RC-model is configured by passing a set of parameters to the constructor of the class RC_model. To get an idea, you can check classification_example.py or clustering_example.py, where the parameters are specified through a dictionary (config).

The available configuration hyperparameters are listed below and, for the sake of clarity, are grouped according to the module of the architecture they refer to.

1. Reservoir:

  • n_drop - number of transient states to drop
  • bidir - use a bidirectional reservoir (True or False)
  • reservoir - precomputed reservoir (object of class Reservoir in reservoir.py); if None, the following hyperparameters must be specified:
    • n_internal_units - number of processing units in the reservoir
    • spectral_radius - largest eigenvalue (in absolute value) of the reservoir matrix of connection weights (to guarantee the Echo State Property, set spectral_radius <= leak <= 1)
    • leak - amount of leakage in the reservoir state update (optional; None or 1.0 means no leakage)
    • circ - if True, generate a deterministic reservoir with circle topology where each connection has the same weight
    • connectivity - percentage of nonzero connection weights (ignored if circ = True)
    • input_scaling - scaling of the input connection weights (note that weights are randomly drawn from {-1,1})
    • noise_level - standard deviation of the Gaussian noise injected in the state update
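To make the role of these hyperparameters more concrete, here is a minimal NumPy sketch of how a reservoir could be built and run. The helpers `make_reservoir` and `run_reservoir` are hypothetical, the leaky update shown is one common formulation, and the library's own Reservoir class may differ in its details (for example in how the leak and the noise enter the update).

```python
import numpy as np

def make_reservoir(n_internal_units, spectral_radius, connectivity, circ, rng):
    """Build a recurrent weight matrix (hypothetical helper, not the library's code)."""
    n = n_internal_units
    if circ:
        # Ring topology: unit i feeds unit i+1 with one shared weight, so every
        # eigenvalue has magnitude equal to that weight.
        return spectral_radius * np.roll(np.eye(n), 1, axis=0)
    # Sparse random matrix: each weight is kept with probability `connectivity`.
    W = rng.uniform(-0.5, 0.5, size=(n, n)) * (rng.random((n, n)) < connectivity)
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    return W * (spectral_radius / np.max(np.abs(np.linalg.eigvals(W))))

def run_reservoir(X, W, W_in, leak=1.0, noise_level=0.0, n_drop=0, rng=None):
    """Return the states for one series X of shape (T, V), without the first n_drop."""
    if rng is None:
        rng = np.random.default_rng()
    h = np.zeros(W.shape[0])
    states = []
    for x_t in X:
        pre = W @ h + W_in @ x_t + noise_level * rng.standard_normal(h.shape)
        # Leaky integration; leak = 1.0 reduces to the plain tanh update.
        h = (1.0 - leak) * h + leak * np.tanh(pre)
        states.append(h)
    return np.array(states[n_drop:])

rng = np.random.default_rng(0)
n_units, V = 100, 3
W = make_reservoir(n_units, spectral_radius=0.6, connectivity=0.3, circ=False, rng=rng)
W_in = 0.1 * rng.choice([-1.0, 1.0], size=(n_units, V))  # input_scaling = 0.1
X = rng.standard_normal((40, V))
H = run_reservoir(X, W, W_in, leak=0.6, noise_level=0.01, n_drop=5, rng=rng)
print(H.shape)  # (35, 100)
```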

2. Dimensionality reduction:

  • dimred_method - procedure for reducing the number of features in the sequence of reservoir states; possible options are: None (no dimensionality reduction), 'pca' (standard PCA) or 'tenpca' (tensorial PCA for multivariate time series data)
  • n_dim - number of resulting dimensions after the dimensionality reduction procedure
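The 'pca' option can be pictured with the following NumPy sketch, which treats every reservoir state as one sample and projects it onto the leading principal components. The helper `pca_reduce_states` is hypothetical and the random array only stands in for real reservoir states; the tensorial variant ('tenpca') is not covered here.

```python
import numpy as np

def pca_reduce_states(H, n_dim):
    """Project states H of shape (N, T, D) onto the top n_dim principal components."""
    N, T, D = H.shape
    flat = H.reshape(N * T, D)          # treat every state as one sample
    centered = flat - flat.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ Vt[:n_dim].T).reshape(N, T, n_dim)

rng = np.random.default_rng(0)
H = rng.standard_normal((8, 30, 50))    # stand-in for reservoir states
H_bar = pca_reduce_states(H, n_dim=10)
print(H_bar.shape)  # (8, 30, 10)
```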

3. Representation:

  • mts_rep - type of multivariate time series representation. It can be 'last' (last state), 'mean' (mean of all states), 'output' (output model space), or 'reservoir' (reservoir model space)
  • w_ridge_embedding - regularization parameter of the ridge regression in the output model space and reservoir model space representation; ignored if mts_rep is None
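The sketch below shows how three of these representations could be computed from a batch of states with NumPy. It assumes that the output model space representation consists of the coefficients of a ridge regression predicting the next input from the current state. The helper `ridge_coefficients` is hypothetical, and random arrays stand in for real inputs and states.

```python
import numpy as np

def ridge_coefficients(inputs, targets, ridge):
    """Closed-form ridge regression with a bias term; returns flattened [weights, bias].

    For brevity the bias is penalized together with the weights.
    """
    A = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    coef = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ targets)
    return coef.ravel()

rng = np.random.default_rng(0)
N, T, V, D = 8, 30, 3, 10
X = rng.standard_normal((N, T, V))      # input series
H = rng.standard_normal((N, T, D))      # stand-in for their reservoir states

r_last = H[:, -1, :]                    # 'last': final state, shape (N, D)
r_mean = H.mean(axis=1)                 # 'mean': average state, shape (N, D)
# 'output': coefficients of a model predicting x[t+1] from h[t], one fit per series
r_output = np.stack(
    [ridge_coefficients(H[i, :-1], X[i, 1:], ridge=1.0) for i in range(N)]
)
print(r_last.shape, r_mean.shape, r_output.shape)  # (8, 10) (8, 10) (8, 33)
```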

4. Readout:

  • readout_type - type of readout used for classification. It can be 'lin' (ridge regression), 'mlp' (multilayer perceptron), 'svm' (support vector machine), or None. If None, the input representations will be stored in the .input_repr attribute: this is useful for clustering and visualization. Also, if None, the other Readout hyperparameters can be left unspecified.
  • w_ridge - regularization parameter of the ridge regression readout (only when readout_type is 'lin')
  • mlp_layout - list with the sizes of MLP layers, e.g. [20,20,10] defines an MLP with 3 layers of 20, 20 and 10 units respectively (only when readout_type is 'mlp')
  • batch_size - size of the mini batches used during training (only when readout_type is 'mlp')
  • num_epochs - number of iterations during the optimization (only when readout_type is 'mlp')
  • w_l2 - weight of the L2 regularization (only when readout_type is 'mlp')
  • learning_rate - learning rate in the gradient descent optimization (only when readout_type is 'mlp')
  • nonlinearity - type of activation function; it can be one of {'relu', 'tanh', 'logistic', 'identity'} (only when readout_type is 'mlp')
  • svm_gamma - bandwidth of the RBF kernel (only when readout_type is 'svm')
  • svm_C - regularization for the SVM hyperplane (only when readout_type is 'svm')
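As an example, a configuration dictionary for a linear readout might look like the sketch below. The keys are the hyperparameter names listed above; the values are illustrative only, not recommended defaults. Unpacking the dictionary into the constructor is an assumption based on the description above, so that line is left as a comment.

```python
config = {
    # 1. Reservoir
    "n_drop": 5,
    "bidir": True,
    "reservoir": None,           # no precomputed reservoir, so set the values below
    "n_internal_units": 450,
    "spectral_radius": 0.59,     # kept <= leak <= 1
    "leak": 0.6,
    "circ": False,
    "connectivity": 0.25,
    "input_scaling": 0.1,
    "noise_level": 0.01,
    # 2. Dimensionality reduction
    "dimred_method": "tenpca",
    "n_dim": 75,
    # 3. Representation
    "mts_rep": "reservoir",
    "w_ridge_embedding": 10.0,
    # 4. Readout (only w_ridge is relevant for a 'lin' readout)
    "readout_type": "lin",
    "w_ridge": 5.0,
}

# Assumed usage, with the library installed:
# from reservoir_computing.modules import RC_model
# clf = RC_model(**config)
```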

RC-model for classification

The training and test functions require the training and test data as input, which must be provided as multidimensional NumPy arrays of shape [N,T,V], with:

  • N = number of samples
  • T = number of time steps in each sample
  • V = number of variables in each sample

Training and test labels (Ytr and Yte) must be provided in one-hot encoding format, i.e. as a matrix of shape [N,C], where C is the number of classes.
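A small sketch of how data in this format could be prepared with NumPy. The arrays are random placeholders, and the integer labels are assumed to run from 0 to C-1.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, V, C = 6, 20, 2, 3
Xtr = rng.standard_normal((N, T, V))     # N series, T time steps, V variables
labels = np.array([0, 2, 1, 1, 0, 2])    # integer class labels in 0..C-1

Ytr = np.eye(C)[labels]                  # one-hot matrix of shape (N, C)
print(Xtr.shape, Ytr.shape)  # (6, 20, 2) (6, 3)

# Univariate series stored as (N, T) need a trailing axis to become (N, T, 1).
X_uni = rng.standard_normal((N, T))[:, :, None]
print(X_uni.shape)  # (6, 20, 1)
```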

Training

from reservoir_computing.modules import RC_model
clf = RC_model()
clf.fit(Xtr, Ytr)

Inputs:

  • Xtr, Ytr: training data and labels.

Outputs:

  • None

Prediction of new samples

Yhat = clf.predict(Xte)

Inputs:

  • Xte: test data.

Outputs:

  • Yhat: prediction of the labels for the test data.
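The predictions can then be compared with the one-hot test labels. The sketch below defines a hypothetical `accuracy` helper. Since the exact format of Yhat is not stated here, it accepts either integer class indices of shape (N,) or per-class scores of shape (N, C); this is an assumption, so check what predict returns in your version of the library.

```python
import numpy as np

def accuracy(Yhat, Yte):
    """Fraction of correct predictions; Yte is one-hot with shape (N, C)."""
    true_class = np.argmax(Yte, axis=1)
    Yhat = np.asarray(Yhat)
    # Accept either class indices (N,) or per-class scores (N, C).
    pred_class = Yhat if Yhat.ndim == 1 else np.argmax(Yhat, axis=1)
    return float(np.mean(pred_class == true_class))

Yte = np.eye(3)[[0, 1, 2, 1]]                  # true classes 0, 1, 2, 1
print(accuracy(np.array([0, 1, 1, 1]), Yte))   # 0.75
```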

RC-model for clustering

As in the case of classification, the data must be provided as multidimensional NumPy arrays of shape [N,T,V].

Training

from reservoir_computing.modules import RC_model
clst = RC_model(readout_type=None)
clst.fit(X)
rX = clst.input_repr # representations of the input data

Inputs:

  • X: time series data

Outputs:

  • None

The representations rX can be used to perform clustering with traditional clustering algorithms for vectorial data, such as those here.
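As an illustration, the sketch below runs a plain k-means on a matrix of representations using only NumPy. The `kmeans` helper is hypothetical and the two synthetic blobs stand in for real representations rX; in practice an established implementation such as scikit-learn's KMeans would be a more robust choice.

```python
import numpy as np

def kmeans(R, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of R; returns one cluster index per row."""
    rng = np.random.default_rng(seed)
    centers = R[rng.choice(len(R), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each representation to its nearest center (Euclidean distance).
        dists = np.linalg.norm(R[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = np.array([
            R[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
            for k in range(n_clusters)
        ])
    return assign

rng = np.random.default_rng(1)
# Stand-in for rX: two well-separated groups of 10 representations each.
rX = np.vstack([
    0.1 * rng.standard_normal((10, 5)),
    5.0 + 0.1 * rng.standard_normal((10, 5)),
])
clusters = kmeans(rX, n_clusters=2)
print(clusters)
```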

RC-model for forecasting

Training

from reservoir_computing.modules import RC_forecaster
fcst = RC_forecaster()
fcst.fit(Xtr, Ytr)

Inputs:

  • Xtr, Ytr: current and future values used for training.

Outputs:

  • None
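One way to build such pairs of current and future values from a single series is sketched below. The helper `make_forecasting_pairs` is hypothetical, and the array shapes expected by RC_forecaster.fit are an assumption here, so the sketch only shows the pairing and a chronological train/test split.

```python
import numpy as np

def make_forecasting_pairs(series, horizon):
    """Pair each observation with the one `horizon` steps later.

    series has shape (T, V). Hypothetical helper for illustration only.
    """
    return series[:-horizon], series[horizon:]

t = np.arange(200)
series = np.stack([np.sin(0.1 * t), np.cos(0.1 * t)], axis=1)   # (T=200, V=2)
X, Y = make_forecasting_pairs(series, horizon=10)

n_train = 150                      # chronological split, no shuffling
Xtr, Ytr = X[:n_train], Y[:n_train]
Xte, Yte = X[n_train:], Y[n_train:]
print(Xtr.shape, Ytr.shape, Xte.shape)  # (150, 2) (150, 2) (40, 2)
```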

Predicting new data

Yhat = fcst.predict(Xte)

Inputs:

  • Xte: test data.

Outputs:

  • Yhat: forecast of the test data.

Time series datasets for classification and clustering

  • A collection of univariate and multivariate time series datasets is available for download here.
  • The datasets are provided in both MATLAB and Python (NumPy) format.
  • The original raw data come from UCI, UEA, and UCR public repositories.

Citation

Please consider citing the original paper if you use this library in your research:

@article{bianchi2020reservoir,
  title={Reservoir computing approaches for representation and classification of multivariate time series},
  author={Bianchi, Filippo Maria and Scardapane, Simone and L{\o}kse, Sigurd and Jenssen, Robert},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2020},
  publisher={IEEE}
}

License

The code is released under the MIT License. See the attached LICENSE file.
