Intended users

Myself and close collaborators.
Other statistics and ML researchers, data scientists and engineers.

Primarily for research and development, but it should also become viable for production use at some point.

Wanted features

Overall principles

Computational performance: Use numba for performance-critical numerical code.
Extensibility: Implement non-restrictive base classes, so that new types of algorithms can easily be implemented without being forced into a template they don't fit.
Composition: Easy to combine algorithms. Pipelining, ensembling, etc.
Interoperability: Easy to use with other libraries. For example, meta-algorithms such as pipelines from other libraries should ideally work with skchange.

Always keep in mind

scikit-learn design principles: https://arxiv.org/pdf/1309.0238.pdf
sktime design: https://arxiv.org/abs/2101.04938

Data

Univariate time series: A single variable over time.
Multiple time series: A collection of unrelated univariate time series.
Multivariate time series: A collection of related univariate time series.

Use pd.DataFrame for all types of input data.
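
A minimal sketch of how the three data types might be held in a pd.DataFrame. The column names and the one-column-per-series layout are assumptions made for illustration; the notes do not fix a layout.

```python
import pandas as pd

# Univariate: a single column, one row per time point.
univariate = pd.DataFrame({"x": [0.1, 0.3, 5.2, 0.2]})

# Multivariate: related variables share the time index, one column each.
multivariate = pd.DataFrame(
    {"temperature": [20.1, 20.3, 25.2, 20.2], "pressure": [1.0, 1.1, 1.6, 1.0]}
)

# Multiple (unrelated) series: also one column per series in this sketch
# (an assumption). The difference lies in how a detector treats the columns:
# jointly for multivariate data, independently for multiple series.
multiple = pd.DataFrame(
    {"sensor_a": [0.1, 0.3, 5.2, 0.2], "sensor_b": [7.0, 7.1, 6.9, 7.2]}
)

for name, df in [("univariate", univariate), ("multivariate", multivariate), ("multiple", multiple)]:
    print(name, df.shape)
```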

Anomaly detectors

Finds point or collective anomalies in data.

Change detectors

Segments data into homogeneous segments. The overall goal is to annotate each data point with a label that indicates which segment it belongs to. No restrictions on how this is achieved.
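
The labelling goal can be sketched as follows. The function name segment_labels is hypothetical, and the convention that a changepoint is the first index of a new segment is an assumption, not something the notes specify.

```python
def segment_labels(n_samples: int, changepoints: list[int]) -> list[int]:
    """Label each time index with the number of the segment it belongs to.

    A changepoint c is taken to be the first index of a new segment
    (an assumed convention).
    """
    remaining = sorted(set(changepoints))
    labels = []
    segment = 0
    for t in range(n_samples):
        # Move on to the next segment once t reaches the next changepoint.
        while remaining and t >= remaining[0]:
            remaining.pop(0)
            segment += 1
        labels.append(segment)
    return labels


print(segment_labels(8, [3, 6]))  # [0, 0, 0, 1, 1, 1, 2, 2]
```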

General detector requirements and features

Support detectors that are test/score based, cost based, or anything else.
Various types of thresholds/penalties.
Subset anomalies.
Possibility to tune number of detections in a general way, across both change and anomaly detectors.
Should be possible to add specialised tuning procedures for each algorithm or subclass:

a. Add option for .tune() method per algorithm. Plays poorly with pipelines and composition in general (?).

b. Add specialised tuning classes with a detector as a component (like CV in sklearn and sktime) that can only be used based on (i) a tag or (ii) inheritance.

c. Add a tune_penalty = True/False or penalty = tune, which governs what happens in .fit().
Add .show() method to visualise results?
Option to implement quick updating of fit and predict with new data, without having to retrain the entire model. But fallback on retraining the entire model.
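
Option b above could look roughly like the sketch below: a tuner that holds a detector as a component and picks a penalty giving at most a target number of detections. Both classes, the penalty attribute, and the fit/predict signatures are hypothetical stand-ins, not the sktime or skchange API.

```python
class ThresholdDetector:
    """Hypothetical stand-in detector: flags points with |value| > penalty."""

    def __init__(self, penalty: float = 1.0):
        self.penalty = penalty

    def fit(self, x):
        return self

    def predict(self, x):
        return [i for i, v in enumerate(x) if abs(v) > self.penalty]


class PenaltyTuner:
    """Wrap a detector and choose the smallest candidate penalty that yields
    at most `max_detections` detections on the training data.

    Falls back to the largest candidate if none satisfies the limit.
    Note that the wrapped detector is modified in place.
    """

    def __init__(self, detector, penalties, max_detections):
        if not penalties:
            raise ValueError("penalties must not be empty")
        self.detector = detector
        self.penalties = sorted(penalties)
        self.max_detections = max_detections

    def fit(self, x):
        for penalty in self.penalties:
            self.detector.penalty = penalty
            self.detector.fit(x)
            if len(self.detector.predict(x)) <= self.max_detections:
                break
        self.penalty_ = penalty
        return self

    def predict(self, x):
        return self.detector.predict(x)


x = [0.1, -0.2, 4.0, 0.3, -6.0, 0.2, 2.5]
tuner = PenaltyTuner(ThresholdDetector(), penalties=[1, 2, 3, 5], max_detections=1).fit(x)
print(tuner.penalty_, tuner.predict(x))  # 5 [4]
```

Because the tuner only relies on a penalty attribute plus fit and predict, the same class would work for change and anomaly detectors alike, which is the "general way" of tuning the number of detections asked for above.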

Nice-to-have features

A wrapper for turning a detector for univariate time series into a detector for multiple time series.
An aggregator that aggregates scores from multiple detectors into a single score.
Make a wrapper for diagnosing anomalies post detection? Or pipeline step?
Ability to set up a pipeline of model -> drift adaptation -> anomaly detection
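
The first two items can be sketched with plain functions. Everything here is hypothetical: the two scorers are toy stand-ins, and representing multiple series as a dict of lists is an assumption made to keep the example free of dependencies.

```python
from statistics import mean


def abs_deviation(series):
    """Toy univariate scorer: absolute deviation from the series mean."""
    centre = mean(series)
    return [abs(v - centre) for v in series]


def abs_value(series):
    """Toy univariate scorer: absolute value of each observation."""
    return [abs(v) for v in series]


def for_multiple_series(scorer):
    """Turn a univariate scorer into one that scores every series separately."""

    def score_all(data):
        return {name: scorer(series) for name, series in data.items()}

    return score_all


def aggregate(scorers, how=max):
    """Combine several scorers into one by aggregating scores per time point."""

    def combined(series):
        all_scores = [scorer(series) for scorer in scorers]
        return [how(point_scores) for point_scores in zip(*all_scores)]

    return combined


data = {"a": [0.0, 0.0, 3.0], "b": [1.0, 1.0, 1.0]}
score = for_multiple_series(aggregate([abs_deviation, abs_value]))
print(score(data))  # {'a': [1.0, 1.0, 3.0], 'b': [1.0, 1.0, 1.0]}
```

The two wrappers compose freely, which is the kind of behaviour the composition principle asks for.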

Why depend on sktime?

Tidy interface.
Clear purpose of base class.
Clarity of design principles.

a. https://arxiv.org/pdf/1309.0238.pdf

b. https://arxiv.org/abs/2101.04938
Well documented.
BaseAnnotator is a non-restrictive base class for change detection algorithms. Avoids the need to implement a lot of boilerplate code.
Several useful meta algorithms are already implemented in sktime. For example, pipelines.

Related packages

sktime: https://www.sktime.net/en/stable/
darts: https://unit8co.github.io/darts/
nixtla: https://github.com/Nixtla

See https://www.sktime.net/en/stable/related_software.html for a more complete list.

Development workflow

Create a new algorithm by using the annotator extension template of sktime: https://github.com/sktime/sktime/blob/df21a0c0275ebf28deb30efac5d469c9f0d178e3/extension_templates/annotation.py.
Explore the new algorithm class, its methods and its component functions in an interactive script. These scripts are located in the interactive folder and named explore_<algorithm>.py, for example explore_pelt.py if the algorithm is pelt. In the future, these explorative scripts might be run as part of the CI/CD pipeline.
Write pytests in the relevant folder's tests subfolder. If the algorithm is named pelt and located in skchange/change_detection/pelt.py, write tests in skchange/change_detection/tests/test_pelt.py.
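
A test file in that location might look roughly like this. The function find_mean_change is a hypothetical stand-in defined inline so the sketch runs on its own; it uses a simple CUSUM statistic for a single mean shift, not PELT, and a real test file would import the algorithm from its module instead.

```python
# Sketch of a file such as skchange/change_detection/tests/test_pelt.py.


def find_mean_change(x):
    """Return the split index maximising the CUSUM statistic for a mean shift.

    Index k means the series is split into x[:k] and x[k:].
    Returns None for series with fewer than two points.
    """
    n = len(x)
    total = sum(x)
    best_split, best_stat = None, -1.0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += x[k - 1]
        left_mean = left_sum / k
        right_mean = (total - left_sum) / (n - k)
        stat = (k * (n - k) / n) ** 0.5 * abs(left_mean - right_mean)
        if stat > best_stat:
            best_split, best_stat = k, stat
    return best_split


def test_detects_single_mean_shift():
    x = [0.0] * 10 + [5.0] * 10
    assert find_mean_change(x) == 10


def test_split_is_inside_series():
    x = [1.0, 2.0, 0.5, 3.0, 1.5]
    assert 0 < find_mean_change(x) < len(x)
```

pytest collects functions whose names start with test_ in files named test_*.py, so no extra registration is needed.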

More resources: https://www.sktime.net/en/stable/developer_guide/add_estimators.html

Coding standards: https://github.com/sktime/sktime/blob/df21a0c0275ebf28deb30efac5d469c9f0d178e3/docs/source/developer_guide/coding_standards.rst#L65

Release workflow

For releasing a new version of skchange, run the do-nothing script build_tools/make_release.py for instructions. See the script for more information.
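
For readers unfamiliar with the pattern: a do-nothing script automates nothing itself, it only walks the user through the manual steps one at a time. A minimal sketch follows; the steps listed are invented placeholders, not the contents of build_tools/make_release.py.

```python
def make_release_steps(version: str) -> list[str]:
    """Hypothetical release steps, for illustration only."""
    return [
        f"Update the version number to {version}.",
        "Run the test suite.",
        f"Tag the release commit as v{version}.",
    ]


def run(steps, wait=input, show=print):
    """Show each manual step and wait for confirmation before the next one."""
    for number, step in enumerate(steps, start=1):
        show(f"Step {number}/{len(steps)}: {step}")
        wait("Press Enter when done. ")
    show("Done.")
```

Calling run(make_release_steps("0.1.0")) walks through the steps interactively. Each step is a candidate for later automation, which fits the roadmap items about adding PyPI publishing and documentation generation to the script.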

Roadmap

Implement PELT as first test implementation of an algorithm.
Implement seeded binary segmentation with CUSUM as a second test implementation of an algorithm.
Implement CAPA.
Complete first version of README.
Complete the make_release do-nothing script. Also?? https://stackoverflow.com/questions/72270892/git-versioning-with-setuptools-in-pyproject-toml
Publish to PyPI? When this is done, add to make_release script.
Add automatic documentation generation by Sphinx and readthedocs: https://eikonomega.medium.com/getting-started-with-sphinx-autodoc-part-1-2cebbbca5365. Get access to skchange from readthedocs. Add documentation generation to make_release script.
