
The source code of my master thesis, whose topic was "Hybrid neural networks for anomaly detection in cyber-physical systems". It was written at LORIA in Nancy, France, during my Erasmus exchange.


hadrrb/ml-for-anomaly-detection


Master Thesis: Hybrid neural networks for anomaly detection in cyber-physical systems

This repository contains the Python 3 source code used to produce the results of the master thesis (in the main directory) and the LaTeX source of the thesis itself (in the thesis folder).

Abstract

Nowadays, cyber-physical systems are widely used in different application domains. In parallel, machine learning algorithms are widely used to detect anomalies in the behaviour of these systems. However, this detection is limited to two states: normal behaviour and faulty functioning. This master thesis aims to extend this detection to differentiate between attacks and normal faults. First, a power system is described as an example to work on. Then, various machine learning algorithms are evaluated on the given datasets using two machine learning toolkits: scikit-learn and Weka. Next, various tools for feature analysis are presented, and an algorithm for finding the features that contributed most to the false predictions is described. Finally, three solutions to the initial problem are presented and evaluated.

Repository content description

The full text of the master thesis can be found in this pdf file. The source code used for each chapter of the thesis is listed below.

Chapter 2: Power system as a CPS example

  • files_calc.ipnyb: conversion of the dataset from .arff to .csv and analysis of the distribution of classes across the files.
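
The conversion and class-count steps described above can be illustrated with a minimal sketch. It is not the notebook's actual code: the function names, the attribute names, and the class labels in the sample data are hypothetical, and the parser assumes a simple dense ARFF file with no quoted commas or sparse rows. Only the Python standard library is used.

```python
import csv
import io
from collections import Counter


def arff_to_rows(arff_text):
    """Parse a simple dense ARFF document into (header, rows)."""
    header, rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute name type": keep only the attribute name
                header.append(line.split(None, 2)[1].strip("'\""))
            elif lower.startswith("@data"):
                in_data = True
        else:
            rows.append([value.strip().strip("'\"") for value in line.split(",")])
    return header, rows


def rows_to_csv(header, rows):
    """Serialise the header and data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def class_distribution(rows, class_index=-1):
    """Count rows per class, assuming the class is the last column."""
    return Counter(row[class_index] for row in rows)


# Hypothetical sample data, not taken from the thesis datasets.
SAMPLE_ARFF = """\
@relation demo
@attribute voltage numeric
@attribute current numeric
@attribute marker {Natural,Attack,NoEvents}
@data
1.0,0.5,Natural
0.9,0.7,Attack
1.1,0.4,Attack
"""

header, rows = arff_to_rows(SAMPLE_ARFF)
csv_text = rows_to_csv(header, rows)
distribution = class_distribution(rows)
```

Running `class_distribution` on each converted file in turn would show how the classes are spread across files.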

Chapter 3: Machine learning algorithms comparison

  • ai_all.py: script that calculates the comparison metrics for all the classifiers on the 3 available datasets (multiclass, binary, three classes). As output, it creates pickle files containing the results, to be processed afterwards,
  • plot.ipynb: tool for creating plots of all the comparison metrics from the pickle files created by the previous script,
  • roc.py: script for creating ROC curves for the classifiers running on binary data (not displayed in the thesis),
  • ai.py: legacy script for calculating the comparison metrics for all the classifiers on the three-class dataset. It also creates the ROC curve and the confusion matrix,
  • ai_binary.py: legacy script for calculating the comparison metrics for all the classifiers on the binary dataset,
  • ai_multiclass.py: legacy script for calculating the comparison metrics for all the classifiers on the multiclass dataset,
  • proc.py: script for converting csv to arff in order to run tests in Weka,
  • plotting.py: legacy script for creating plots of all the comparison metrics,
  • param_optim.ipynb: script for finding the best set of parameters for the discussed classifiers.
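
The workflow described in this list (compute comparison metrics per classifier, then store them with pickle for later plotting) might look roughly like the sketch below. It is an illustration, not the code of ai_all.py: the metric set, the classifier names, the labels, and the predictions are all hypothetical, and the metrics are computed by hand with the standard library instead of scikit-learn so that the example stays self-contained. The results are pickled to in-memory bytes here; a real script would write them to a file.

```python
import pickle


def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    count = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / count,
        "recall": sum(recalls) / count,
        "f1": sum(f1_scores) / count,
    }


# Hypothetical ground truth and predictions for a three-class problem.
y_true = ["attack", "attack", "fault", "normal"]
predictions = {
    "classifier_a": ["attack", "fault", "fault", "normal"],
    "classifier_b": ["attack", "attack", "fault", "normal"],
}

# One metrics dictionary per classifier, keyed by classifier name.
results = {name: macro_metrics(y_true, y_pred)
           for name, y_pred in predictions.items()}

# Serialise the results so that a separate plotting step can load them.
payload = pickle.dumps(results)
restored = pickle.loads(payload)
```

Keeping the metric computation separate from the plotting means the expensive evaluation only has to run once, while the plots can be regenerated from the stored results.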

Chapter 4: Features' importance

Chapter 5: Model enhancement

  • featfun.ipynb: attempt to enhance the predictions of the Decision Tree classifier,
  • featfun_rf.ipynb: attempt to enhance the predictions of the Random Forest classifier,
  • featfun_mlp.ipynb: attempt to enhance the predictions of the Multilayer Perceptron classifier.
