Oscillogram Classification

Status: unstable. License: MIT.

Neural network based anomaly detection for vehicle components using oscilloscope recordings.

Example of the time series data to be considered (voltage over time - $z$-normalized):

The task comes down to binary univariate time series classification.
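As a minimal sketch of what $z$-normalization does to a voltage recording, the following NumPy snippet shifts a signal to zero mean and scales it to unit standard deviation. The function name `z_normalize` and the sample values are illustrative assumptions and are not taken from the repository's preprocessing code.

```python
import numpy as np


def z_normalize(signal: np.ndarray) -> np.ndarray:
    """Shift a signal to zero mean and scale it to unit standard deviation."""
    std = signal.std()
    if std == 0:
        # constant signal: avoid division by zero and return all zeros
        return np.zeros_like(signal, dtype=float)
    return (signal - signal.mean()) / std


# illustrative voltage values in volts (assumption, not real measurement data)
voltage = np.array([10.129, 10.137, 10.137, 10.153, 10.161, 10.153, 10.153])
normalized = z_normalize(voltage)
print(normalized.round(3))
```

After normalization, recordings with different absolute voltage levels become comparable in shape, which is what a classifier on raw time series typically needs.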

FCN Architecture

Note: See ResNet architecture in img/ResNet.png

Dependencies

  • for Python requirements, cf. requirements.txt
  • Apache Jena Fuseki: SPARQL server hosting / maintaining the knowledge graph

Installation

$ git clone https://github.com/tbohne/oscillogram_classification.git
$ cd oscillogram_classification/
$ pip install .

WandB Setup

$ touch config/api_key.py  # enter: wandb_api_key = "YOUR_KEY"

Config

Hyperparameter configuration in config/run_config.py, e.g.:

hyperparameter_config = {
    "batch_size": 4,
    "learning_rate": 0.001,
    "optimizer": "keras.optimizers.Adam",
    "epochs": 100,
    "model": "FCN",
    "loss_function": "binary_crossentropy",
    "accuracy_metric": "binary_accuracy",
    "trained_model_path": "best_model.h5",
    "save_best_only": True,
    "monitor": "val_loss",
    "ReduceLROnPlateau_factor": 0.5,
    "ReduceLROnPlateau_patience": 20,
    "ReduceLROnPlateau_min_lr": 0.0001,
    "EarlyStopping_patience": 50,
    "validation_split": 0.2
}
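To get a feeling for how the `ReduceLROnPlateau_*` and `EarlyStopping_patience` values interact, the following pure-Python sketch simulates a simplified version of both mechanisms on a list of validation losses. It is an assumption-laden model (no `min_delta`, no cooldown) and not the Keras implementation; the function name `simulate_schedule` is hypothetical.

```python
def simulate_schedule(val_losses, lr, factor, lr_patience, min_lr, stop_patience):
    """Return the learning rate used after each epoch until early stopping.

    Simplified model: any strictly lower validation loss counts as an
    improvement and resets both patience counters.
    """
    best = float("inf")
    lr_wait = 0    # stale epochs since the last improvement or LR reduction
    stop_wait = 0  # stale epochs since the last improvement
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            lr_wait = 0
            stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                # reduce the learning rate, but never below min_lr
                lr = max(lr * factor, min_lr)
                lr_wait = 0
        history.append(lr)
        if stop_wait >= stop_patience:
            break  # early stopping
    return history


# one improving epoch followed by a long plateau, using the values from the config
history = simulate_schedule(
    val_losses=[1.0] * 61,
    lr=0.001, factor=0.5, lr_patience=20, min_lr=0.0001, stop_patience=50,
)
print(len(history), history[-1])
```

Under this simplified model, a run that stalls completely sees the learning rate halved after 20 and after 40 stale epochs, and stops after 50, so `min_lr` is not reached within a single plateau.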

WandB sweep config in config/sweep_config.py, e.g.:

sweep_config = {
    "batch_size": {"values": [4, 16, 32]},
    "learning_rate": {"values": [0.01, 0.0001]},
    "optimizer": {"value": "keras.optimizers.Adam"},
    "epochs": {"values": [10, 30, 50, 100]},
    "model": {"values": ["FCN", "ResNet"]}
}
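To see how large the search space of the example sweep config is, the following sketch enumerates every combination with `itertools.product`. It assumes an exhaustive grid search; WandB also supports other search strategies, and the helper `candidates` is a hypothetical name introduced for illustration.

```python
from itertools import product

sweep_config = {
    "batch_size": {"values": [4, 16, 32]},
    "learning_rate": {"values": [0.01, 0.0001]},
    "optimizer": {"value": "keras.optimizers.Adam"},
    "epochs": {"values": [10, 30, 50, 100]},
    "model": {"values": ["FCN", "ResNet"]},
}


def candidates(spec):
    """A fixed "value" is treated as a dimension with a single candidate."""
    return spec["values"] if "values" in spec else [spec["value"]]


names = list(sweep_config)
grid = [
    dict(zip(names, combo))
    for combo in product(*(candidates(sweep_config[n]) for n in names))
]
print(len(grid))
print(grid[0])
```

Counting the combinations up front helps to estimate the total training time before starting a sweep.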

Select the model based on the training data

Currently supported models: FCN, ResNet, RandomForest, MLP, DecisionTree

  • If training on feature vectors (non-Euclidean data), e.g., generated by tsfresh:
    • MLP, RandomForest
  • If training on (raw) time series (Euclidean data):
    • FCN, ResNet

Usage

Preprocessing

$ python oscillogram_classification/preprocess.py --norm {none | z_norm | min_max_norm | dec_norm | log_norm} [--feature_extraction] [--feature_list FEATURES.csv] --path /DATA --type {training | validation | test}

Note: When --feature_extraction is set, CSV files (e.g. training_complete_features.csv) are generated in addition to the actual records; each lists the features considered for the corresponding set of feature vectors.
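For orientation on the `--norm` options, the sketch below shows textbook definitions of min-max normalization and decimal scaling in NumPy. These are assumptions about what `min_max_norm` and `dec_norm` stand for; the repository's implementations may differ in detail, and the function names are hypothetical.

```python
import numpy as np


def min_max_normalize(signal: np.ndarray) -> np.ndarray:
    """Rescale a signal linearly to the interval [0, 1]."""
    lo, hi = signal.min(), signal.max()
    if hi == lo:
        # constant signal: avoid division by zero
        return np.zeros_like(signal, dtype=float)
    return (signal - lo) / (hi - lo)


def decimal_scaling_normalize(signal: np.ndarray) -> np.ndarray:
    """Divide by the smallest power of ten that brings all |values| below 1."""
    max_abs = np.abs(signal).max()
    if max_abs == 0:
        return signal.astype(float)
    j = int(np.floor(np.log10(max_abs))) + 1
    return signal / 10.0 ** j


# illustrative voltage values in volts (assumption)
voltage = np.array([10.129, 10.161, 10.145])
print(min_max_normalize(voltage))
print(decimal_scaling_normalize(voltage))
```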

Manual Feature Selection

When training the model using feature vectors, it is critical that the test, validation, and finally the application data contain the same set of features as those used for training. This can be achieved by manual feature selection, which is shown in the following example:

The training datasets were created with the --feature_extraction option, resulting in the following files:

training_complete_feature_vectors.npz
training_filtered_feature_vectors.npz
training_complete_features.csv
training_filtered_features.csv

Now the model is to be trained using the filtered features. The validation dataset should correspond to this feature selection and thus be generated as follows:

$ python oscillogram_classification/preprocess.py --norm {none | z_norm | min_max_norm | dec_norm | log_norm} --path /VALIDATION_DATA --feature_extraction --feature_list data/training_filtered_features.csv --type validation

This in turn produces a set of files, one per feature-vector variant. In the described scenario, the file to pass to training as validation data is validation_manually_filtered_feature_vectors.npz. The test dataset is generated analogously.
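Conceptually, manual feature selection boils down to picking and ordering the columns of a feature matrix according to a given feature list. The following NumPy sketch illustrates that idea; the function `select_features` and the feature names are hypothetical and do not reflect the repository's internal code or the tsfresh naming scheme.

```python
import numpy as np


def select_features(vectors: np.ndarray, available: list, wanted: list) -> np.ndarray:
    """Pick and order the columns of `vectors` so that they match `wanted`."""
    index = {name: i for i, name in enumerate(available)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise KeyError(f"features not extracted: {missing}")
    return vectors[:, [index[name] for name in wanted]]


# hypothetical feature names and values, one row per recording
available = ["mean", "variance", "maximum", "minimum"]
wanted = ["maximum", "mean"]  # e.g. read from a filtered feature list
vectors = np.array([[0.1, 0.2, 0.9, -0.5],
                    [0.3, 0.4, 0.8, -0.7]])
filtered = select_features(vectors, available, wanted)
print(filtered)
```

Raising an error for missing features, instead of silently skipping them, keeps the validation and test data aligned with what the model saw during training.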

Training

$ python oscillogram_classification/train.py --train_path TRAIN_DATA.npz --val_path VAL_DATA.npz --test_path TEST_DATA.npz

Note: Before training, a consistency check is performed, which is particularly relevant when training on feature vectors. It verifies that all datasets (train, validation, test) contain exactly the same features in the same order.
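A check of this kind can be sketched in a few lines of plain Python. The function name `check_feature_consistency` and the feature names are assumptions for illustration; the actual check in train.py may be implemented differently.

```python
def check_feature_consistency(train, val, test):
    """Raise ValueError unless all feature lists match the training list exactly."""
    for name, feats in (("validation", val), ("test", test)):
        if list(feats) == list(train):
            continue
        if sorted(feats) == sorted(train):
            raise ValueError(f"{name} features are in a different order than train")
        raise ValueError(f"{name} features differ from train features")


# hypothetical feature names
train_feats = ["mean", "maximum", "variance"]
check_feature_consistency(train_feats, list(train_feats), list(train_feats))
print("feature lists are consistent")
```

Distinguishing an order mismatch from a set mismatch makes the error message more actionable, since the former only requires reordering columns.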

Class Activation / Saliency Map Generation

$ python oscillogram_classification/cam.py [--znorm] [--overlay] --method {gradcam | hirescam | tf-keras-gradcam | tf-keras-gradcam++ | tf-keras-scorecam | tf-keras-layercam | tf-keras-smoothgrad | all} --sample_path SAMPLE.csv --model_path MODEL.h5

Note: Passing all as the method results in a side-by-side plot of all methods.
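To illustrate how two of the listed methods differ, the following NumPy sketch computes 1D Grad-CAM and HiResCAM heatmaps from given activations and gradients of a convolutional layer. It follows the published definitions of both methods, uses random arrays in place of a real model, and is not the code in cam.py; the function names are hypothetical.

```python
import numpy as np


def grad_cam_1d(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM: weight each channel by its time-averaged gradient, then ReLU.

    Both inputs have shape (time_steps, channels).
    """
    weights = gradients.mean(axis=0)               # one weight per channel
    return np.maximum(activations @ weights, 0.0)  # keep positive evidence only


def hires_cam_1d(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """HiResCAM: element-wise product of gradients and activations, summed over channels."""
    return (activations * gradients).sum(axis=1)


rng = np.random.default_rng(0)
activations = rng.random((50, 8))           # stand-in for feature maps
gradients = rng.standard_normal((50, 8))    # stand-in for d(score)/d(activations)
print(grad_cam_1d(activations, gradients).shape)
print(hires_cam_1d(activations, gradients).shape)
```

If the gradients are constant over time, as is the case for a network whose last convolutional layer feeds directly into global average pooling and a linear output, both heatmaps coincide up to the ReLU.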

HiResCAM Example

All Heatmap Generation Methods Side-by-Side

WandB Sweeps (Hyperparameter Optimization)

"Hyperparameter sweeps provide an organized and efficient way to conduct a battle royale of models and pick the most accurate model. They enable this by automatically searching through combinations of hyperparameter values (e.g. learning rate, batch size, number of hidden layers, optimizer type) to find the most optimal values." - wandb.ai

$ python oscillogram_classification/run_sweep.py --train_path TRAIN_DATA.npz --val_path VAL_DATA.npz --test_path TEST_DATA.npz

Clustering and Sub-ROI Patch Classification

As an alternative to the above classification of entire ROIs (Regions of Interest), we implemented another approach based on the determination of sub-regions, i.e., patches that make up the ROIs. An ROI detection algorithm provides the input for the clustering of the cropped sub-ROIs. The ROIs are divided into the following five categories for the battery signals:

The five categories are practically motivated, based on semantically meaningful regions that an expert would look at when searching for anomalies. Afterwards, the patches are clustered and for each patch type, i.e., cluster, a model is trained that classifies samples of the corresponding patch type. The following example shows the result of such a clustering, where each cluster is annotated (red) with the represented patch type from the above battery signal:

In this example, DBA k-means was able to correctly cluster 29/30 patches. The one misclassified patch actually shares many characteristics with the cluster to which it was assigned.

Results of DBA k-means:

cluster distribution: [7, 6, 6, 6, 5]
ground truth per cluster: [[1, 3, 1, 1, 1, 1, 1], [4, 4, 4, 4, 4, 4], [2, 2, 2, 2, 2, 2], [5, 5, 5, 5, 5, 5], [3, 3, 3, 3, 3]]
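The 29/30 figure can be reproduced from the ground truth labels per cluster by counting, for each cluster, the patches that carry the cluster's majority label. The following snippet is a small sketch of that calculation; the variable names are chosen for illustration.

```python
from collections import Counter

ground_truth_per_cluster = [
    [1, 3, 1, 1, 1, 1, 1],
    [4, 4, 4, 4, 4, 4],
    [2, 2, 2, 2, 2, 2],
    [5, 5, 5, 5, 5, 5],
    [3, 3, 3, 3, 3],
]

total = sum(len(cluster) for cluster in ground_truth_per_cluster)
# patches that agree with the most frequent label of their cluster
majority = sum(
    Counter(cluster).most_common(1)[0][1] for cluster in ground_truth_per_cluster
)
print(f"{majority}/{total}")  # prints "29/30"
```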

Clustering usage (with .csv patches):

$ python oscillogram_classification/cluster.py --norm {none | z_norm | min_max_norm | dec_norm | log_norm} --path PATH_TO_PATCHES [--clean_patches]

Using Predetermined Clusters for Comparison with Newly Recorded Samples

The idea is to compute the distance between the new time series sample and each of the predetermined cluster centroids. After computing the distances, the cluster with the smallest distance (configurable metric) is selected as the best match for the new sample.
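The matching step can be sketched as follows, here with a plain dynamic time warping (DTW) distance so that sample and centroids may differ in length. DTW is an assumption made for this example, since the metric is configurable; the function names and the centroid values are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with squared point-wise cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible warping steps
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def best_matching_cluster(sample, centroids, distance=dtw_distance):
    """Return the index of the closest centroid and all computed distances."""
    distances = [distance(sample, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances


# hypothetical centroids of two clusters and a new sample
centroids = [[10.1, 10.1, 10.2], [12.0, 12.5, 12.4, 12.2]]
sample = [10.129, 10.137, 10.153, 10.161]
best, distances = best_matching_cluster(sample, centroids)
print(best, [round(d, 3) for d in distances])
```

Passing the distance as a parameter mirrors the configurable metric: any function that takes two sequences and returns a number can be swapped in.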

Classify single recording with ground truth label (type: patch0):

$ python oscillogram_classification/clustering_application.py --samples SAMPLE_patch0.csv

Classify set of recordings with ground truth labels (dir of patch0 type .csv files):

$ python oscillogram_classification/clustering_application.py --samples /patch0/

Sample output:

-------------------------------------------------------------------------
test sample excerpt: [10.129, 10.137, 10.137, 10.153, 10.161, 10.153, 10.153]
best matching cluster for new sample: 0 ( [0, 2, 0, 0, 0, 0, 0] )
ground truth: 0
SUCCESS: ground truth ( 0 ) matches most prominent entry in cluster ( 0 )
-------------------------------------------------------------------------

The options without ground truth labels work equivalently, just without the patch type in the file / dir name.

$k$-NN Classification

$ python oscillogram_classification/knn.py --train_path /TRAIN_DATA --test_path /TEST_DATA --norm {none | z_norm | min_max_norm | dec_norm | log_norm}
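For readers unfamiliar with $k$-NN, the following NumPy sketch classifies a sample by majority vote among its $k$ nearest training samples. It assumes equally long series and Euclidean distance, which may differ from what knn.py uses; the function name and the toy data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x: np.ndarray, train_y: list, sample: np.ndarray, k: int = 3) -> int:
    """Return the majority label among the k training samples closest to `sample`."""
    dists = np.linalg.norm(train_x - sample, axis=1)  # Euclidean distance per row
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# toy training data: label 0 = negative sample, label 1 = positive sample
train_x = np.array([[0.0, 0.0, 0.0],
                    [0.1, 0.0, 0.1],
                    [1.0, 1.0, 1.0],
                    [0.9, 1.0, 1.1],
                    [1.0, 0.9, 1.0]])
train_y = [0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, np.array([0.95, 1.0, 1.0])))
```

Choosing an odd $k$ avoids ties in the binary setting.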

Positive (1) and Negative (0) Sample for each Component

Normalized Battery Voltage (Engine Starting Process)

Training and Validation Accuracy of Selected Models

TBD.

Related Publications

@inproceedings{10.1145/3587259.3627546,
    author = {Bohne, Tim and Windler, Anne-Kathrin Patricia and Atzmueller, Martin},
    title = {A Neuro-Symbolic Approach for Anomaly Detection and Complex Fault Diagnosis Exemplified in the Automotive Domain},
    year = {2023},
    isbn = {9798400701412},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3587259.3627546},
    doi = {10.1145/3587259.3627546},
    booktitle = {Proceedings of the 12th Knowledge Capture Conference 2023},
    pages = {35–43},
    numpages = {9},
    location = {Pensacola, FL, USA},
    series = {K-CAP '23}
}