Variational-AutoEncoder For Novelty Detection

Abstract

A Variational AutoEncoder[1] is used to learn the generative process of the data.

The mean squared error between the original data and its reconstruction is then computed and used to determine a threshold.

This threshold is used to discriminate regular data from novelties.

Finally, the results are compared with a OneClass-SVM[2].

Dataset

The dataset is EMNIST-Letters[3], a set of 26 balanced classes (A to Z) composed of 28x28 pixel images.

To simulate a novelty-detection dataset, some examples taken from the other classes were added to the first class (A) to reach about 3% impurities in both the train and test sets.
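
Below is a minimal sketch of how such a contaminated split could be built, assuming the EMNIST-Letters images `x` and labels `y` are already loaded as numpy arrays (the loader is omitted); the function name `build_novelty_split` and the seed are illustrative, not taken from the repository code.

```python
import numpy as np

# Assumes x with shape (N, 28, 28) and integer labels y are already loaded
# from EMNIST-Letters; the actual loading code is omitted here.
def build_novelty_split(x, y, normal_class=1, impurity_ratio=0.03, seed=0):
    rng = np.random.RandomState(seed)
    normal = x[y == normal_class]              # the 'A' class
    others = x[y != normal_class]              # every other letter
    # Number of novelties such that they make up ~3% of the final set.
    n_novel = int(round(impurity_ratio * len(normal) / (1 - impurity_ratio)))
    novel = others[rng.choice(len(others), n_novel, replace=False)]
    data = np.concatenate([normal, novel]).astype('float32') / 255.0
    labels = np.concatenate([np.zeros(len(normal)), np.ones(len(novel))])
    perm = rng.permutation(len(data))          # shuffle images and labels together
    return data[perm], labels[perm]            # label 1 marks a novelty
```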

Experiment Details

Different autoencoders were trained, varying:

  • the reparametrization trick size, i.e. the latent dimension (see the model below)
  • the L2 regularization values for the convolutional and dense layers
  • the dropout values for the dropout layers (see model.py)

The threshold was chosen on the train set from the sorted vector of mean squared errors, such that about 3% of its elements are greater than the threshold (matching the impurity ratio).
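
As a sketch of this selection rule, assuming a trained model `vae` (as in the Model section below) and train images `x_train`:

```python
import numpy as np

# Per-example mean squared reconstruction error on the train set.
reconstructed = vae.predict(x_train)
mse = np.mean((x_train - reconstructed).reshape(len(x_train), -1) ** 2, axis=1)

# Choose the threshold so that ~3% of the sorted errors exceed it,
# matching the known impurity ratio of the train set.
threshold = np.sort(mse)[int(0.97 * len(mse))]
is_novelty = mse > threshold                  # True for suspected novelties
```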

Model

(figure: model architecture)
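
The exact architecture lives in model.py; the following is only a rough sketch of a convolutional VAE with the reparametrization trick in Keras 2.1.x, with illustrative layer sizes rather than the ones actually used.

```python
from keras import backend as K
from keras.layers import (Input, Conv2D, Conv2DTranspose, Dense, Dropout,
                          Flatten, Lambda, Reshape)
from keras.models import Model
from keras.regularizers import l2

def build_vae(latent_dim=32, reg=1e-5, dropout=0.1):
    # Encoder (layer sizes are illustrative).
    x_in = Input(shape=(28, 28, 1))
    h = Conv2D(32, 3, strides=2, padding='same', activation='relu',
               kernel_regularizer=l2(reg))(x_in)
    h = Conv2D(64, 3, strides=2, padding='same', activation='relu',
               kernel_regularizer=l2(reg))(h)
    h = Dropout(dropout)(Flatten()(h))
    z_mean = Dense(latent_dim)(h)
    z_log_var = Dense(latent_dim)(h)

    # Reparametrization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so the sampling step stays differentiable w.r.t. mu and log_var.
    def sample(args):
        mu, log_var = args
        eps = K.random_normal(shape=K.shape(mu))
        return mu + K.exp(0.5 * log_var) * eps

    z = Lambda(sample)([z_mean, z_log_var])

    # Decoder mirrors the encoder.
    h = Dense(7 * 7 * 64, activation='relu', kernel_regularizer=l2(reg))(z)
    h = Reshape((7, 7, 64))(h)
    h = Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(h)
    h = Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(h)
    x_out = Conv2DTranspose(1, 3, padding='same', activation='sigmoid')(h)

    vae = Model(x_in, x_out)

    # Loss = reconstruction term + KL divergence from the unit Gaussian prior.
    rec_loss = 28 * 28 * K.mean(K.binary_crossentropy(
        K.batch_flatten(x_in), K.batch_flatten(x_out)), axis=-1)
    kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean)
                           - K.exp(z_log_var), axis=-1)
    vae.add_loss(K.mean(rec_loss + kl_loss))
    vae.compile(optimizer='adam')
    return vae
```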

Results

Reconstruction error by changing R. Trick size

(figure)

Reconstruction error by changing Regularization Values (R. Trick = 32)

(figure)

Reconstruction error by changing Dropout Values (R. Trick = 32, Regularization = 0.001)

(figure)

Best Reconstruction Losses

| R. Trick | Reg.   | Dropout | Loss   |
|---------:|-------:|--------:|-------:|
| 32       | 1e-05  | 0.1     | 144.07 |
| 32       | None   | None    | 144.09 |
| 32       | 0.0001 | 0.1     | 144.14 |
| 32       | 1e-05  | None    | 144.22 |
| 32       | 0.001  | 0.1     | 144.50 |
Best Reconstruction Example (R. Trick = 32, Regularization = 1e-05, Dropout = 0.1)

(figure)

Best F1-Scores

| R. Trick | Reg.   | Dropout | Loss   | Precision | Recall | F1-score |
|---------:|-------:|--------:|-------:|----------:|-------:|---------:|
| 32       | 1e-05  | 0.5     | 181.39 | 0.983     | 0.993  | 0.988    |
| 16       | 1e-05  | 0.7     | 190.79 | 0.983     | 0.991  | 0.987    |
| 2        | 0.1    | 0.7     | 234.33 | 0.979     | 0.990  | 0.984    |
| 2        | 0.001  | 0.5     | 213.25 | 0.980     | 0.988  | 0.984    |
| 64       | 0.0001 | 0.6     | 194.82 | 0.980     | 0.988  | 0.984    |
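
For reference, here is a sketch of how such scores could be computed with scikit-learn, assuming `mse` and `threshold` from the step above and ground-truth labels `y_true`; which class is treated as positive is not stated in the tables, so `pos_label` below is an assumption.

```python
from sklearn.metrics import precision_recall_fscore_support

# y_true uses 1 for novelties; predictions come from the MSE threshold.
y_pred = (mse > threshold).astype(int)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=1, average='binary')
print('precision %.3f  recall %.3f  f1 %.3f' % (precision, recall, f1))
```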

Best OneClass-SVM

| Gamma | Precision | Recall | F1-score |
|------:|----------:|-------:|---------:|
| 0.1   | 0.9664    | 0.8988 | 0.9313   |
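
A sketch of this baseline, assuming flattened train/test arrays as above; the `nu` value is an assumption (set to the ~3% impurity ratio), since only `gamma` is reported.

```python
from sklearn.svm import OneClassSVM

# nu bounds the fraction of training outliers; 0.03 matches the ~3%
# impurity ratio but is an assumption, as only gamma=0.1 is reported above.
ocsvm = OneClassSVM(kernel='rbf', gamma=0.1, nu=0.03)
ocsvm.fit(x_train.reshape(len(x_train), -1))

# predict() returns +1 for inliers and -1 for outliers/novelties.
pred = ocsvm.predict(x_test.reshape(len(x_test), -1))
```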

Remarks

  • Because of the random components in the process, the results should be cross-validated.
  • The model can also represent a standard (non-variational) autoencoder, but I have not tested this.
  • I trained the models on Google Colab and saved the results to Google Drive to speed up the process.

Contents

helper/       : helper scripts to recreate the experiment
imgs/         : images used in this README
saved/        : weights of the best F1-score model
experiment.py : example script to recreate the experiment
indices.npy   : my indices
results.npy   : my results

Tools

  • Python 3.6
  • numpy 1.14.5
  • TensorFlow 1.8.0
  • Keras 2.1.6
  • matplotlib 2.2.2
  • sklearn 0.19.1

References

[1] D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes", arXiv:1312.6114, 2013.
[2] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the Support of a High-Dimensional Distribution", Neural Computation 13(7), 2001.
[3] G. Cohen, S. Afshar, J. Tapson, and A. van Schaik, "EMNIST: an extension of MNIST to handwritten letters", arXiv:1702.05373, 2017.