CLSVAE for Systematic Error Repair

A semi-supervised VAE model for outlier detection and data repair of systematic errors in dirty datasets. This repository provides the PyTorch implementation of CLSVAE (Clean Subspace Variational Autoencoder).

This repository contains the public release code for the pre-print "Repairing Systematic Outliers by Learning Clean Subspaces in VAEs". The arXiv paper is available at https://arxiv.org/abs/2207.08050 .

See paper for details on models, hyperparameters and datasets.

Please consider citing us if you use any part of our code.

Installation

Requires Python 3.8 or higher
PyTorch (v1.8.1) was used
The required Python packages are listed in ./src/requirements.txt
- e.g. install them with pip install -r ./src/requirements.txt inside your "venv" or "conda" environment
Install the models package in development mode inside your virtual environment: pip install -e ./src/
- this package (repair_syserr_models) contains the code for the VAE models and associated
  utility functions
- five models are provided (as used in the paper): VAE, CVAE, VAE_GMM, CCVAE, and CLSVAE.
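A quick way to confirm the setup is a small check script. The sketch below is illustrative only and not part of the repository: it assumes the package is importable as repair_syserr_models after the editable install, and it only checks that modules can be found, not their versions (e.g. PyTorch v1.8.1).

```python
import importlib.util
import sys


def check_environment(packages=("torch", "repair_syserr_models")):
    """Report whether Python is 3.8+ and which packages can be located.

    Illustrative helper (not from the repo): it only checks that each
    module can be found; it does not import it or check its version.
    """
    report = {"python_ok": sys.version_info >= (3, 8)}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'OK' if ok else 'NOT OK'}")
```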

Usage

Jupyter notebooks with examples for all models can be found in ./src/notebooks/
- the notebooks already contain the outputs of previous training runs in their cells (including visualizations of metrics and repairs), but they can be re-run.
Simple bash commands to run the models can be found in ./src/repair_syserr_models/run_train_model.sh
An example (notebook or script) is provided for each dataset and each model from the paper, at the 35% corruption level.
Note: the --cuda-on flag enables GPU training; remove it for CPU-only training.
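If the same command is used on machines with and without a GPU, the --cuda-on flag can be toggled programmatically. This is a hypothetical helper, not part of the repository: the training command in the usage example (train_script.py) is a placeholder, so copy the real command from run_train_model.sh. GPU detection uses torch.cuda.is_available() when PyTorch is installed and falls back to CPU otherwise.

```python
def with_device_flag(base_args, use_gpu):
    """Return a copy of base_args with --cuda-on added (GPU) or removed (CPU)."""
    # Drop any existing --cuda-on so the flag never appears twice.
    args = [a for a in base_args if a != "--cuda-on"]
    if use_gpu:
        args.append("--cuda-on")
    return args


def gpu_available():
    """True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    # Hypothetical placeholder command; replace it with the real one from
    # src/repair_syserr_models/run_train_model.sh.
    cmd = ["python", "train_script.py", "--cuda-on"]
    print(" ".join(with_device_flag(cmd, gpu_available())))
```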

Inputs

Input data (dirty and clean datasets) needed to run the notebooks and scripts described in Usage.
See below for how to obtain the data.

Data for Examples (Jupyter Notebooks and Scripts)

Copy the contents of the data folder on Google Drive (available here) into your local repo folder ./data/
Three datasets (Fashion MNIST, Frey Faces, Synthetic Shapes), each at a 35% corruption level, with both
the ground-truth and corrupted versions and several trusted set sizes.
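After copying, a short check can confirm the expected folders are in place. This sketch is not part of the repository and assumes every dataset follows the layout of data/examples_frey_faces/corrupt_level_35_percent/run_1, which appears in the repository; the folder names used here for Fashion MNIST and Synthetic Shapes are guesses and may differ from the actual Drive contents.

```python
from pathlib import Path


def example_run_dir(dataset, corrupt_level=35, run=1, root="data"):
    """Build the expected folder for one example run.

    Pattern inferred from data/examples_frey_faces/corrupt_level_35_percent/run_1;
    it is an assumption for any dataset other than frey_faces.
    """
    return (Path(root) / f"examples_{dataset}"
            / f"corrupt_level_{corrupt_level}_percent" / f"run_{run}")


def missing_example_dirs(datasets, root="data"):
    """Return the expected run folders that do not exist under root."""
    expected = (example_run_dir(d, root=root) for d in datasets)
    return [p for p in expected if not p.is_dir()]


if __name__ == "__main__":
    # "frey_faces" matches the repo listing; the other two names are hypothetical.
    for path in missing_example_dirs(["frey_faces", "fashion_mnist", "synthetic_shapes"]):
        print(f"missing: {path}")
```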

Outputs

The outputs of each training run (e.g. metrics, performance and model parameters) are saved in
./outputs/experiments_test/
This folder already includes the outputs of the existing example training runs.
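To inspect the results of a run, a helper like the one below lists the most recently written files. It is a minimal sketch, not part of the repository, and assumes nothing about the sub-folder structure or file formats inside ./outputs/experiments_test/; it simply walks the whole tree.

```python
from pathlib import Path


def latest_outputs(root="outputs/experiments_test", limit=10):
    """Return up to `limit` files under root, most recently modified first."""
    root = Path(root)
    if not root.is_dir():
        return []  # nothing to list if the folder does not exist yet
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by modification time, newest first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in latest_outputs():
        print(path)
```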

License

MIT

sfme/clsvae-error-repair
