
NIPS 2018 Adversarial Vision Challenge "Robust Model Track"

Timo I. Denk, Florian Pfisterer, Samed Guener
(published in November 2018)

Abstract

This repository contains code, documents, and deployment configuration files related to our participation in the 2018 NIPS Adversarial Vision Challenge "Robust Model Track".
We implemented a technique called LESCI-layer, which is based on vector quantization (VQ) and is supposed to increase the robustness of a neural network classifier. It compresses the representation at a certain layer with a matrix computed using PCA on the representations that correctly classified training samples induce at this layer. The compressed vector is compared to an embedding space and replaced with an embedding vector if a certain percentage of the k most similar embedding vectors belong to the same output label.
In the current configuration, our method did not increase the robustness of the ResNet-based classifier for Tiny ImageNet, as measured by the challenge, presumably because it comes with a decrease in classification accuracy. We have documented our approach formally in this PDF.
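
To make the idea concrete, here is a minimal NumPy sketch of the replacement rule. The names, the consensus threshold, and the behavior when no consensus is reached are illustrative simplifications; our actual implementation is the TensorFlow code in this repository.

```python
import numpy as np

def lesci_replace(activation, proj, emb, emb_labels, k=10, min_fraction=0.6):
    """Sketch of the LESCI replacement rule (simplified).

    activation : flattened activation of one sample at the chosen layer
    proj       : PCA projection matrix computed offline on activations of
                 correctly classified training samples
    emb        : embedding space (one compressed vector per row)
    emb_labels : output label associated with each embedding vector
    """
    compressed = proj @ activation                    # compress with PCA
    dists = np.linalg.norm(emb - compressed, axis=1)  # distance to all embeddings
    nearest = np.argsort(dists)[:k]                   # k most similar vectors
    labels, counts = np.unique(emb_labels[nearest], return_counts=True)
    if counts.max() >= min_fraction * k:              # label consensus reached?
        majority_label = labels[counts.argmax()]
        best = nearest[emb_labels[nearest] == majority_label][0]
        return emb[best], True    # replace with the closest consensus embedding
    return compressed, False      # otherwise keep the compressed activation (assumption)
```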

Background

The annual NIPS conference has a competition track. We participated in the Adversarial Vision Challenge "Robust Model Track", which the organizers describe as follows:

The overall goal of this challenge is to facilitate measurable progress towards robust machine vision models and more generally applicable adversarial attacks. As of right now, modern machine vision algorithms are extremely susceptible to small and almost imperceptible perturbations of their inputs (so-called adversarial examples). This property reveals an astonishing difference in the information processing of humans and machines and raises security concerns for many deployed machine vision systems like autonomous cars. Improving the robustness of vision algorithms is thus important to close the gap between human and machine perception and to enable safety-critical applications.

Team

We are three CS students from Germany who worked on this NIPS project in our spare time.

team picture

These were our responsibilities:

  • Timo Denk (left; @simsso): Team lead, architectural decisions, Python development, ML research and ideas.
  • Florian Pfisterer (middle; @florianpfisterer): Architectural decisions, Python development, ML research and ideas.
  • Samed Güner (right; @doktorgibson): Training pipeline design, cloud administration, pipeline implementation.

Repository

This repository is an integral component of our work and served the following purposes:

  • Code. The repository contains the entire commit history of our code. During development, it proved to be an effective way of keeping up with the other team members' commits.
  • Project management tool. We used issues extensively to keep track of work items and meetings. For each meeting we took notes of assignments and documented the progress we had made.
  • Knowledge base. The repository's wiki contains enduring ideas and documentation, such as how our pipeline is set up, which papers we consider relevant, and how we name our commits.
  • Review. Every contribution to the master branch had to be reviewed. In total we opened more than 25 pull requests, some of which received more than 30 comments.
  • DevOps. We set up webhooks to the Google Cloud Platform to automatically spin up new instances for training whenever a commit was flagged with a certain tag.

Codebase

Our codebase consists of two Python modules, namely nips_defense and vq_layer. In addition, we publish an experiments folder which contains quick-and-dirty code that was written to test ideas. This section mentions some specifics and references the actual documentation. The class diagrams were generated with pyreverse. TODO(florianpfisterer): update nips_defense to match the new name and update links accordingly.

VQ-Layer

The vq_layer module contains TensorFlow (TF) implementations of our vector quantization ideas. Following the TF API style, it is a collection of functions that operate on tf.Tensor objects. The features as well as install instructions can be found in the README file of the module.
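
To give an impression of what such a function looks like, below is a minimal TF 1.x sketch of a nearest-embedding quantization op. It is not the module's actual API (see its README for that), just an illustration of the general idea.

```python
import tensorflow as tf

def minimal_vq(x, num_embeddings, name="vq"):
    """Sketch of a vector-quantization function in the spirit of vq_layer
    (not its actual API): maps each input vector to its closest entry in
    a trainable embedding space."""
    with tf.variable_scope(name):
        dim = x.get_shape().as_list()[-1]
        emb = tf.get_variable("embedding", [num_embeddings, dim],
                              initializer=tf.random_normal_initializer())
        # pairwise squared distances between inputs and embedding vectors
        dists = tf.reduce_sum(tf.square(tf.expand_dims(x, 1) - emb), axis=2)
        nearest = tf.argmin(dists, axis=1)      # index of the closest embedding
        return tf.gather(emb, nearest)          # quantized output, same shape as x
```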

We prioritized good test coverage to ensure the proper functioning of the module. Each of the test classes covers one specific aspect (described in a comment) of the module. The test classes share some functionality, e.g. graph resetting, session creation, and random seeding, which we have placed in the TFTestCase class.
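
A condensed sketch of what such a shared base class can look like (the real TFTestCase lives in the module's test package and may differ in detail):

```python
import unittest
import numpy as np
import tensorflow as tf

class TFTestCase(unittest.TestCase):
    """Shared test functionality: reset the default graph, fix the random
    seeds, and create a fresh session for every test case."""

    def setUp(self):
        tf.reset_default_graph()
        tf.set_random_seed(42)
        np.random.seed(42)
        self.sess = tf.Session()

    def tearDown(self):
        self.sess.close()
```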

vq_layer class diagram
Fig.: Class diagram of the module vq_layer. It shows the test classes which inherit from TFTestCase. Each class is responsible for testing a specific aspect, for which it implements several test cases (methods).

NIPS Defense

The nips_defense module contains our approaches to developing a more robust classifier for the Tiny ImageNet dataset. The documentation can be found in the README file.

Our basic idea was to be able to try out new things by inheriting from some Model class and overriding its graph construction method. The new method would then contain the special features we wanted to test. This idea is reflected in the class diagram below.

The BaseModel contains fundamental methods and attributes that all our ML models need, for instance an epoch counter and functionality for saving and restoring weights. The two inheriting classes are two ResNet implementations that can restore pre-trained weights.
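
The following sketch illustrates this structure; the method names and bodies are placeholders, not the repository's actual signatures:

```python
import tensorflow as tf

class BaseModel:
    """Illustrative sketch: shared state such as an epoch counter plus weight
    saving/restoring; subclasses override the graph construction."""

    def __init__(self):
        self.epoch = tf.get_variable("epoch", [], dtype=tf.int64, trainable=False,
                                     initializer=tf.zeros_initializer())
        self.logits = self.build_graph()
        self.saver = tf.train.Saver()

    def build_graph(self):
        raise NotImplementedError("subclasses construct their graph here")

    def save(self, sess, path):
        self.saver.save(sess, path)

    def restore(self, sess, path):
        self.saver.restore(sess, path)


class ExperimentModel(BaseModel):
    def build_graph(self):
        # an experiment subclass would build e.g. a ResNet here and insert a
        # LESCI- or VQ-layer at the chosen position in the network
        x = tf.placeholder(tf.float32, [None, 64, 64, 3])
        return tf.layers.dense(tf.layers.flatten(x), 200)  # 200 Tiny ImageNet classes
```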

BaselineResNet is designed to work with baseline weights provided by the challenge organizers, while ResNet works with "ALP-trained ResNet-v2-50" weights.

The classes inheriting from BaselineResNet and ResNet are our experiments: BaselineLESCIResNet, LESCIResNet, PCAResNet, ParallelVQResNet, VQResNet, and ActivationsResNet. They typically use the functions provided by the vq_layer module.

Our input pipeline provides the models with images from the Tiny ImageNet dataset. It follows the official recommendation by using TF's tf.data API. The code is split into more generic functions which might be reused in pipelines for other datasets (the BasePipeline class), and code specific to the Tiny ImageNet dataset (the TinyImageNetPipeline class), for instance reading label text files or image augmentation.
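
A simplified sketch of such a tf.data pipeline (TF 1.x style; file list handling, label parsing, and augmentation are reduced to placeholders):

```python
import tensorflow as tf

def tiny_imagenet_dataset(filenames, labels, batch_size=64, train=True):
    """Sketch in the spirit of TinyImageNetPipeline: decode, normalize, and
    augment images, then batch and prefetch them with the tf.data API."""
    def _parse(filename, label):
        image = tf.image.decode_jpeg(tf.read_file(filename), channels=3)
        image = tf.image.convert_image_dtype(image, tf.float32)
        image = tf.image.resize_images(image, [64, 64])      # Tiny ImageNet resolution
        if train:
            image = tf.image.random_flip_left_right(image)   # simple augmentation
        return image, label

    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    if train:
        dataset = dataset.shuffle(buffer_size=10000).repeat()
    return dataset.map(_parse).batch(batch_size).prefetch(1)
```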

Our logging is quite comprehensive. Because we accumulate gradients over several physical batches, we cannot use the plain tf.summary API and instead have to accumulate scalars and histograms ourselves in order to create a tf.Summary object manually. This functionality is placed in Logger, Accumulator, and the inheriting classes.
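
In a reduced form, the manual summary creation looks roughly like this (tags, values, and the log directory are placeholders):

```python
import tensorflow as tf

writer = tf.summary.FileWriter("/tmp/logs")

# scalars collected in Python over several physical batches of one logical batch
accumulated_losses = [0.71, 0.68, 0.65]
mean_loss = sum(accumulated_losses) / len(accumulated_losses)

# build the summary protobuf manually instead of using the tf.summary ops
summary = tf.Summary(value=[
    tf.Summary.Value(tag="train/loss", simple_value=mean_loss)
])
writer.add_summary(summary, global_step=1)
writer.flush()
```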

nips_defense class diagram
Fig.: Class diagram of the module nips_defense. Accumulators are on the left, the different models are in the middle, and the pipeline and miscellaneous classes are on the right.

Experiments

Our experiments are a collection of Python scripts, MATLAB files, and Jupyter notebooks. Some highlights are:

Training Pipeline

Our DevOps unit (Samed) set up a training pipeline that simplifies the empirical evaluation of ML ideas. For the researcher, triggering a training run is as simple as tagging a commit and pushing it. The tag triggers a pipeline which creates a new virtual machine (VM) on the Google Cloud Platform (GCP). The VM is configured to have a GPU and to run the training job (Python files). The results (e.g. model weights and logged metrics) were streamed to persistent storage, which the ML researcher could access through the GCP user interface and a TensorBoard instance that we kept running.
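
For illustration, a GPU VM can be created programmatically roughly as follows. This is a hedged sketch using the Compute Engine Python client; project, zone, image, machine shape, and the startup script are placeholders and not our actual pipeline configuration.

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")
project, zone = "my-project", "europe-west1-b"   # placeholders

config = {
    "name": "training-run-1",
    "machineType": f"zones/{zone}/machineTypes/n1-standard-8",
    "disks": [{
        "boot": True,
        "autoDelete": True,
        "initializeParams": {
            "sourceImage": "projects/debian-cloud/global/images/family/debian-9",
        },
    }],
    "networkInterfaces": [{"network": "global/networks/default"}],
    "guestAccelerators": [{
        "acceleratorType": f"projects/{project}/zones/{zone}/acceleratorTypes/nvidia-tesla-k80",
        "acceleratorCount": 1,
    }],
    # GPU instances must not be live-migrated
    "scheduling": {"onHostMaintenance": "TERMINATE"},
    "metadata": {"items": [{
        "key": "startup-script",
        "value": "#! /bin/bash\npython3 -m nips_defense  # placeholder training entry point",
    }]},
}

compute.instances().insert(project=project, zone=zone, body=config).execute()
```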

More details about the pipeline can be found here and an analysis of GCP's capabilities (from our perspective) is written down here.

Results

Our final submission was intended to be a pre-trained ResNet (baseline supplied by the challenge) which uses a LESCI-layer at a level in the network that gives a good balance between robustness and accuracy (more about our reasoning behind this in our PDF article).

Computing the PCA on the activations of early layers turned out to be computationally infeasible in terms of memory requirements, which is why we had to constrain our hyperparameter search to a single position late in the network, where the dimension of an activation is only 512 (instead of 131,072 up to 262,144 at earlier layers).
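
A back-of-the-envelope calculation illustrates the memory problem, assuming a dense float32 covariance matrix is formed for the PCA:

```python
d_early = 262_144                 # activation dimension at an early layer
print(d_early**2 * 4 / 2**30)     # covariance matrix: 256 GiB in float32

d_late = 512                      # dimension at the late position we used
print(d_late**2 * 4 / 2**20)      # covariance matrix: 1 MiB in float32
```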

Unfortunately, this hyperparameter grid search yielded no combination of parameters that resulted in an accuracy of more than 50.0% and, at the same time, a sufficiently high percentage of projected inputs (calculated on the Tiny ImageNet validation set).

Future work should focus on inserting a LESCI-layer at an earlier position in the network, which we were not able to do due to a lack of computational resources.
