The goals of this project (link to the original arXiv paper):
- Develop a systematic framework to measure concentration for arbitrary distributions
- Theoretically, prove that the empirical concentration with respect to a special collection of subsets converges to the actual concentration asymptotically
- Empirically, propose algorithms for measuring the concentration of benchmark image distributions under both the ℓ∞ and ℓ2 distance metrics
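
The convergence claim above can be illustrated in a much simpler setting than the one the paper treats. The sketch below is not taken from the repository: it fixes a single set, the interval [0, 0.3] under the uniform distribution on [0, 1], and estimates the measure of its ε-expansion from samples. The function names, the interval, and ε = 0.1 are all assumptions chosen for illustration; the true value here is 0.4.

```python
import random


def empirical_expansion_measure(samples, dist_to_set, eps):
    """Fraction of samples lying within distance eps of the set."""
    hits = sum(1 for x in samples if dist_to_set(x) <= eps)
    return hits / len(samples)


def dist_to_interval(x, lo=0.0, hi=0.3):
    """Distance from a scalar x to the interval [lo, hi] (hypothetical set)."""
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0


def estimate(n, eps=0.1, seed=0):
    """Estimate the measure of the eps-expansion of [0, 0.3] from n uniform draws."""
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(n)]
    return empirical_expansion_measure(samples, dist_to_interval, eps)


if __name__ == "__main__":
    # The expansion of [0, 0.3] by 0.1 is [0, 0.4] inside [0, 1], so the
    # estimates should approach 0.4 as n grows.
    for n in (100, 10000, 100000):
        print(n, estimate(n))
```

This only shows convergence for one fixed set; the result stated above concerns the infimum over a whole collection of subsets, which is a stronger statement.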
The code was developed using Python 3 on Anaconda.
- Install PyTorch 0.4.1:
conda update -n base conda && conda install pytorch=0.4.1 torchvision -c pytorch -y
- Install dependencies:
pip install --upgrade pip && pip install scipy scikit-learn numpy torch setproctitle
- Example for empirically measuring concentration under the ℓ∞ metric:
  - First, precompute the distance to the k-th nearest neighbor for each training example
    python preliminary.py --dataset mnist --metric infinity --k 50
  - Next, run the proposed algorithm that finds a robust error region under the ℓ∞ metric
    python main_infinity.py --dataset mnist --metric infinity --epsilon 0.3 --q 0.629 --clusters 10
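
As a rough illustration of the precomputation step, the following brute-force sketch computes, for each point, the ℓ∞ distance to its k-th nearest neighbor. It is not the code in `preliminary.py`; the function names are hypothetical, and it assumes that a point is not counted as its own neighbor.

```python
def linf_distance(a, b):
    """Chebyshev (L-infinity) distance between two equal-length vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def kth_nn_distances(points, k):
    """For each point, return the distance to its k-th nearest other point."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        # Assumption: the point itself is excluded from its neighbor list.
        dists = sorted(
            linf_distance(p, q) for j, q in enumerate(points) if j != i
        )
        result.append(dists[k - 1])
    return result


if __name__ == "__main__":
    toy = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
    print(kth_nn_distances(toy, k=2))
```

The quadratic loop is only practical for toy inputs; on MNIST-sized data a vectorized or tree-based neighbor search would be the usual choice.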
- Example for empirically measuring concentration under the ℓ2 metric:
  - First, precompute the nearest neighbor indices for each training example
    python preliminary.py --dataset cifar --metric euclidean --alpha 0.05
  - Next, run the proposed algorithm that finds a robust error region under the ℓ2 metric
    python main_euclidean.py --dataset cifar --metric euclidean --epsilon 0.2453 --alpha 0.05 --clusters 5
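
To make the idea of a Euclidean error region and its ε-expansion concrete, here is a minimal sketch that assumes the region is a union of balls. Under the Euclidean metric, the ε-expansion of a union of balls is the union of the same balls with each radius enlarged by ε, so both empirical measures reduce to distance checks. The function names and the `(center, radius)` representation are assumptions, not the repository's data structures.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_union_of_balls(x, balls, eps=0.0):
    """True if x lies within eps of any ball; each ball is (center, radius)."""
    return any(l2_distance(x, c) <= r + eps for c, r in balls)


def measure_region(points, balls, eps):
    """Return (empirical measure of the region, of its eps-expansion)."""
    n = len(points)
    measure = sum(in_union_of_balls(p, balls) for p in points) / n
    expanded = sum(in_union_of_balls(p, balls, eps) for p in points) / n
    return measure, expanded


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
    balls = [((0.0, 0.0), 0.5)]
    print(measure_region(points, balls, eps=0.6))
```

In this toy case one of the four points lies in the ball, and a second point falls inside once the radius grows from 0.5 to 1.1.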
- load_data.py: defines the argument parser and data loaders for several benchmark image datasets
- preliminary.py: finds the k-nearest neighbors for each example in a given training dataset
- main_infinity.py: main function for empirically measuring concentration under the ℓ∞ metric, based on the complement of a union of hyperrectangles
- main_euclidean.py: main function for empirically measuring concentration under the ℓ2 metric, based on a union of balls
- tune_infinity.py: implements the tuning method (grid search for #clusters and binary search for q) for optimal concentration under the ℓ∞ metric
- tune_euclidean.py: implements the tuning method (grid search for #clusters) for optimal concentration under the ℓ2 metric
- baseline.py: implements the baseline method that heuristically estimates concentration using a linear hyperplane, as proposed in Gilmer et al. (2018)
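
The region type named for `main_infinity.py`, the complement of a union of hyperrectangles, can be sketched as follows. This is an illustrative approximation rather than the repository's implementation. A point is certainly outside the ε-expansion of the region when the ℓ∞ ball of radius ε around it fits inside a single box, which is the same as lying inside that box shrunk by ε on every side. Checking boxes one at a time is conservative: it can only overestimate the expansion. The function names and the `(lower, upper)` box representation are assumptions.

```python
def inside_shrunk_rect(x, lower, upper, eps):
    """True if the L-infinity ball of radius eps around x fits inside the box."""
    return all(lo + eps <= xi <= hi - eps for xi, lo, hi in zip(x, lower, upper))


def measure_complement_region(points, rects, eps):
    """Return (measure of the complement of the boxes, bound on its expansion)."""
    n = len(points)
    # A point is in the region if it lies in none of the boxes.
    in_region = sum(
        not any(inside_shrunk_rect(p, lo, hi, 0.0) for lo, hi in rects)
        for p in points
    )
    # A point is counted in the expansion unless some shrunken box contains it.
    in_expansion = sum(
        not any(inside_shrunk_rect(p, lo, hi, eps) for lo, hi in rects)
        for p in points
    )
    return in_region / n, in_expansion / n


if __name__ == "__main__":
    rects = [((0.0, 0.0), (4.0, 4.0))]
    points = [(2.0, 2.0), (0.5, 2.0), (5.0, 5.0), (3.5, 3.5)]
    print(measure_complement_region(points, rects, eps=1.0))
```

With one box the check is exact; with several overlapping boxes, a point whose ε-ball is covered only by the boxes jointly would still be counted in the expansion.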