
Momentum contrast in frequency & spatial domain for fine-grained image classification

  1. Description
  2. Installation
  3. Data Preparation
  4. Self-Supervised Training
  5. Fine-tuning
  6. References

1. Description

Momentum Contrast in Frequency and Spatial Domain (MocoFSD), inspired by the MoCo [1] framework, learns feature representations by combining frequency- and spatial-domain information during the pre-training phase. Features learned by MocoFSD outperform their self-supervised and supervised counterparts on two downstream tasks: fine-grained image classification and image classification.

[Figure: MocoFSD architecture overview]

This project was part of a research internship under Prof. Jiapan Guo at the University of Groningen for the course WMCS021-15.
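For intuition only, the sketch below shows one way to derive a frequency-domain view of an image batch alongside the usual spatial view. The choice of transform (a 2D FFT log-magnitude) and the helper name frequency_view are illustrative assumptions, not the repository's exact implementation.

# Illustrative sketch: one plausible frequency-domain view of a batch.
# The transform choice and the name `frequency_view` are assumptions,
# not the exact transform used by MocoFSD.
import torch

def frequency_view(x: torch.Tensor) -> torch.Tensor:
    """Map images (B, C, H, W) to a real-valued frequency-domain view."""
    spec = torch.fft.fft2(x, norm="ortho")           # complex 2D spectrum
    spec = torch.fft.fftshift(spec, dim=(-2, -1))    # centre low frequencies
    return torch.log1p(spec.abs())                   # log-magnitude, same shape

x_spatial = torch.randn(8, 3, 224, 224)   # dummy batch of spatial images
x_freq = frequency_view(x_spatial)        # frequency-domain counterpart
assert x_freq.shape == x_spatial.shape    # both views fit the same encoder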


2. Installation

1. Clone the repository:

$ git clone git@github.com:Rohit8y/MocoFSD.git
$ cd MocoFSD

2. Create a new Python environment and activate it:

$ python3 -m venv py_env
$ source py_env/bin/activate

3. Install necessary packages:

$ pip install -r requirements.txt

3. Data Preparation

  • Download the ImageNet dataset from http://www.image-net.org/.
  • Move and extract the training and validation images into class-labeled subfolders (e.g. with the extract_ILSVRC.sh shell script from the PyTorch ImageNet example); a layout check is sketched after this list.
  • The following fine-tuning datasets are downloaded automatically by the code through the PyTorch API:
    • Stanford Dogs
    • Stanford Cars
    • FGVC Aircraft
    • CIFAR 100
    • DTD
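After extraction, train and val must follow the one-subfolder-per-class layout that torchvision's ImageFolder expects. A quick sanity check (the dataset path below is a placeholder):

# Sanity check for the extracted layout; ImageFolder requires one
# subfolder per class. "/path/to/imagenet" is a placeholder path.
from torchvision import datasets, transforms

train_set = datasets.ImageFolder("/path/to/imagenet/train",
                                 transform=transforms.ToTensor())
print(len(train_set.classes))   # ImageNet-1k should report 1000 classes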

4. Self-Supervised Training

This implementation only supports multi-GPU DistributedDataParallel training, which is faster and simpler; single-GPU and DataParallel training are not supported.

To do self-supervised pre-training of a ResNet-50 model on ImageNet on an 8-GPU machine, run:

python pretrain.py \
  -a resnet50 \
  --lr 0.03 \
  --batch-size 256 \
  --mlp --moco-t 0.2 --aug-plus --cos \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

Note: for 4-GPU training, we recommend following the linear lr scaling recipe: --lr 0.015 --batch-size 128.
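This is the standard linear scaling rule: the learning rate grows in proportion to the total batch size, anchored at the 0.03 / 256 baseline above. A tiny helper (the function name is ours, purely illustrative) makes the rule explicit:

# Linear lr scaling: lr is proportional to total batch size, anchored at
# the 0.03 / 256 defaults above. `scaled_lr` is an illustrative helper.
def scaled_lr(batch_size, base_lr=0.03, base_batch=256):
    return base_lr * batch_size / base_batch

print(scaled_lr(128))   # 0.015 -> matches the 4-GPU recommendation
print(scaled_lr(256))   # 0.03  -> the 8-GPU default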

positional arguments:
  DIR                   path to dataset (default: imagenet)

optional arguments:
  --help                show this help message and exit
  --arch                model architecture: resnet18 | resnet34 | resnet50 (default: resnet18)
  --workers             number of data loading workers (default: 4)
  --epochs              number of total epochs to run
  --start-epoch N       manual epoch number (useful on restarts)
  --batch-size          mini-batch size (default: 256); this is the total batch size across all GPUs on the current node when using Data Parallel or Distributed Data Parallel
  --lr                  initial learning rate
  --momentum            momentum
  --weight-decay        weight decay (default: 1e-4)
  --resume              path to latest checkpoint (default: none)
  --evaluate            evaluate model on validation set
  --pretrained          use pre-trained model
  --world-size          number of nodes for distributed training
  --rank                node rank for distributed training
  --dist-url            url used to set up distributed training
  --dist-backend        distributed backend
  --seed                seed for initializing training
  --gpu                 GPU id to use
  --multiprocessing-distributed
                        use multi-processing distributed training to launch N processes per node, one per GPU; this is the fastest way to use PyTorch for either single-node or multi-node data parallel training (see the sketch below)
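For reference, the launch pattern behind --multiprocessing-distributed in the standard MoCo recipe looks roughly like the sketch below; pretrain.py may differ in detail, so names such as main_worker and launch are assumptions:

# Rough sketch of the --multiprocessing-distributed launch pattern in the
# standard MoCo recipe; pretrain.py may differ. `main_worker` and `launch`
# are illustrative names.
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def main_worker(gpu, ngpus_per_node, args):
    rank = args.rank * ngpus_per_node + gpu          # global rank of this process
    dist.init_process_group(backend="nccl", init_method=args.dist_url,
                            world_size=args.world_size, rank=rank)
    torch.cuda.set_device(gpu)
    # build the model here, then wrap it:
    # model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])

def launch(args):
    ngpus_per_node = torch.cuda.device_count()
    args.world_size = ngpus_per_node * args.world_size   # processes, not nodes
    args.batch_size = args.batch_size // ngpus_per_node  # split the total batch
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))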

5. Fine-tuning

The pre-trained model can be fine-tuned on five downstream datasets: Stanford Cars, Stanford Dogs, CIFAR100, FGVC Aircraft, and DTD. To optimise the model for a downstream task, run:

python main.py -h

usage: main.py [-h] [--arch ARCH] [--epochs EPOCHS] [--lr LR] [--batch-size BS]
               [--wd WD] [--dataset DATASET] [--model MODEL]
options:
  --help                show this help message and exit
  --arch                model architecture: resnet18 | resnet34 | resnet50 (default: resnet18)
  --epochs              number of total epochs to run (default: 100)
  --lr                  initial learning rate (default: 0.001)
  --batch-size          mini-batch size (default: 32)
  --wd                  weight decay (default: 1e-4)
  --dataset             fine-tuning dataset to use: stanfordCars | stanfordDogs | aircraft | cifar100 | dtd (default: stanfordCars)
  --model               path to the pre-trained model

This implementation runs on a single GPU and does not support multi-GPU training. We used a grid search to find the optimal values of the remaining hyperparameters. Once training completes, the final model is saved as <dataset_name>_best_model.pth.tar
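In the upstream MoCo recipe, only the query encoder's backbone weights are transferred into the classifier before fine-tuning. The sketch below follows that recipe; the module.encoder_q. key prefix and the checkpoint filename are assumptions about what pretrain.py saves.

# Transferring MoCo pre-trained weights into a classifier for fine-tuning,
# following the upstream MoCo recipe. The `module.encoder_q.` prefix and
# the checkpoint filename are assumptions about what pretrain.py writes.
import torch
import torchvision.models as models

model = models.resnet50()
ckpt = torch.load("checkpoint_0199.pth.tar", map_location="cpu")
state_dict = ckpt["state_dict"]
for k in list(state_dict.keys()):
    # keep only the query encoder's backbone weights, with the prefix stripped
    if k.startswith("module.encoder_q.") and not k.startswith("module.encoder_q.fc"):
        state_dict[k[len("module.encoder_q."):]] = state_dict[k]
    del state_dict[k]   # drop momentum encoder, queue, and projection head
model.load_state_dict(state_dict, strict=False)        # fc stays randomly initialised
model.fc = torch.nn.Linear(model.fc.in_features, 196)  # e.g. Stanford Cars (196 classes)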


6. References

[1] He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9729-9738).

