
Momentum contrast in frequency & spatial domain for fine-grained image classification

  1. Description
  2. Installation
  3. Data Preparation
  4. Self-Supervised Training
  5. Fine-tuning
  6. References

1. Description

Momentum Contrast in Frequency and Spatial Domain (MocoFSD), inspired by the MoCo [1] framework, learns feature representations by combining frequency- and spatial-domain information during the pre-training phase. Features learned by MocoFSD outperform their self-supervised and supervised counterparts on two downstream tasks: fine-grained image classification and image classification.

[Figure: MocoFSD architecture overview]

This project was part of a research internship under Prof. Jiapan Guo at the University of Groningen for the course WMCS021-15.
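For intuition only, the sketch below shows one way to derive a frequency-domain view of an image batch alongside the usual spatial view. The choice of transform (a 2D FFT log-magnitude) and the helper name frequency_view are illustrative assumptions, not the repository's exact implementation.

# Illustrative sketch: one plausible frequency-domain view of a batch.
# The transform choice and the name `frequency_view` are assumptions,
# not the exact transform used by MocoFSD.
import torch

def frequency_view(x: torch.Tensor) -> torch.Tensor:
    """Map images (B, C, H, W) to a real-valued frequency-domain view."""
    spec = torch.fft.fft2(x, norm="ortho")           # complex 2D spectrum
    spec = torch.fft.fftshift(spec, dim=(-2, -1))    # centre low frequencies
    return torch.log1p(spec.abs())                   # log-magnitude, same shape

x_spatial = torch.randn(8, 3, 224, 224)   # dummy batch of spatial images
x_freq = frequency_view(x_spatial)        # frequency-domain counterpart
assert x_freq.shape == x_spatial.shape    # both views fit the same encoder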


2. Installation

1. Clone the repository:

$ git clone git@github.com:Rohit8y/MocoFSD.git
$ cd MocoFSD

2. Create a new Python environment and activate it:

$ python3 -m venv py_env
$ source py_env/bin/activate

3. Install necessary packages:

$ pip install -r requirements.txt

3. Data Preparation

  • Download the ImageNet dataset from http://www.image-net.org/.
  • Move and extract the training and validation images into class-labeled subfolders (e.g. with the extract_ILSVRC.sh shell script from the PyTorch ImageNet example); a layout check is sketched after this list.
  • The following fine-tuning datasets are downloaded automatically by the code through the PyTorch API:
    • Stanford Dogs
    • Stanford Cars
    • FGVC Aircraft
    • CIFAR 100
    • DTD
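After extraction, train and val must follow the one-subfolder-per-class layout that torchvision's ImageFolder expects. A quick sanity check (the dataset path below is a placeholder):

# Sanity check for the extracted layout; ImageFolder requires one
# subfolder per class. "/path/to/imagenet" is a placeholder path.
from torchvision import datasets, transforms

train_set = datasets.ImageFolder("/path/to/imagenet/train",
                                 transform=transforms.ToTensor())
print(len(train_set.classes))   # ImageNet-1k should report 1000 classes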

4. Self-Supervised Training

This implementation only supports multi-GPU DistributedDataParallel training, which is faster and simpler; single-GPU and DataParallel training are not supported.

To do self-supervised pre-training of a ResNet-50 model on ImageNet on an 8-GPU machine, run:

python pretrain.py \
  -a resnet50 \
  --lr 0.03 \
  --batch-size 256 \
  --mlp --moco-t 0.2 --aug-plus --cos \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

Note: for 4-GPU training, we recommend following the linear lr scaling recipe: --lr 0.015 --batch-size 128.
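This is the standard linear scaling rule: the learning rate grows in proportion to the total batch size, anchored at the 0.03 / 256 baseline above. A tiny helper (the function name is ours, purely illustrative) makes the rule explicit:

# Linear lr scaling: lr is proportional to total batch size, anchored at
# the 0.03 / 256 defaults above. `scaled_lr` is an illustrative helper.
def scaled_lr(batch_size, base_lr=0.03, base_batch=256):
    return base_lr * batch_size / base_batch

print(scaled_lr(128))   # 0.015 -> matches the 4-GPU recommendation
print(scaled_lr(256))   # 0.03  -> the 8-GPU default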

positional arguments:
  DIR                   path to dataset (default: imagenet)

optional arguments:
  --help                show this help message and exit
  --arch                model architecture: resnet18 | resnet34 | resnet50 (default: resnet18)
  --workers             number of data loading workers (default: 4)
  --epochs              number of total epochs to run
  --start-epoch N       manual epoch number (useful on restarts)
  --batch-size          mini-batch size (default: 256); this is the total batch size across all GPUs on the current node when using Data Parallel or Distributed Data Parallel
  --lr                  initial learning rate
  --momentum            momentum
  --weight-decay        weight decay (default: 1e-4)
  --resume              path to latest checkpoint (default: none)
  --evaluate            evaluate model on validation set
  --pretrained          use pre-trained model
  --world-size          number of nodes for distributed training
  --rank                node rank for distributed training
  --dist-url            url used to set up distributed training
  --dist-backend        distributed backend
  --seed                seed for initializing training
  --gpu                 GPU id to use
  --multiprocessing-distributed
                        use multi-processing distributed training to launch N processes per node, one per GPU; this is the fastest way to use PyTorch for either single-node or multi-node data parallel training (see the sketch below)
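For reference, the launch pattern behind --multiprocessing-distributed in the standard MoCo recipe looks roughly like the sketch below; pretrain.py may differ in detail, so names such as main_worker and launch are assumptions:

# Rough sketch of the --multiprocessing-distributed launch pattern in the
# standard MoCo recipe; pretrain.py may differ. `main_worker` and `launch`
# are illustrative names.
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def main_worker(gpu, ngpus_per_node, args):
    rank = args.rank * ngpus_per_node + gpu          # global rank of this process
    dist.init_process_group(backend="nccl", init_method=args.dist_url,
                            world_size=args.world_size, rank=rank)
    torch.cuda.set_device(gpu)
    # build the model here, then wrap it:
    # model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])

def launch(args):
    ngpus_per_node = torch.cuda.device_count()
    args.world_size = ngpus_per_node * args.world_size   # processes, not nodes
    args.batch_size = args.batch_size // ngpus_per_node  # split the total batch
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))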

5. Fine-tuning

The pre-trained model can be fine-tuned on five downstream datasets: Stanford Cars, Stanford Dogs, CIFAR100, FGVC Aircraft, and DTD. To optimise the model for a downstream task, run:

python main.py -h

usage: main.py [-h] [--arch ARCH] [--epochs EPOCHS] [--lr LR] [--batch-size BS]
               [--wd WD] [--dataset DATASET] [--model MODEL]
options:
  --help                show this help message and exit
  --arch                model architecture: resnet18 | resnet34 | resnet50 (default: resnet18)
  --epochs              number of total epochs to run (default: 100)
  --lr                  initial learning rate (default: 0.001)
  --batch-size          mini-batch size (default: 32)
  --wd                  weight decay (default: 1e-4)
  --dataset             fine-tuning dataset to use: stanfordCars | stanfordDogs | aircraft | cifar100 | dtd (default: stanfordCars)
  --model               path to the pre-trained model

This implementation runs on a single GPU and does not support multi-GPU training. We used a grid search to find the optimal values of the remaining hyperparameters. Once training completes, the final model is saved as <dataset_name>_best_model.pth.tar
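In the upstream MoCo recipe, only the query encoder's backbone weights are transferred into the classifier before fine-tuning. The sketch below follows that recipe; the module.encoder_q. key prefix and the checkpoint filename are assumptions about what pretrain.py saves.

# Transferring MoCo pre-trained weights into a classifier for fine-tuning,
# following the upstream MoCo recipe. The `module.encoder_q.` prefix and
# the checkpoint filename are assumptions about what pretrain.py writes.
import torch
import torchvision.models as models

model = models.resnet50()
ckpt = torch.load("checkpoint_0199.pth.tar", map_location="cpu")
state_dict = ckpt["state_dict"]
for k in list(state_dict.keys()):
    # keep only the query encoder's backbone weights, with the prefix stripped
    if k.startswith("module.encoder_q.") and not k.startswith("module.encoder_q.fc"):
        state_dict[k[len("module.encoder_q."):]] = state_dict[k]
    del state_dict[k]   # drop momentum encoder, queue, and projection head
model.load_state_dict(state_dict, strict=False)        # fc stays randomly initialised
model.fc = torch.nn.Linear(model.fc.in_features, 196)  # e.g. Stanford Cars (196 classes)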


6. References

[1] He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9729-9738).

