
Iterative Few-shot Semantic Segmentation from Image Label Text

This is the implementation of the paper "Iterative Few-shot Semantic Segmentation from Image Label Text" (IJCAI 2022).
The code is implemented based on HSNet (https://github.com/juhongm999/hsnet), CLIP (https://github.com/openai/CLIP), and pytorch-grad-cam (https://github.com/jacobgil/pytorch-grad-cam). Thanks for their great work!

Requirements

Following HSNet:

  • Python 3.7
  • PyTorch 1.5.1
  • cuda 10.1
  • tensorboard 1.14

Conda environment settings:

conda create -n hsnet python=3.7
conda activate hsnet

conda install pytorch=1.5.1 torchvision cudatoolkit=10.1 -c pytorch
conda install -c conda-forge tensorflow
pip install tensorboardX

Preparing Few-Shot Segmentation Datasets

Download the following datasets:

1. PASCAL-5i

Download PASCAL VOC2012 devkit (train/val data):

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

Download PASCAL VOC2012 SDS extended mask annotations from HSNet [Google Drive].

2. COCO-20i

Download COCO2014 train/val images and annotations:

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip

Download COCO2014 train/val annotations from HSNet Google Drive: [train2014.zip], [val2014.zip], and place both train2014/ and val2014/ under the annotations/ directory.

Create a directory '../Datasets_HSN' for the above few-shot segmentation datasets and place each dataset so that the directory structure looks as follows:

../                         # parent directory
├── ./                      # current (project) directory
│   ├── common/             # (dir.) helper functions
│   ├── data/               # (dir.) dataloaders and splits for each FSSS dataset
│   ├── model/              # (dir.) implementation of Hypercorrelation Squeeze Network model 
│   ├── README.md           # instructions for reproduction
│   ├── train.py            # code for training HSNet
│   └── test.py             # code for testing HSNet
└── Datasets_HSN/
    ├── VOC2012/            # PASCAL VOC2012 devkit
    │   ├── Annotations/
    │   ├── ImageSets/
    │   ├── ...
    │   └── SegmentationClassAug/
    ├── COCO2014/           
    │   ├── annotations/
    │   │   ├── train2014/  # (dir.) training masks (from Google Drive) 
    │   │   ├── val2014/    # (dir.) validation masks (from Google Drive)
    │   │   └── ..some json files..
    │   ├── train2014/
    │   └── val2014/
    ├── CAM_VOC_Train/ 
    ├── CAM_VOC_Val/ 
    └── CAM_COCO/
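As a sanity check before training, a small script along these lines can verify that the layout above is in place (this helper is not part of the repository; the paths simply mirror the tree shown):

```python
from pathlib import Path

# Expected sub-directories under ../Datasets_HSN (mirrors the tree above).
EXPECTED = [
    "VOC2012/Annotations",
    "VOC2012/ImageSets",
    "VOC2012/SegmentationClassAug",
    "COCO2014/annotations/train2014",
    "COCO2014/annotations/val2014",
    "COCO2014/train2014",
    "COCO2014/val2014",
    "CAM_VOC_Train",
    "CAM_VOC_Val",
    "CAM_COCO",
]

def missing_dirs(root):
    """Return the expected dataset sub-directories missing under `root`."""
    root = Path(root)
    return [d for d in EXPECTED if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs("../Datasets_HSN")
    if missing:
        print("Missing directories:", *missing, sep="\n  ")
    else:
        print("Dataset layout looks complete.")
```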

Preparing CAM for Few-Shot Segmentation Datasets

1. PASCAL-5i

  • Generate Grad-CAM for images:
python generate_cam_voc.py --traincampath ../Datasets_HSN/CAM_VOC_Train/
                           --valcampath ../Datasets_HSN/CAM_VOC_Val/

2. COCO-20i

python generate_cam_coco.py --campath ../Datasets_HSN/CAM_COCO/
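The CAM scripts build on pytorch-grad-cam. At its core, a class activation map is a class-weighted sum of backbone feature maps that is conventionally rectified (ReLU) and then min-max normalized to [0, 1] before being saved. A dependency-free sketch of that post-processing step (function and variable names are illustrative, not the repository's API):

```python
def normalize_cam(cam):
    """Rectify a raw CAM and rescale it to the [0, 1] range.

    `cam` is a 2-D activation map given as nested lists. Grad-CAM
    conventionally applies ReLU first, then min-max normalization.
    """
    rectified = [[max(v, 0.0) for v in row] for row in cam]
    flat = [v for row in rectified for v in row]
    lo, hi = min(flat), max(flat)
    if hi - lo < 1e-8:                      # constant map: return all zeros
        return [[0.0 for _ in row] for row in rectified]
    return [[(v - lo) / (hi - lo) for v in row] for row in rectified]
```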

Training

1. PASCAL-5i

python train.py --backbone {vgg16, resnet50} 
                --fold {0, 1, 2, 3} 
                --benchmark pascal
                --lr 4e-4
                --bsz 40
                --stage 2
                --logpath "your_experiment_name"
                --traincampath ../Datasets_HSN/CAM_VOC_Train/
                --valcampath ../Datasets_HSN/CAM_VOC_Val/
  • Training takes approx. 1 day until convergence (trained with four V100 GPUs).
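The --fold argument selects which class split is held out for evaluation: the standard PASCAL-5i protocol partitions the 20 PASCAL classes into four groups of five, with classes 5i..5i+4 reserved as novel classes for fold i. A sketch of that convention (the helper is illustrative, not the repository's dataloader):

```python
def pascal5i_split(fold, n_classes=20, per_fold=5):
    """Return (novel, base) class indices for a PASCAL-5i fold.

    Fold i holds out the contiguous block of classes 5i..5i+4 as
    novel classes; the remaining 15 classes are used for training.
    """
    novel = list(range(fold * per_fold, (fold + 1) * per_fold))
    base = [c for c in range(n_classes) if c not in novel]
    return novel, base
```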

2. COCO-20i

python train.py --backbone {vgg16, resnet50}
                --fold {0, 1, 2, 3} 
                --benchmark coco 
                --lr 2e-4
                --bsz 20
                --stage 3
                --logpath "your_experiment_name"
                --traincampath ../Datasets_HSN/CAM_COCO/
                --valcampath ../Datasets_HSN/CAM_COCO/
  • Training takes approx. 1 week until convergence (trained with four V100 GPUs).

Babysitting training:

Use tensorboard to babysit training progress:

  • For each experiment, a directory that logs training progress will be automatically generated under logs/ directory.
  • From terminal, run 'tensorboard --logdir logs/' to monitor the training progress.
  • Choose the best model when the validation (mIoU) curve starts to saturate.
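"Starts to saturate" can be made concrete: keep the checkpoint with the highest validation mIoU and stop watching once several consecutive epochs fail to improve by a meaningful margin. A small helper illustrating that selection rule (hypothetical; the training code saves checkpoints its own way):

```python
def best_epoch(val_miou, patience=5, min_delta=1e-3):
    """Index of the best validation mIoU, scanning until `patience`
    consecutive epochs fail to improve on it by at least `min_delta`."""
    best_idx, best_val, stale = 0, val_miou[0], 0
    for i, v in enumerate(val_miou[1:], start=1):
        if v > best_val + min_delta:
            best_idx, best_val, stale = i, v, 0
        else:
            stale += 1
            if stale >= patience:   # curve has saturated
                break
    return best_idx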

Testing

1. PASCAL-5i

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone {vgg16, resnet50} 
               --fold {0, 1, 2, 3} 
               --benchmark pascal
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"

2. COCO-20i

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone {vgg16, resnet50} 
               --fold {0, 1, 2, 3} 
               --benchmark coco 
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"
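Testing reports mean IoU over the held-out classes. For reference, mIoU averages per-class intersection-over-union between prediction and ground truth; a minimal computation from per-class pixel counts (illustrative only; the repository's evaluator may differ in details such as ignore regions):

```python
def mean_iou(intersections, unions):
    """mIoU from per-class intersection and union pixel counts.

    Classes that never appear (union == 0) are skipped rather than
    counted as zero IoU.
    """
    ious = [i / u for i, u in zip(intersections, unions) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```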

BibTeX

If you use this code for your research, please consider citing:

@inproceedings{ijcai2022p193,
  title     = {Iterative Few-shot Semantic Segmentation from Image Label Text},
  author    = {Wang, Haohan and Liu, Liang and Zhang, Wuhao and Zhang, Jiangning and Gan, Zhenye and Wang, Yabiao and Wang, Chengjie and Wang, Haoqian},
  booktitle = {Proceedings of the Thirty-First International Joint Conference on
               Artificial Intelligence, {IJCAI-22}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Lud De Raedt},
  pages     = {1385--1392},
  year      = {2022},
  month     = {7},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2022/193},
  url       = {https://doi.org/10.24963/ijcai.2022/193},
}
