bottom-up-attention.pytorch

Note for this fork: the main change is the format of the generated npz files; everything else is largely untouched.

This repository contains a PyTorch reimplementation of the bottom-up-attention project based on Caffe.

We use Detectron2 as the backend to provide complete functionality, including training, testing, and feature extraction. Furthermore, we migrate the pre-trained Caffe-based model from the original repository, which can extract the same visual features as the original model (with deviation < 0.01).

Some example object and attribute predictions for salient image regions are illustrated below. The script used to obtain these visualizations can be found here.

[Example image: object and attribute predictions for salient image regions]

Table of Contents

  1. Prerequisites
  2. Training
  3. Testing
  4. Feature Extraction
  5. Pre-trained models

Prerequisites

Requirements

Note that most of the requirements are those needed by Detectron2.

Installation

  1. Install Detectron2 according to their official instructions here.

  2. Compile other used tools using the following script:

    # clone the repository (including submodules)
    $ git clone --recursive https://github.com/MILVLG/bottom-up-attention.pytorch
    $ cd bottom-up-attention.pytorch
    # install apex
    $ git clone https://github.com/NVIDIA/apex.git
    $ cd apex
    $ python setup.py install
    $ cd ..
    # install the remaining modules
    $ python setup.py build develop

Note that using the latest version of Detectron2 may cause runtime errors. Please use the version recommended in this repository.
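As a quick sanity check after installation, the minimal Python sketch below (ours, not part of this repository) prints the installed PyTorch and Detectron2 versions and confirms that CUDA is visible:

    # Environment sanity check (illustrative sketch, not part of this repository).
    import torch
    import detectron2

    print("PyTorch version:   ", torch.__version__)
    print("Detectron2 version:", detectron2.__version__)
    print("CUDA available:    ", torch.cuda.is_available())
    if torch.cuda.is_available():
        # Name of the first visible GPU
        print("GPU:", torch.cuda.get_device_name(0))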

Setup

If you want to train or test the model, you need to download the images and annotation files of the Visual Genome (VG) dataset. If you only need to extract visual features using the pre-trained model, you can skip this part.

Download the original VG images (part1 and part2) and unzip them into the datasets folder.

The annotation files generated by the original repository need to be converted to the COCO format required by Detectron2. The preprocessed annotation files can be downloaded here and should be unzipped into the datasets folder.

Finally, the datasets folder will have the following structure:

|-- datasets
   |-- vg
   |  |-- image
   |  |  |-- VG_100K
   |  |  |  |-- 2.jpg
   |  |  |  |-- ...
   |  |  |-- VG_100K_2
   |  |  |  |-- 1.jpg
   |  |  |  |-- ...
   |  |-- annotations
   |  |  |-- train.json
   |  |  |-- val.json
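
To verify the layout before training, a small sketch like the following (paths taken from the tree above; run from the repository root) reports any missing pieces:

    # Check that the expected VG dataset layout is in place (illustrative sketch).
    from pathlib import Path

    root = Path("datasets/vg")
    expected = [
        root / "image" / "VG_100K",
        root / "image" / "VG_100K_2",
        root / "annotations" / "train.json",
        root / "annotations" / "val.json",
    ]
    for path in expected:
        status = "ok" if path.exists() else "MISSING"
        print(f"{status:8s} {path}")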

Training

The following script will train a bottom-up-attention model on the train split of VG. We are still working on this part to reproduce the same results as the Caffe version.

$ python3 train_net.py --mode detectron2 \
         --config-file configs/bua-caffe/train-bua-caffe-r101.yaml \
         --resume
  1. mode = {'caffe', 'detectron2'} specifies which mode to use. Only the detectron2 mode is supported for training, since we consider it unnecessary to train a new model in the caffe mode.

  2. config-file refers to all the configurations of the model.

  3. resume is a flag for resuming training from a specific checkpoint.

Testing

Given the trained model, the following script will test the performance on the val split of VG:

$ python3 train_net.py --mode caffe \
         --config-file configs/bua-caffe/test-bua-caffe-r101.yaml \
         --eval-only --resume
  1. mode = {'caffe', 'detectron2'} specifies which mode to use. For the model converted from Caffe, use the caffe mode; for models trained with Detectron2, use the detectron2 mode.

  2. config-file refers to the configuration file of the model, which also includes the path of the model weights.

  3. eval-only is a flag that restricts the run to the testing phase.

  4. resume is a flag that loads the pre-trained model weights.

Feature Extraction

Similar to the testing stage, the following script extracts the bottom-up-attention visual features with the provided hyper-parameters:

$ python3 extract_features.py --mode caffe \
         --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
         --image-dir <image_dir> --gt-bbox-dir <bbox_dir> --out-dir <out_dir> --resume
  1. mode = {'caffe', 'detectron2'} specifies which mode to use. For the model converted from Caffe, use the caffe mode; for models trained with Detectron2, use the detectron2 mode.

  2. config-file refers to the configuration file of the model, which also includes the path of the model weights.

  3. image-dir refers to the input image directory.

  4. gt-bbox-dir refers to the ground-truth bbox directory.

  5. out-dir refers to the output feature directory.

  6. resume is a flag that loads the pre-trained model weights.

Moreover, using the same pre-trained model, we provide a two-stage strategy for extracting visual features, which results in (slightly) more accurate visual features:

# extract bboxes only:
$ python3 extract_features.py --mode caffe \
         --config-file configs/bua-caffe/extract-bua-caffe-r101-bbox-only.yaml \
         --image-dir <image_dir> --out-dir <out_dir> --resume

# extract visual features with the pre-extracted bboxes:
$ python3 extract_features.py --mode caffe \
         --config-file configs/bua-caffe/extract-bua-caffe-r101-gt-bbox.yaml \
         --image-dir <image_dir> --gt-bbox-dir <bbox_dir> --out-dir <out_dir> --resume
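
Since this fork mainly changes the format of the generated npz files, the stored key names may differ from the upstream repository. The generic sketch below (the file path is hypothetical) simply enumerates whatever arrays a saved feature file contains:

    # Inspect an extracted feature file (illustrative sketch; key names depend on
    # the npz format produced by this fork, so we only list what is stored).
    import numpy as np

    feat = np.load("<out_dir>/example.npz", allow_pickle=True)  # hypothetical path
    for key in feat.files:
        value = feat[key]
        print(key, getattr(value, "shape", None), getattr(value, "dtype", type(value)))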

Pre-trained models

We provide pre-trained models here. The evaluation metrics are exactly the same as those in the original Caffe project. More models will be added continuously.

Model        | Mode              | Backbone   | Objects mAP@0.5 | Objects weighted mAP@0.5 | Download
Faster R-CNN | Caffe, K=36       | ResNet-101 | 9.3%            | 14.0%                    | model
Faster R-CNN | Caffe, K=[10,100] | ResNet-101 | 10.2%           | 15.1%                    | model
Faster R-CNN | Caffe, K=[10,100] | ResNet-152 | 11.1%           | 15.7%                    | model

License

This project is released under the Apache 2.0 license.

Contact

This repo is currently maintained by Jing Li (@J1mL3e_) and Zhou Yu (@yuzcccc).
