bottom-up-attention.pytorch

Note for this fork: the main change is the format of the generated npz files; everything else is largely untouched.

This repository contains a PyTorch reimplementation of the bottom-up-attention project based on Caffe.

We use Detectron2 as the backend to provide complete functionality, including training, testing, and feature extraction. Furthermore, we migrate the pre-trained Caffe-based model from the original repository, which can extract the same visual features as the original model (with deviation < 0.01).

Some example object and attribute predictions for salient image regions are illustrated below. The script used to obtain these visualizations can be found here.

[Example image: object and attribute predictions for salient image regions]

Table of Contents

  1. Prerequisites
  2. Training
  3. Testing
  4. Feature Extraction
  5. Pre-trained models

Prerequisites

Requirements

Note that most of the requirements are those needed by Detectron2.

Installation

  1. Install Detectron2 according to their official instructions here.

  2. Compile other used tools using the following script:

    # clone the repository (including submodules)
    $ git clone --recursive https://github.com/MILVLG/bottom-up-attention.pytorch
    $ cd bottom-up-attention.pytorch
    # install apex
    $ git clone https://github.com/NVIDIA/apex.git
    $ cd apex
    $ python setup.py install
    $ cd ..
    # install the remaining modules
    $ python setup.py build develop

Note that using the latest version of Detectron2 may cause runtime errors. Please use the version recommended in this repository.
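As a quick sanity check after installation, the minimal Python sketch below (ours, not part of this repository) prints the installed PyTorch and Detectron2 versions and confirms that CUDA is visible:

    # Environment sanity check (illustrative sketch, not part of this repository).
    import torch
    import detectron2

    print("PyTorch version:   ", torch.__version__)
    print("Detectron2 version:", detectron2.__version__)
    print("CUDA available:    ", torch.cuda.is_available())
    if torch.cuda.is_available():
        # Name of the first visible GPU
        print("GPU:", torch.cuda.get_device_name(0))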

Setup

If you want to train or test the model, you need to download the images and annotation files of the Visual Genome (VG) dataset. If you only need to extract visual features using the pre-trained model, you can skip this part.

Download the original VG images (part1 and part2) and unzip them into the datasets folder.

The annotation files generated by the original repository need to be converted to the COCO format required by Detectron2. The preprocessed annotation files can be downloaded here and should be unzipped into the datasets folder.

Finally, the datasets folder will have the following structure:

|-- datasets
   |-- vg
   |  |-- image
   |  |  |-- VG_100K
   |  |  |  |-- 2.jpg
   |  |  |  |-- ...
   |  |  |-- VG_100K_2
   |  |  |  |-- 1.jpg
   |  |  |  |-- ...
   |  |-- annotations
   |  |  |-- train.json
   |  |  |-- val.json
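
To verify the layout before training, a small sketch like the following (paths taken from the tree above; run from the repository root) reports any missing pieces:

    # Check that the expected VG dataset layout is in place (illustrative sketch).
    from pathlib import Path

    root = Path("datasets/vg")
    expected = [
        root / "image" / "VG_100K",
        root / "image" / "VG_100K_2",
        root / "annotations" / "train.json",
        root / "annotations" / "val.json",
    ]
    for path in expected:
        status = "ok" if path.exists() else "MISSING"
        print(f"{status:8s} {path}")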

Training

The following script will train a bottom-up-attention model on the train split of VG. We are still working on this part to reproduce the same results as the Caffe version.

$ python3 train_net.py --mode detectron2 \
         --config-file configs/bua-caffe/train-bua-caffe-r101.yaml \
         --resume
  1. mode = {'caffe', 'detectron2'} specifies which mode to use. Only the detectron2 mode is supported for training, since we consider it unnecessary to train a new model in the caffe mode.

  2. config-file refers to all the configurations of the model.

  3. resume is a flag for resuming training from a specific checkpoint.

Testing

Given the trained model, the following script will test the performance on the val split of VG:

$ python3 train_net.py --mode caffe \
         --config-file configs/bua-caffe/test-bua-caffe-r101.yaml \
         --eval-only --resume
  1. mode = {'caffe', 'detectron2'} specifies which mode to use. For the model converted from Caffe, use the caffe mode; for models trained with Detectron2, use the detectron2 mode.

  2. config-file refers to the configuration file of the model, which also includes the path of the model weights.

  3. eval-only is a flag that restricts the run to the testing phase.

  4. resume is a flag that loads the pre-trained model weights.

Feature Extraction

Similar to the testing stage, the following script extracts the bottom-up-attention visual features with the provided hyper-parameters:

$ python3 extract_features.py --mode caffe \
         --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
         --image-dir <image_dir> --gt-bbox-dir <bbox_dir> --out-dir <out_dir> --resume
  1. mode = {'caffe', 'detectron2'} specifies which mode to use. For the model converted from Caffe, use the caffe mode; for models trained with Detectron2, use the detectron2 mode.

  2. config-file refers to the configuration file of the model, which also includes the path of the model weights.

  3. image-dir refers to the input image directory.

  4. gt-bbox-dir refers to the ground-truth bbox directory.

  5. out-dir refers to the output feature directory.

  6. resume is a flag that loads the pre-trained model weights.

Moreover, using the same pre-trained model, we provide a two-stage strategy for extracting visual features, which results in (slightly) more accurate visual features:

# extract bboxes only:
$ python3 extract_features.py --mode caffe \
         --config-file configs/bua-caffe/extract-bua-caffe-r101-bbox-only.yaml \
         --image-dir <image_dir> --out-dir <out_dir> --resume

# extract visual features with the pre-extracted bboxes:
$ python3 extract_features.py --mode caffe \
         --config-file configs/bua-caffe/extract-bua-caffe-r101-gt-bbox.yaml \
         --image-dir <image_dir> --gt-bbox-dir <bbox_dir> --out-dir <out_dir> --resume
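
Since this fork mainly changes the format of the generated npz files, the stored key names may differ from the upstream repository. The generic sketch below (the file path is hypothetical) simply enumerates whatever arrays a saved feature file contains:

    # Inspect an extracted feature file (illustrative sketch; key names depend on
    # the npz format produced by this fork, so we only list what is stored).
    import numpy as np

    feat = np.load("<out_dir>/example.npz", allow_pickle=True)  # hypothetical path
    for key in feat.files:
        value = feat[key]
        print(key, getattr(value, "shape", None), getattr(value, "dtype", type(value)))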

Pre-trained models

We provide pre-trained models here. The evaluation metrics are exactly the same as those in the original Caffe project. More models will be added continuously.

Model        | Mode              | Backbone   | Objects mAP@0.5 | Objects weighted mAP@0.5 | Download
Faster R-CNN | Caffe, K=36       | ResNet-101 | 9.3%            | 14.0%                    | model
Faster R-CNN | Caffe, K=[10,100] | ResNet-101 | 10.2%           | 15.1%                    | model
Faster R-CNN | Caffe, K=[10,100] | ResNet-152 | 11.1%           | 15.7%                    | model

License

This project is released under the Apache 2.0 license.

Contact

This repo is currently maintained by Jing Li (@J1mL3e_) and Zhou Yu (@yuzcccc).
