Densecap-tensorflow

Implementation of the CVPR 2017 paper: A Hierarchical Approach for Generating Descriptive Image Paragraphs by **Jonathan Krause, Justin Johnson, Ranjay Krishna, Fei-Fei Li**

NOTE: This repo is based on densecap-tensorflow and is still buggy.

Note

Update 2018.1.27

  • The following procedures will be adapted for IM2P soon.

Dependencies

Install the required Python modules with:

pip install -r lib/requirements.txt

Preparing data

Download

Website of Visual Genome Dataset

  • Make a new directory VG wherever you like.
  • Download images Part 1 and Part 2, and extract both parts to directory VG/images.
  • Download the image meta data and extract it to directory VG/1.2 or VG/1.0, according to the version you downloaded.
  • Download the region descriptions and extract them to directory VG/1.2 or VG/1.0 accordingly.
  • In the following steps, we refer to the directory VG as raw_data_path.
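The steps above should leave a layout like the following sketch. The JSON file names shown in the comments are the standard Visual Genome distribution names and are an assumption here; check the dataset website if they differ:

```shell
# Create the directories the later scripts expect (version 1.2 shown).
mkdir -p VG/images VG/1.2
# After extracting the downloads, the tree should look roughly like:
#   VG/images/*.jpg                    (Part 1 + Part 2 images)
#   VG/1.2/image_data.json             (image meta data)
#   VG/1.2/region_descriptions.json    (region descriptions)
ls VG
```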

Unlimited RAM (16 GB or more)

If you have more than 16 GB of RAM, you can preprocess the dataset with the following command.

$ cd $ROOT/lib
$ python preprocess.py --version [version] --path [raw_data_path] \
        --output_dir [dir] --max_words [max_len]
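As a rough illustration of what the max_words cap does (a hedged sketch, not the repo's actual preprocessing code), region descriptions with more tokens than the limit are typically filtered out:

```shell
# Illustrative only: keep phrases with at most max_words tokens.
max_words=5
for phrase in "a red car" "a very long region description that exceeds the cap"; do
  n=$(echo "$phrase" | wc -w)        # count whitespace-separated tokens
  if [ "$n" -le "$max_words" ]; then
    echo "keep: $phrase"
  fi
done
```

Only the short phrase survives the cap in this example; the actual script may also truncate or lowercase phrases.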

Limited RAM (less than 16 GB)

If you have less than 16 GB of RAM, proceed in two steps.

  • First, set the data path in info/read_regions.py accordingly, then run the script with Python. It will dump the regions into the REGION_JSON directory. Processing more than 100k images takes a while, so be patient.
$ cd $ROOT/info
$ python read_regions.py --version [version] --vg_path [raw_data_path]
  • Then set the data path in lib/preprocess.py accordingly. Running it dumps the gt_regions of every image to the OUTPUT_DIR directory.
$ cd $ROOT/lib
$ python preprocess.py --version [version] --path [raw_data_path] \
        --output_dir [dir] --max_words [max_len] --limit_ram

Compile local libs

$ cd $ROOT/lib
$ make

Train

Add or modify configurations in $ROOT/scripts/dense_cap_config.yml; see lib/config.py for more configuration details.

$ cd $ROOT
$ bash scripts/dense_cap_train.sh [dataset] [net] [ckpt_to_init] [data_dir] [step]

Parameters:

  • dataset: visual_genome_1.2 or visual_genome_1.0.
  • net: res50 or res101.
  • ckpt_to_init: pretrained model to initialize from. Refer to tf_faster_rcnn for more details on init weights.
  • data_dir: the data directory where you saved the outputs after preparing the data.
  • step: training stage, for continued training.
    • step 1: fix convnet weights
    • step 2: finetune convnet weights
    • step 3: add context fusion, but fix convnet weights
    • step 4: finetune the whole model
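The four stages are meant to run in order, each resuming from the previous stage's checkpoint. A hypothetical sequence (printed rather than executed; the dataset and net choices are examples, and the bracketed arguments stay as placeholders):

```shell
# Print the four stage invocations in order; fill in [ckpt] and [data_dir] yourself.
for step in 1 2 3 4; do
  echo "bash scripts/dense_cap_train.sh visual_genome_1.2 res101 [ckpt] [data_dir] ${step}"
done
```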

Demo

Create a directory data/demo

$ mkdir $ROOT/data/demo

Then put the images to be tested in the directory and run

$ cd $ROOT
$ bash scripts/dense_cap_demo.sh [ckpt_path] [vocab_path]

It will create HTML files in $ROOT/demo; just open them in a browser. Alternatively, use the web-based visualizer created by karpathy:

$ cd $ROOT/vis
$ python -m SimpleHTTPServer 8181    # Python 2; on Python 3 use: python -m http.server 8181

Then point your web browser to http://localhost:8181/view_results.html.

TODO:

  • Debugging.

References