This repository contains a PyTorch implementation of Text Embedding Bank Module for Detailed Image Paragraph Captioning. Our code builds on Ruotian Luo's implementation of Self-critical Sequence Training for Image Captioning (available here) and Luke Melas-Kyriazi's implementation of Training for Diversity in Image Paragraph Captioning (available here).
- Python 2.7 (because coco-caption does not support Python 3)
- PyTorch 0.4 (with torchvision)
- cider (already included as a submodule)
- coco-caption (already included as a submodule)
If training from scratch, you also need:
- spacy (to tokenize words)
- h5py (to store features)
- scikit-image (to process images)
To clone this repository with submodules, use:
git clone --recurse-submodules https://github.com/lukemelas/image-paragraph-captioning
- Download cider: clone the repo
- Download coco-caption: clone the repo
- Download captions:
  - Run `download.sh` in `data/captions`
- Preprocess captions for training (part 1):
  - Download the spacy English tokenizer with `python -m spacy download en`
  - First, convert the text into tokens: `cd scripts && python prepro_text.py`
  - Next, preprocess the tokens into a vocabulary (mapping infrequent words to an `UNK` token) with the following command. Note that image/vocab information is stored in `data/paratalk.json` and caption data is stored in `data/paratalk_label.h5`:
    `python scripts/prepro_labels.py --input_json data/captions/para_karpathy_format.json --output_json data/paratalk.json --output_h5 data/paratalk`
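The vocabulary step above can be sketched roughly as follows. This is a minimal illustration of the idea, not the actual `prepro_labels.py` code, and the `min_count` threshold is an assumed value:

```python
from collections import Counter

def build_vocab(tokenized_captions, min_count=5):
    """Keep words seen at least min_count times; everything else becomes UNK.
    (min_count=5 is an illustrative threshold, not the repo's setting.)"""
    counts = Counter(w for caption in tokenized_captions for w in caption)
    vocab = [w for w, c in counts.items() if c >= min_count]
    vocab.append('UNK')
    return vocab

def encode(caption, vocab):
    """Replace out-of-vocabulary words with the UNK token."""
    vocab_set = set(vocab)
    return [w if w in vocab_set else 'UNK' for w in caption]
```

For example, with `min_count=2`, a corpus of `[['a', 'dog'], ['a', 'cat']]` keeps only `'a'` in the vocabulary, so `encode(['a', 'dog'], vocab)` yields `['a', 'UNK']`.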
- Preprocess captions into a coco-captions format for calculating CIDEr/BLEU/etc.:
  - Run `scripts/prepro_captions.py`
  - There should be 14,575/2,487/2,489 images and annotations in the train/val/test splits
  - Comment out line 44 (`(Spice(), "SPICE")`) in `coco-caption/pycocoevalcap/eval.py` to disable SPICE testing
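The coco-captions layout pairs an `images` list with an `annotations` list keyed by image id. A minimal sketch of that conversion, using the standard COCO field names rather than the actual script's logic:

```python
def to_coco_format(paragraphs):
    """paragraphs: list of (image_id, paragraph_text) pairs.
    Returns a dict in the coco-captions annotation layout."""
    images = [{'id': img_id} for img_id, _ in paragraphs]
    annotations = [
        {'id': i, 'image_id': img_id, 'caption': text}
        for i, (img_id, text) in enumerate(paragraphs)
    ]
    return {'images': images, 'annotations': annotations}
```

The evaluation code can then match each generated paragraph to its references by `image_id`.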
- Preprocess n-grams for self-critical training:
  `python scripts/prepro_ngrams.py --input_json data/captions/para_karpathy_format.json --dict_json data/paratalk.json --output_pkl data/para_train --split train`
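Self-critical training scores sampled captions with CIDEr, which needs the document frequency of each training-set n-gram precomputed. A rough sketch of what that precomputation gathers (not the actual `prepro_ngrams.py` logic):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_frequencies(tokenized_refs, max_n=4):
    """Count how many reference captions each n-gram (n = 1..max_n) appears in.
    CIDEr uses these counts to down-weight common n-grams."""
    df = Counter()
    for ref in tokenized_refs:
        seen = set()  # count each n-gram at most once per caption
        for n in range(1, max_n + 1):
            seen.update(ngrams(ref, n))
        df.update(seen)
    return df
```

The real script pickles these statistics (here, to `data/para_train`) so the reward computation does not have to rescan the corpus every epoch.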
- Extract image features using an object detector:
  - We make pre-processed features available:
    - Download and extract `parabu_fc` and `parabu_att` from here into `data/bu_data`
  - Or generate the features yourself:
    - Download the Visual Genome dataset
    - Apply the bottom-up attention object detector (available here), made by Peter Anderson
    - Use `scripts/make_bu_data.py` to convert the image features to `.npz` files for faster data loading
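The `.npz` conversion idea can be sketched as below. The file name and feature shape are illustrative assumptions; `make_bu_data.py` itself handles the bottom-up attention detector's actual output format:

```python
import numpy as np

# Illustrative shape: 36 detected regions, each a 2048-d feature vector
att_feats = np.random.rand(36, 2048).astype(np.float32)

# Save as compressed .npz so the dataloader can read one small file per image
np.savez_compressed('example_att.npz', feat=att_feats)

# At training time, loading back is a single cheap read
loaded = np.load('example_att.npz')['feat']
```

Storing one compressed array per image avoids parsing the detector's bulky raw output inside the training loop.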
As explained in Self-Critical Sequence Training, training occurs in two steps:
- The model is trained with a cross-entropy loss (~30 epochs)
- The model is trained with a self-critical loss (30+ epochs)
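In the second step, the greedy decode acts as the baseline: the reward for a sampled caption is its CIDEr score minus the greedy caption's score, and the sample's log-probability is scaled by that reward. A minimal sketch of the loss shape (the scores would come from the CIDEr code; this function is a simplification, not the repo's implementation):

```python
def self_critical_loss(sample_logprob, sample_score, greedy_score):
    """REINFORCE with the greedy decode as baseline (the SCST idea).

    sample_logprob: total log-probability of the sampled caption
    sample_score:   CIDEr score of the sampled caption
    greedy_score:   CIDEr score of the greedy (baseline) caption
    """
    reward = sample_score - greedy_score
    # Maximizing expected reward => minimize -reward * log p(sample).
    # Samples that beat the baseline are pushed up; worse ones are pushed down.
    return -reward * sample_logprob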
Training hyperparameters may be viewed with `python train.py --help`. A reasonable set of hyperparameters is provided in `train_xe.sh` (for cross-entropy) and `train_sc.sh` (for self-critical).
Train with cross-entropy:
mkdir log_xe
./train_xe_vec.sh # for baseline model without the TEB module, do: ./train_xe.sh
You can then copy the model:
./scripts/copy_model.sh xe sc
And train with self-critical:
mkdir log_sc
./train_sc_vec.sh # for baseline model without the TEB module, do: ./train_sc.sh