An unofficial Torch implementation of J. Lu, C. Xiong, et al., *Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning* (2017), trained on the COCO image captioning and Flickr30k datasets.
The implementation presents the following variations from the paper:
- deformable adaptive attention;
- larger visual sentinel size (128-dim; see the sketch after this list);
- model evaluation against the SPICE metric;
- MCTS-based decoding.
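
As a reference for how the sentinel interacts with attention, below is a minimal PyTorch sketch of adaptive attention with a visual sentinel in the spirit of Lu et al. (2017), using the enlarged 128-dimensional sentinel listed above. Module, argument, and tensor names (`AdaptiveAttention`, `feat_dim`, `att_dim`, ...) are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAttention(nn.Module):
    """Adaptive attention over k spatial features plus a visual sentinel (illustrative sketch)."""

    def __init__(self, feat_dim=512, hidden_dim=512, sentinel_dim=128, att_dim=256):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, att_dim)          # project spatial features V
        self.hid_proj = nn.Linear(hidden_dim, att_dim)         # project decoder state h_t
        self.sent_proj = nn.Linear(sentinel_dim, att_dim)      # project visual sentinel s_t
        self.sent_to_feat = nn.Linear(sentinel_dim, feat_dim)  # match sentinel to feature size
        self.score = nn.Linear(att_dim, 1)                     # scalar attention scores

    def forward(self, V, h_t, s_t):
        # V: (B, k, feat_dim), h_t: (B, hidden_dim), s_t: (B, sentinel_dim)
        h = self.hid_proj(h_t)                                                         # (B, att_dim)
        z_v = self.score(torch.tanh(self.feat_proj(V) + h.unsqueeze(1))).squeeze(-1)   # (B, k)
        z_s = self.score(torch.tanh(self.sent_proj(s_t) + h))                          # (B, 1)
        alpha = F.softmax(torch.cat([z_v, z_s], dim=1), dim=1)  # attention over k regions + sentinel
        beta = alpha[:, -1:]                                    # sentinel gate in [0, 1]
        c_t = (alpha[:, :-1].unsqueeze(-1) * V).sum(dim=1)      # visual context vector
        c_hat = beta * self.sent_to_feat(s_t) + (1.0 - beta) * c_t  # adaptive context
        return c_hat, beta
```

Here `1 - beta` can be read as the probability that the next word is visually grounded, while `beta` close to 1 means the decoder falls back on the sentinel (language-only) memory.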
Dense image captioning plays an important role in enabling vision-language understanding of the surrounding world.
In this project we propose a deformable variant of the adaptive-attention visual sentinel introduced in the reference paper for estimating grounding probabilities. This variant allows larger networks to be constructed while running at faster inference speed and training for almost half the epochs with equal performance.
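
The deformable component is not part of the original paper, so the sketch below only illustrates one plausible reading under stated assumptions: the decoder state predicts 2D offsets around a coarse reference grid, and the CNN feature map is bilinearly resampled at those shifted locations before the adaptive attention above is applied. All names and hyperparameters (`DeformableSampler`, `n_points`, `offset_scale`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSampler(nn.Module):
    """Resample a feature map at decoder-conditioned offset locations (illustrative sketch)."""

    def __init__(self, hidden_dim=512, n_points=16, offset_scale=0.25):
        super().__init__()
        assert int(n_points ** 0.5) ** 2 == n_points, "n_points must form a square grid"
        self.n_points = n_points
        self.offset_scale = offset_scale
        self.offsets = nn.Linear(hidden_dim, n_points * 2)  # predict (dx, dy) per sampling point

    def forward(self, fmap, h_t):
        # fmap: (B, C, H, W) CNN features, h_t: (B, hidden_dim) decoder state
        B = fmap.size(0)
        side = int(self.n_points ** 0.5)
        xs = torch.linspace(-1.0, 1.0, side, device=fmap.device)
        base = torch.stack(torch.meshgrid(xs, xs, indexing="xy"), dim=-1).reshape(1, -1, 2)
        delta = self.offset_scale * torch.tanh(self.offsets(h_t)).view(B, self.n_points, 2)
        grid = (base + delta).clamp(-1.0, 1.0)  # normalized sampling coordinates in [-1, 1]
        sampled = F.grid_sample(fmap, grid.view(B, self.n_points, 1, 2), align_corners=True)
        return sampled.squeeze(-1).transpose(1, 2)  # (B, n_points, C)
```

The resampled features would then take the place of the fixed spatial grid `V` in the adaptive-attention sketch above, with the sentinel gate supplying the grounding probability for each generated word.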
This project is part of a larger effort to develop visual-language aid tools for visually impaired people by combining speech recognition, speech synthesis, image captioning, and familiar-person identification.
For more information, see the attached in-depth report.
The model was trained for 50 epochs on a multi-GPU HPC cluster courtesy of CERN.
The following files must be downloaded from Google Drive:
The former contains the dataset with COCO-like annotations and the corresponding vocabulary.
The following files should be downloaded from Google Drive for display purposes:
N.B.: If the provided links are no longer available, contact the authors.