[Deprecated] Image Caption Generator

Notice: This project uses an older version of TensorFlow and is no longer supported. Please consider using more recent alternatives.

A Neural Network based generative model for captioning images.

Check out the Android app built with this image-captioning model: Cam2Caption, and the associated paper.

Work in Progress

Updates (Jan 14, 2018):
  1. Some code refactoring.
  2. Added MSCOCO dataset support.
Updates (Mar 12, 2017):
  1. Added a dropout layer for the LSTM and Xavier (Glorot) initialization for the weights.
  2. Significant optimizations to caption generation (the decode routine), reducing computation time from 3 seconds to 0.2 seconds; a sketch of the decoding idea appears after the To-Do list below.
  3. Functionality to freeze graphs and merge them.
  4. Direct serving (dual-graph and single-graph) routines in /utils/.
  5. Explored image preprocessing methods and chose the fastest, most efficient one.
  6. Ported code to TensorFlow r1.0.
Updates (Feb 27, 2017):
  1. Added the BLEU evaluation metric and batch processing of images to produce batches of captions.
Updates (Feb 25, 2017):
  1. Added optimizations and one-time pre-processing of Flickr30K data.
  2. Changed to a faster image preprocessing method using OpenCV.
To-Do (Open for Contribution):
  1. FIFO queues in training
  2. Attention model
  3. Trained models for distribution
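
The decode-routine speed-up in the Mar 12, 2017 update refers to generating a caption one token at a time instead of re-running a fully unrolled graph. The sketch below illustrates only the greedy-decoding control flow; it is not this repository's code, and every name in it (lstm_step, embed, START_ID, END_ID) is a hypothetical stand-in:

    import numpy as np

    # Hypothetical stand-ins for a trained decoder; only the greedy
    # control flow below mirrors a real decode routine.
    VOCAB, HIDDEN, START_ID, END_ID, MAX_LEN = 1000, 256, 1, 2, 20
    rng = np.random.default_rng(0)
    W_out = rng.standard_normal((HIDDEN, VOCAB)) * 0.01   # hidden -> vocab
    embed = rng.standard_normal((VOCAB, HIDDEN)) * 0.01   # token embeddings

    def lstm_step(x, state):
        # Placeholder recurrent step: mixes the input into the state.
        return np.tanh(x + state)

    def greedy_decode(image_feature):
        # Seed the recurrent state from the CNN image feature, then feed
        # each predicted token back in until END_ID or MAX_LEN is reached.
        state, token, caption = np.tanh(image_feature), START_ID, []
        for _ in range(MAX_LEN):
            state = lstm_step(embed[token], state)
            token = int(np.argmax(state @ W_out))  # most probable next word
            if token == END_ID:
                break
            caption.append(token)
        return caption

    print(greedy_decode(rng.standard_normal(HIDDEN)))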

Pre-Requisites:

  1. TensorFlow r1.0
  2. NLTK
  3. pandas
  4. Download Flickr30K OR MSCOCO images and captions.
  5. Download the pre-trained Inception V4 TensorFlow graph from DeepDetect, available here

Procedure to Train and Generate Captions:

  1. Clone the repository to preserve the directory structure.
  2. For Flickr30K, put results_20130124.token and the Flickr30K images in the flickr30k-images folder; for MSCOCO, put captions_val2014.json and the MSCOCO images in the COCO-images folder.
  3. Put inception_v4.pb in the ConvNets folder.
  4. Generate features (features.npy) for the images in the dataset folder (see the feature-extraction sketch after this list) by running:
    • For Flickr30K: python convfeatures.py --data_path Dataset/flickr30k-images --inception_path ConvNets/inception_v4.pb
    • For MSCOCO: python convfeatures.py --data_path Dataset/COCO-images --inception_path ConvNets/inception_v4.pb
  5. To train the model, run:
    • For Flickr30K: python main.py --mode train --caption_path ./Dataset/results_20130124.token --feature_path ./Dataset/features.npy --resume
    • For MSCOCO: python main.py --mode train --caption_path ./Dataset/captions_val2014.json --feature_path ./Dataset/features.npy --data_is_coco --resume
  6. To generate captions for an image, run:
    • python main.py --mode test --image_path VALID_PATH
  7. For usage as a Python library, see Demo.ipynb.

(see python main.py -h for more)
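
Step 4 runs every image through the frozen Inception V4 graph and stores the resulting feature vectors in features.npy. The sketch below shows the general mechanism under the TF r1.x API; it is not convfeatures.py itself, and the two tensor names are assumptions that must be matched against the actual exported graph:

    import numpy as np
    import tensorflow as tf  # written against the TF r1.x API

    def extract_features(image_batches, pb_path="ConvNets/inception_v4.pb"):
        # Load the frozen GraphDef and import it into the default graph.
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_path, "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="inception")

        g = tf.get_default_graph()
        # Tensor names below are assumptions; list the graph's nodes
        # ([n.name for n in g.as_graph_def().node]) to find the real
        # input placeholder and pre-logits feature tensor.
        inp = g.get_tensor_by_name("inception/InputImage:0")
        feat = g.get_tensor_by_name("inception/InceptionV4/Logits/PreLogitsFlatten/Reshape:0")

        with tf.Session() as sess:
            batches = [sess.run(feat, feed_dict={inp: b}) for b in image_batches]
        return np.vstack(batches)

    # np.save("Dataset/features.npy", extract_features(batches))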

Miscellaneous Notes:

Freezing the encoder and decoder graphs

  1. Both the encoder and decoder graphs must be saved while running test; this is a required one-time run before freezing either graph.
    • python main.py --mode test --image_path ANY_TEST_IMAGE.jpg/png --saveencoder --savedecoder
  2. From the project root, run python utils/save_graph.py --mode encoder --model_folder model/Encoder/ (a sketch of what freezing does appears below). Add --read_file if you want the frozen encoder to read an image file path directly when generating a caption. Similarly, for the decoder run python utils/save_graph.py --mode decoder --model_folder model/Decoder/; the --read_file argument is not needed for the decoder.
  3. To use the frozen encoder and decoder models as a dual black box, see Serve-DualProtoBuf.ipynb. Note: you must freeze the encoder graph with --read_file to run this notebook.

(see python utils/save_graph.py -h for more)
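
Under the hood, freezing in TF r1.x means restoring a checkpoint and baking the graph's variables into constants before serializing. A minimal sketch of that mechanism follows; it is not save_graph.py itself, and the output node name is a placeholder assumption:

    import tensorflow as tf  # TF r1.x API

    def freeze(model_folder, output_node, out_pb):
        # Restore the latest checkpoint, convert variables to constants,
        # and serialize the resulting GraphDef to a .pb file.
        ckpt = tf.train.latest_checkpoint(model_folder)
        saver = tf.train.import_meta_graph(ckpt + ".meta")
        with tf.Session() as sess:
            saver.restore(sess, ckpt)
            frozen = tf.graph_util.convert_variables_to_constants(
                sess, tf.get_default_graph().as_graph_def(), [output_node])
        with tf.gfile.GFile(out_pb, "wb") as f:
            f.write(frozen.SerializeToString())

    # The node name is a placeholder; inspect the graph for the real one.
    # freeze("model/Decoder/", "decoder_output",
    #        "model/Trained_Graphs/decoder_frozen_model.pb")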

Merging the encoder and decoder graphs to serve the model as a single black box:

  1. Freeze the encoder and decoder as described above.
  2. From the project root, run:
    • python utils/merge_graphs.py --encpb ./model/Trained_Graphs/encoder_frozen_model.pb --decpb ./model/Trained_Graphs/decoder_frozen_model.pb (a sketch of the merge appears below). Add --read_file if you want the merged graph to read an image file path directly.
  3. To use the merged encoder and decoder as a single frozen black box, see Serve-SingleProtoBuf.ipynb. Note: you must freeze and merge the encoder graph with --read_file to run this notebook.

(see python utils/merge_graphs.py -h for more)
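
Merging amounts to importing both frozen GraphDefs into one graph under separate name scopes and re-serializing the result. A minimal sketch under that assumption (merge_graphs.py may additionally rewire the encoder's output tensor into the decoder's input):

    import tensorflow as tf  # TF r1.x API

    def load_graph_def(path):
        # Read a frozen .pb file into a GraphDef.
        gd = tf.GraphDef()
        with tf.gfile.GFile(path, "rb") as f:
            gd.ParseFromString(f.read())
        return gd

    def merge(enc_pb, dec_pb, out_pb):
        # Import both frozen graphs into one graph, namespaced so their
        # node names cannot collide, then serialize the combined graph.
        with tf.Graph().as_default() as g:
            tf.import_graph_def(load_graph_def(enc_pb), name="encoder")
            tf.import_graph_def(load_graph_def(dec_pb), name="decoder")
            with tf.gfile.GFile(out_pb, "wb") as f:
                f.write(g.as_graph_def().SerializeToString())

    # merge("./model/Trained_Graphs/encoder_frozen_model.pb",
    #       "./model/Trained_Graphs/decoder_frozen_model.pb",
    #       "./model/Trained_Graphs/merged_frozen_model.pb")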

Training Steps vs. Loss in TensorBoard:

  1. Run tensorboard --logdir model/log_dir
  2. Navigate to localhost:6006 in a browser.

Citation:

If you use our model or code in your research, please cite the paper:

@article{Mathur2017,
  title={Camera2Caption: A Real-time Image Caption Generator},
  author={Pranay Mathur and Aman Gill and Aayush Yadav and Anurag Mishra and Nand Kumar Bansode},
  journal={IEEE Conference Publication},
  year={2017}
}

Reference:

Show and Tell: A Neural Image Caption Generator, by Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan.

License:

Protected under the BSD 3-Clause License.

Some Examples:

[Example images with generated captions]