
License: MIT

Automatic Image Captioning

1. Project Overview

This project implements an automatic image-captioning model based on an encoder-decoder CNN-RNN architecture trained on the COCO dataset; given an input image, the model generates a descriptive caption.

  • It uses transfer learning on a Convolutional Neural Network (CNN) and uses it as the encoder for an LSTM-based RNN.
  • Instead of generating class scores for an image, the CNN has been modified to output only feature maps, by removing the final linear layer used for prediction.
  • An LSTM-based RNN is used as the decoder; it takes captions from the training data together with the CNN's feature map and learns to generate captions (see the sketch just after this list).
  • The notebook Automatic Image Captioning using Encoder CNN & Decoder RNN.ipynb contains all code for training and prediction.
  • Node.js is used to create an API for serving the model to web apps. A single-page web app is located in Node.js_Server.
  • The application takes an image and displays the generated caption for it.
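
The snippet below is a minimal, illustrative sketch of this encoder-decoder idea, not the repository's exact code; the ResNet-50 backbone, class names, and hyperparameters are assumptions.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class EncoderCNN(nn.Module):
        """Pretrained CNN with its classification layer removed; outputs an image embedding."""
        def __init__(self, embed_size):
            super().__init__()
            resnet = models.resnet50(pretrained=True)        # transfer learning
            modules = list(resnet.children())[:-1]           # drop the final linear (prediction) layer
            self.resnet = nn.Sequential(*modules)
            self.embed = nn.Linear(resnet.fc.in_features, embed_size)

        def forward(self, images):
            with torch.no_grad():                            # keep the pretrained weights frozen
                features = self.resnet(images)
            features = features.view(features.size(0), -1)
            return self.embed(features)                      # feature vector fed to the decoder

    class DecoderRNN(nn.Module):
        """LSTM decoder that learns to generate captions from image features."""
        def __init__(self, embed_size, hidden_size, vocab_size):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_size)
            self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
            self.fc = nn.Linear(hidden_size, vocab_size)

        def forward(self, features, captions):
            # Use the image feature as the first input step, followed by the embedded caption tokens.
            embeddings = self.embed(captions[:, :-1])
            inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
            hidden, _ = self.lstm(inputs)
            return self.fc(hidden)                           # word scores at every timestep

During training, the decoder's per-timestep word scores are typically compared against the ground-truth caption tokens with a cross-entropy loss.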

Example

Output example from the web application, using a model trained for only 2 epochs:
"a laptop computer sitting on top of a desk" "a plate of food with a fork and fork"

2. About "Original Training Notebooks"

Detailed data visualization, training, and inference notebooks are as follows:

Notebook 1: Testing the COCO dataset API and running a visualization on sample images

Notebook 2: Implementing and testing the data loaders and tokenizers

Notebook 3: Training the encoder-decoder model

Notebook 4: Testing the trained model on the test dataset and on input images

To run the notebooks properly, move them to the project's root directory. A short sketch of the kind of COCO API check Notebook 1 performs is shown below.
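
As a hedged illustration only, the snippet below uses the pycocotools COCO API to pull one caption annotation and display its image; the annotation path is an assumption about where the files were extracted.

    from pycocotools.coco import COCO
    import matplotlib.pyplot as plt
    import skimage.io as io

    ann_file = "Data/annotations/captions_train2014.json"   # assumed extraction path
    coco = COCO(ann_file)

    # Pick an arbitrary caption annotation and look up the image it describes.
    ann_id = list(coco.anns.keys())[0]
    ann = coco.anns[ann_id]
    img = coco.loadImgs(ann["image_id"])[0]

    print("Caption:", ann["caption"])
    image = io.imread(img["coco_url"])   # or read the local file via img["file_name"]
    plt.imshow(image)
    plt.axis("off")
    plt.show()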

Setup

Install the required Python packages from the requirements file using:

pip install -r requirements.txt

Data & Checkpoints

  1. The training, testing, and validation datasets total over 24 GB, so they need to be downloaded from the source (https://cocodataset.org/#download).

     a. Training Dataset: http://images.cocodataset.org/zips/train2014.zip
    
     b. Testing Dataset: http://images.cocodataset.org/zips/test2014.zip
    
     c. Validation Dataset: http://images.cocodataset.org/zips/val2014.zip
    
     d. Annotations:
                     http://images.cocodataset.org/annotations/image_info_test2014.zip
                     http://images.cocodataset.org/annotations/annotations_trainval2014.zip
    
  2. The trained model's checkpoints (only up to 2 epochs) are located in model_checkpoints as well as in Node.js_Server/python_models/saved_models.

To set up, download the datasets and annotations and extract everything into the Data directory; the resulting layout should look roughly like the sketch below.
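
Assuming each archive is extracted directly into Data, the layout would look roughly like this (the COCO zips already contain these folder names); check the notebooks for the exact paths the data loaders expect:

    Data/
    ├── annotations/      # from annotations_trainval2014.zip and image_info_test2014.zip
    ├── train2014/        # from train2014.zip
    ├── val2014/          # from val2014.zip
    └── test2014/         # from test2014.zip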

Training Hardware

With batch size = 32 and the model as defined in the training notebooks:

  1. It takes approximately 2.30 hours to run a single epoch on the following hardware, available on Google Colab:

     Intel(R) Xeon(R) CPU @ 2.20GHz [Core(s) per socket:  1 | Thread(s) per core:  2 ]
     Tesla T4 [CUDA Version: 10.1]
    

  2. It takes approximately 8 hours (back-calculated from the time taken for 100 steps; the arithmetic is sketched below) to run a single epoch on the following local hardware:

    Intel(R) Core(TM) i3-2120 CPU @ 3.20GHz [Core(s) :  2 | Thread(s) per core:  2 ]
    GTX 1060 [CUDA Version: 10.1]
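
The 8-hour figure is an extrapolation rather than a measured full epoch. A rough version of the arithmetic, assuming roughly 414k captions in train2014 (about 5 per image) and a hypothetical per-100-step timing chosen to be consistent with that estimate:

    steps_per_epoch = 414_000 // 32                          # ~12,900 steps at batch size 32 (caption count is approximate)
    seconds_per_100_steps = 223                              # hypothetical measurement, chosen to match the ~8 h figure
    epoch_hours = steps_per_epoch * (seconds_per_100_steps / 100) / 3600
    print(f"estimated epoch time ≈ {epoch_hours:.1f} h")     # ≈ 8.0 h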

3. Running Node.js Application

The Node.js application works by spawning a Python child process to generate captions for uploaded images; a minimal sketch of the Python side of this interface follows the steps below.

  1. Navigate to Node.js_Server

  2. Place the trained PyTorch model's checkpoint file checkpoint.pth into the python_models/saved_models folder (if a new model was trained).

  3. If a new vocab.pkl file was generated, or the new model's checkpoint filenames differ, change the following variables in Node.js_Server/python_models/model.py to the appropriate names:

    ENCODER_CNN_CHECKPOINT = "python_models/saved_models/encoderEpoch_2.pth"
    DECODER_LSTM_RNN_CHECKPOINT = "python_models/saved_models/decoderEpoch_2.pth"
    VOCAB_FILE = "python_models/saved_models/vocab.pkl"
    
  4. Run the following commands:

    npm install                      (This installs Node.js dependencies)
    pip install -r requirements.txt  (If Python packages haven't already been installed from the project root)
    npm start
    
  5. To test it from other devices on the local network:

    node app.js <your_ip>:8000
    

    Example

    node app.js 192.168.32.134:8000
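
For orientation, here is a minimal, hypothetical sketch of the Python side of this setup; the real entry point is Node.js_Server/python_models/model.py and its interface may differ. The point it illustrates is that the child process receives an image path as an argument and prints the caption to stdout, which the Node.js parent process captures.

    # caption_cli.py -- illustrative only, not the repository's model.py
    import sys

    def generate_caption(image_path):
        # Placeholder: this is where the real script would load the EncoderCNN/DecoderRNN
        # checkpoints and vocab.pkl named by the variables above and run inference.
        return "a placeholder caption for " + image_path

    if __name__ == "__main__":
        image_path = sys.argv[1]                 # the Node.js server passes the uploaded image's path
        print(generate_caption(image_path))      # stdout is what the Node.js parent process reads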
    
