
License: MIT

Automatic Image Captioning

1. Project Overview

This project implements an automatic image-captioning model based on an encoder-decoder CNN-RNN architecture trained on the COCO dataset; given an input image, the model generates a descriptive caption.

  • It uses transfer learning on a Convolutional Neural Network (CNN) and uses it as the encoder for an LSTM-based RNN.
  • Instead of generating class scores for an image, the CNN has been modified to output only feature maps, by removing the final linear layer used for prediction.
  • An LSTM-based RNN is used as the decoder; it takes captions from the training data together with the CNN's feature map and learns to generate captions (see the sketch just after this list).
  • The notebook Automatic Image Captioning using Encoder CNN & Decoder RNN.ipynb contains all code for training and prediction.
  • Node.js is used to create an API for serving the model to web apps. A single-page web app is located in Node.js_Server.
  • The application takes an image and displays the generated caption for it.
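
The snippet below is a minimal, illustrative sketch of this encoder-decoder idea, not the repository's exact code; the ResNet-50 backbone, class names, and hyperparameters are assumptions.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class EncoderCNN(nn.Module):
        """Pretrained CNN with its classification layer removed; outputs an image embedding."""
        def __init__(self, embed_size):
            super().__init__()
            resnet = models.resnet50(pretrained=True)        # transfer learning
            modules = list(resnet.children())[:-1]           # drop the final linear (prediction) layer
            self.resnet = nn.Sequential(*modules)
            self.embed = nn.Linear(resnet.fc.in_features, embed_size)

        def forward(self, images):
            with torch.no_grad():                            # keep the pretrained weights frozen
                features = self.resnet(images)
            features = features.view(features.size(0), -1)
            return self.embed(features)                      # feature vector fed to the decoder

    class DecoderRNN(nn.Module):
        """LSTM decoder that learns to generate captions from image features."""
        def __init__(self, embed_size, hidden_size, vocab_size):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_size)
            self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
            self.fc = nn.Linear(hidden_size, vocab_size)

        def forward(self, features, captions):
            # Use the image feature as the first input step, followed by the embedded caption tokens.
            embeddings = self.embed(captions[:, :-1])
            inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
            hidden, _ = self.lstm(inputs)
            return self.fc(hidden)                           # word scores at every timestep

During training, the decoder's per-timestep word scores are typically compared against the ground-truth caption tokens with a cross-entropy loss.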

Example

Output example from the web application, using a model trained for only 2 epochs:
"a laptop computer sitting on top of a desk" "a plate of food with a fork and fork"

2. About "Original Training Notebooks"

Detailed data visualization, training, and inference notebooks are as follows:

Notebook 1: Testing the COCO dataset API and running a visualization on sample images

Notebook 2: Implementing and testing the data loaders and tokenizers

Notebook 3: Training the encoder-decoder model

Notebook 4: Testing the trained model on the test dataset and on input images

To run the notebooks properly, move them to the project's root directory. A short sketch of the kind of COCO API check Notebook 1 performs is shown below.
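
As a hedged illustration only, the snippet below uses the pycocotools COCO API to pull one caption annotation and display its image; the annotation path is an assumption about where the files were extracted.

    from pycocotools.coco import COCO
    import matplotlib.pyplot as plt
    import skimage.io as io

    ann_file = "Data/annotations/captions_train2014.json"   # assumed extraction path
    coco = COCO(ann_file)

    # Pick an arbitrary caption annotation and look up the image it describes.
    ann_id = list(coco.anns.keys())[0]
    ann = coco.anns[ann_id]
    img = coco.loadImgs(ann["image_id"])[0]

    print("Caption:", ann["caption"])
    image = io.imread(img["coco_url"])   # or read the local file via img["file_name"]
    plt.imshow(image)
    plt.axis("off")
    plt.show()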

Setup

Install the required Python packages from the requirements file using:

pip install -r requirements.txt

Data & Checkpoints

  1. The training, testing, and validation datasets total over 24 GB, so they need to be downloaded from the source (https://cocodataset.org/#download).

     a. Training Dataset: http://images.cocodataset.org/zips/train2014.zip
    
     b. Testing Dataset: http://images.cocodataset.org/zips/test2014.zip
    
     c. Validation Dataset: http://images.cocodataset.org/zips/val2014.zip
    
     d. Annotations:
                     http://images.cocodataset.org/annotations/image_info_test2014.zip
                     http://images.cocodataset.org/annotations/annotations_trainval2014.zip
    
  2. The trained model's checkpoints (only up to 2 epochs) are located in model_checkpoints as well as in Node.js_Server/python_models/saved_models.

To set up, download the datasets and annotations and extract everything into the Data directory; the resulting layout should look roughly like the sketch below.
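
Assuming each archive is extracted directly into Data, the layout would look roughly like this (the COCO zips already contain these folder names); check the notebooks for the exact paths the data loaders expect:

    Data/
    ├── annotations/      # from annotations_trainval2014.zip and image_info_test2014.zip
    ├── train2014/        # from train2014.zip
    ├── val2014/          # from val2014.zip
    └── test2014/         # from test2014.zip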

Training Hardware

With batch size = 32 and the model as defined in the training notebooks:

  1. It takes approximately 2.30 hours to run a single epoch on the following hardware, available on Google Colab:

     Intel(R) Xeon(R) CPU @ 2.20GHz [Core(s) per socket:  1 | Thread(s) per core:  2 ]
     Tesla T4 [CUDA Version: 10.1]
    

  2. It takes approximately 8 hours (back-calculated from the time taken for 100 steps; the arithmetic is sketched below) to run a single epoch on the following local hardware:

    Intel(R) Core(TM) i3-2120 CPU @ 3.20GHz [Core(s) :  2 | Thread(s) per core:  2 ]
    GTX 1060 [CUDA Version: 10.1]
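
The 8-hour figure is an extrapolation rather than a measured full epoch. A rough version of the arithmetic, assuming roughly 414k captions in train2014 (about 5 per image) and a hypothetical per-100-step timing chosen to be consistent with that estimate:

    steps_per_epoch = 414_000 // 32                          # ~12,900 steps at batch size 32 (caption count is approximate)
    seconds_per_100_steps = 223                              # hypothetical measurement, chosen to match the ~8 h figure
    epoch_hours = steps_per_epoch * (seconds_per_100_steps / 100) / 3600
    print(f"estimated epoch time ≈ {epoch_hours:.1f} h")     # ≈ 8.0 h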

3. Running Node.js Application

The Node.js application works by spawning a Python child process to generate captions for uploaded images; a minimal sketch of the Python side of this interface follows the steps below.

  1. Navigate to Node.js_Server

  2. Place the trained PyTorch model's checkpoint file checkpoint.pth into the python_models/saved_models folder (if a new model was trained).

  3. If a new vocab.pkl file was generated, or the new model's checkpoint filenames differ, change the following variables in Node.js_Server/python_models/model.py to the appropriate names:

    ENCODER_CNN_CHECKPOINT = "python_models/saved_models/encoderEpoch_2.pth"
    DECODER_LSTM_RNN_CHECKPOINT = "python_models/saved_models/decoderEpoch_2.pth"
    VOCAB_FILE = "python_models/saved_models/vocab.pkl"
    
  4. Run the following commands:

    npm install                      (This installs Node.js dependencies)
    pip install -r requirements.txt  (If Python packages haven't already been installed from the project root)
    npm start
    
  5. To test it from other devices on the local network:

    node app.js <your_ip>:8000
    

    Example

    node app.js 192.168.32.134:8000
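
For orientation, here is a minimal, hypothetical sketch of the Python side of this setup; the real entry point is Node.js_Server/python_models/model.py and its interface may differ. The point it illustrates is that the child process receives an image path as an argument and prints the caption to stdout, which the Node.js parent process captures.

    # caption_cli.py -- illustrative only, not the repository's model.py
    import sys

    def generate_caption(image_path):
        # Placeholder: this is where the real script would load the EncoderCNN/DecoderRNN
        # checkpoints and vocab.pkl named by the variables above and run inference.
        return "a placeholder caption for " + image_path

    if __name__ == "__main__":
        image_path = sys.argv[1]                 # the Node.js server passes the uploaded image's path
        print(generate_caption(image_path))      # stdout is what the Node.js parent process reads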
    
