
saket349/ImageCaptionGenerator


Image Caption Generator

A neural network that generates captions for images using a CNN encoder and an RNN decoder, with both beam search and greedy search decoding.


1. Requirements

Recommended system requirements for training the model:

  • A good CPU and a GPU with at least 8GB of memory
  • At least 8GB of RAM
  • Active internet connection

2. Installation

Required Python version and libraries:

  • NumPy - 1.16.4
  • Python - 3.6.7
  • Keras - 2.2.4
  • TensorFlow - 1.13.1
  • NLTK - 3.2.5
  • Pillow (PIL) - 4.3.0
  • Matplotlib - 3.0.3
  • tqdm - 4.28.1

Required data files (download from link):

  • Flickr8k_Dataset: contains the images
  • Flickr8k.token.txt: contains 5 captions for each image ID
  • Flickr_8k.trainImages.txt: contains the IDs of the training images
  • Flickr_8k.testImages.txt: contains the IDs of the test images
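The token file pairs every caption with its image. Below is a minimal sketch for loading it, assuming each line has the form "<image>.jpg#<n>", a tab, then the caption (the layout of the standard Flickr8k release), and that the train/test split files list one image filename per line. The function names load_captions and load_image_ids are hypothetical and are not taken from this repository.

```python
from collections import defaultdict


def load_captions(lines):
    """Map each image ID (filename without extension) to its list of captions.

    Assumes every non-empty line looks like '<image>.jpg#<n>\t<caption>'.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_tag, caption = line.split("\t", 1)
        image_file = image_tag.split("#")[0]     # drop the '#n' caption index
        image_id = image_file.rsplit(".", 1)[0]  # drop the '.jpg' extension
        captions[image_id].append(caption.strip())
    return dict(captions)


def load_image_ids(lines):
    """Read a split file (one image filename per line) into a set of image IDs."""
    return {line.strip().rsplit(".", 1)[0] for line in lines if line.strip()}
```

Both functions accept any iterable of lines, so an open file works directly, e.g. load_captions(open(token_path, encoding="utf-8")).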

3. Generated Captions on Test Images

Model used - InceptionV3 + LSTM

Image 1
  • Greedy: a football player in a red jersey is tackling another player in white who is tackling the ball.
  • BEAM Search, k=3: a football player in a red jersey is tackling another player in red who is running with the ball whilst fans watch.
  • BEAM Search, k=5: three football players are tackling a football player in a red and white uniform.
  • BEAM Search, k=7: an american footballer in a red and white uniform gets ready to tackle an opposing player.
  • BEAM Search, k=10: an american footballer in a red and white uniform gets ready to tackle an opposing player while fans watch.
Image 2
  • Greedy: a man in a red shirt climbing a rock.
  • BEAM Search, k=3: a man in a red shirt climbing a rock.
  • BEAM Search, k=5: a man climbing a rock.
  • BEAM Search, k=7: a man climbing a rock.
  • BEAM Search, k=10: a rock climber scales a steep rock cliff.

4. Procedure to Train Model

Set the variables token_path, img_path, train_path, test_path and glove_path to
the paths of Flickr8k.token.txt, Flicker8k_Dataset, Flickr_8k.trainImages.txt,
Flickr_8k.testImages.txt and the GloVe file, respectively.

For example:

token_path = '/content/drive/MyDrive/DS303/Flickr8k.token.txt'
img_path   = '/content/drive/MyDrive/DS303/Flicker8k_Dataset/'
train_path = '/content/drive/MyDrive/DS303/Flickr_8k.trainImages.txt'
test_path  = '/content/drive/MyDrive/DS303/Flickr_8k.testImages.txt'
glove_path = '/content/drive/MyDrive/DS303/glove.6B.200d.txt'
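A wrong path often surfaces only after the script has started loading data, so a quick existence check on the five configured paths can save a failed run. This is a hypothetical helper, not part of the repository; it relies only on os.path.exists.

```python
import os


def missing_paths(paths):
    """Return the names of configured paths whose file or folder does not exist."""
    return [name for name, path in paths.items() if not os.path.exists(path)]
```

For example, missing_paths({"token_path": token_path, "img_path": img_path, "train_path": train_path, "test_path": test_path, "glove_path": glove_path}) returns an empty list when every file and folder is in place.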

Then run the .py file in your preferred IDE to train the model. If you are working in a notebook, run all cells to train the model and produce sample test results.
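The file at glove_path supplies pre-trained word vectors (glove.6B.200d.txt holds 200-dimensional vectors). Each line of a GloVe text file is a word followed by its space-separated vector components. The sketch below parses that format and builds an embedding matrix for a caption vocabulary in plain Python. The word_to_index mapping, the reserved padding row 0, and both function names are assumptions for illustration; the repository's own code may differ (for example by using NumPy arrays).

```python
def load_glove(lines):
    """Parse GloVe text lines ('word v1 v2 ... vd') into a dict of float lists."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def build_embedding_matrix(word_to_index, embeddings, dim):
    """Row i holds the GloVe vector of the word with index i.

    Assumes indices run from 1 to len(word_to_index), with row 0 reserved
    for padding. Words missing from GloVe keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_to_index) + 1)]
    for word, idx in word_to_index.items():
        vector = embeddings.get(word)
        if vector is not None and len(vector) == dim:
            matrix[idx] = list(vector)
    return matrix
```

In a Keras model, a matrix like this is typically used to initialize the weights of the decoder's embedding layer.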

5. Procedure to Test on images

To test on any image from the test dataset:

  • Pick any image ID from Flickr_8k.testImages.txt.
  • Look up the image's precomputed feature vector in encoding_test (indexed by the chosen image, pic) and reshape it into a batch of one 2048-dimensional vector:
image = encoding_test[pic].reshape((1,2048))
  • Generate the caption:
  1. Using greedy search:
greedySearch(image)
  2. Using beam search (beam_index is the beam width k):
beamSearch_predictions(image, beam_index = 3)
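greedySearch and beamSearch_predictions differ in how many partial captions they keep at each step. The following is a minimal, model-free sketch of both strategies, not the repository's implementation. next_word_probs is a hypothetical stand-in for the trained CNN + LSTM (given the caption so far, it returns a probability for each candidate next word), and the startseq/endseq boundary tokens are assumed. Greedy search keeps only the single most probable word at each step, while beam search keeps the beam_index best partial captions ranked by summed log-probability.

```python
import math


def greedy_search(next_word_probs, max_length=20, start="startseq", end="endseq"):
    """Build a caption by always appending the single most probable next word."""
    sequence = [start]
    for _ in range(max_length):
        probs = next_word_probs(sequence)
        word = max(probs, key=probs.get)  # argmax over the candidate words
        sequence.append(word)
        if word == end:
            break
    return [w for w in sequence if w not in (start, end)]


def beam_search(next_word_probs, beam_index=3, max_length=20, start="startseq", end="endseq"):
    """Keep the beam_index best partial captions, scored by summed log-probability."""
    beams = [([start], 0.0)]
    for _ in range(max_length):
        candidates = []
        for sequence, score in beams:
            if sequence[-1] == end:
                # A finished caption competes with the extended ones unchanged.
                candidates.append((sequence, score))
                continue
            for word, p in next_word_probs(sequence).items():
                if p > 0:
                    candidates.append((sequence + [word], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_index]
        if all(seq[-1] == end for seq, _ in beams):
            break
    best_sequence = beams[0][0]
    return [w for w in best_sequence if w not in (start, end)]


# Toy stand-in for the trained model: the next-word distribution depends only
# on the last word, which a real image-conditioned decoder would not do.
TOY_TABLE = {
    "startseq": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.4, "cat": 0.35, "endseq": 0.25},
    "the": {"dog": 0.9, "endseq": 0.1},
    "dog": {"endseq": 1.0},
    "cat": {"endseq": 1.0},
}


def toy_model(sequence):
    return TOY_TABLE[sequence[-1]]


print(greedy_search(toy_model))              # ['a', 'dog']
print(beam_search(toy_model, beam_index=2))  # ['the', 'dog']
```

With this toy distribution, greedy search commits to "a" (0.6) and ends with "a dog" (overall probability 0.24), while beam search with beam_index=2 also keeps "the" alive and finds "the dog" (0.36). This is why larger k values in section 3 can yield captions that differ from the greedy one.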

6. To View the Results, Open the Link Below

Link

  • The link contains .txt files with both the greedy and the beam search results.
  • Each file lists the predicted caption for every image ID.