- Introduction
- Prerequisites
- Data Collection
Until the recent development of deep neural networks, image captioning was considered intractable even by the most advanced researchers in Computer Vision. With the advent of Deep Learning, however, this problem can be solved quite effectively, provided we have the required dataset. A few applications where a solution to this problem can be very useful are self-driving cars and aids for the blind.
- Basic Deep Learning concepts such as Multi-Layer Perceptrons, Convolutional Neural Networks, Recurrent Neural Networks, Transfer Learning, Gradient Descent, Backpropagation, and Overfitting, along with Probability, Text Processing, Python syntax and data structures, the Keras library, etc.
We will be using the Flickr8k dataset, downloaded from Kaggle. Depending on the computational power available, a larger Flickr dataset such as Flickr30k can be used instead. Flickr8k contains 8,000 images, each paired with 5 captions. These images are split as follows:
- Training Set — 6000 images
- Dev Set — 1000 images
- Test Set — 1000 images
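Once downloaded, the dataset ships its captions as a plain-text file in which each line pairs an image name (plus a caption index) with one of its 5 captions. A minimal sketch of how such a file can be parsed into a dictionary mapping each image to its list of captions is shown below; the sample lines and the `load_captions` helper are illustrative assumptions, not part of the official dataset API.

```python
from collections import defaultdict

def load_captions(token_text):
    """Parse Flickr8k-style caption lines of the form
    '<image_name>#<caption_index>\t<caption>' into a dict:
    image_name -> list of captions."""
    captions = defaultdict(list)
    for line in token_text.strip().split("\n"):
        # Each line: image id (with '#<idx>' suffix), a tab, then the caption.
        image_id, caption = line.split("\t")
        image_name = image_id.split("#")[0]  # drop the '#0'..'#4' suffix
        captions[image_name].append(caption)
    return dict(captions)

# Tiny inline sample mimicking the captions-file format (hypothetical data).
sample = (
    "1000268201.jpg#0\tA child in a pink dress is climbing up stairs .\n"
    "1000268201.jpg#1\tA girl going into a wooden building .\n"
    "1001773457.jpg#0\tA black dog runs through the grass .\n"
)
captions = load_captions(sample)
```

The same function can be applied to the full captions file, after which the image names can be partitioned into the 6000/1000/1000 training, dev, and test sets listed above using the split files provided with the dataset.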