Implementing image captioning, with and without a soft attention model, on the Flickr8k dataset.
This is my implementation of the Show, Attend and Tell paper.
I took assistance from this blog post: https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/
You can see my implementation in this Kaggle kernel. I was unable to get the attention model working, so I trained the model without it.
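For reference, the no-attention model in the blog post linked above is a "merge" architecture: pre-extracted CNN image features and an LSTM-encoded caption prefix are combined to predict the next word. The sketch below follows that design; the specific sizes (4096-d VGG16 features, `vocab_size`, `max_length`) are assumptions taken from the blog's Flickr8k walkthrough, not values confirmed by this repository.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 7579  # assumed Flickr8k vocabulary size (from the blog post)
max_length = 34    # assumed longest caption length in tokens

# Image branch: 4096-d VGG16 fc2 features projected down to 256-d.
inputs1 = Input(shape=(4096,))
fe1 = Dropout(0.5)(inputs1)
fe2 = Dense(256, activation='relu')(fe1)

# Text branch: caption prefix (token ids) -> embedding -> LSTM state.
inputs2 = Input(shape=(max_length,))
se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
se2 = Dropout(0.5)(se1)
se3 = LSTM(256)(se2)

# Merge both branches and predict a distribution over the next word.
decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation='relu')(decoder1)
outputs = Dense(vocab_size, activation='softmax')(decoder2)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

At inference time the model is run in a loop: start from the start-of-sequence token, predict the next word, append it to the prefix, and repeat until the end token or `max_length` is reached.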
The highest BLEU scores after 20 epochs were:
BLEU-1: 53.0076%
BLEU-2: 28.6551%
BLEU-3: 19.7607%
BLEU-4: 9.4241%
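Scores like these are typically computed over the whole test split with NLTK's `corpus_bleu`, scoring each generated caption against all reference captions for its image (the blog post above evaluates this way; I assume this repo does the same). A minimal sketch on a single toy caption:

```python
from nltk.translate.bleu_score import corpus_bleu

# Toy example: one generated caption scored against its reference captions.
# Real evaluation iterates over every image in the Flickr8k test split.
references = [[
    ['a', 'dog', 'runs', 'across', 'the', 'grass'],
    ['a', 'brown', 'dog', 'is', 'running', 'on', 'grass'],
]]
candidates = [['a', 'dog', 'is', 'running', 'on', 'the', 'grass']]

# The weights select which n-gram orders contribute to each BLEU-n score.
bleu1 = corpus_bleu(references, candidates, weights=(1.0, 0, 0, 0))
bleu2 = corpus_bleu(references, candidates, weights=(0.5, 0.5, 0, 0))
bleu3 = corpus_bleu(references, candidates, weights=(1/3, 1/3, 1/3, 0))
bleu4 = corpus_bleu(references, candidates, weights=(0.25, 0.25, 0.25, 0.25))

print(f'BLEU-1: {bleu1:.4f}')
print(f'BLEU-4: {bleu4:.4f}')
```

Higher-order scores are almost always lower than BLEU-1, since exact 3- and 4-gram matches are rarer than unigram matches, which is why the numbers above fall off sharply from BLEU-1 to BLEU-4.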
This is a first implementation; I plan to optimize it further.