Related to:
Visual storytelling
Contrastive captioning (with inferred reference)
Relevant papers:
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models (https://arxiv.org/pdf/1610.02424v2.pdf) (https://github.com/ashwinkalyan/dbs)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (https://arxiv.org/pdf/1707.07998v3.pdf) (https://github.com/facebookresearch/pythia)