This is the implementation of the approaches described in the paper:
Emanuele Bugliarello and Desmond Elliott. The Role of Syntactic Planning in Compositional Image Captioning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, April 2021.
We provide the code to reproduce our results, the processed data, and pretrained models.
You can clone this repository, with submodules included, by issuing:

```shell
git clone --recurse-submodules git@github.com:e-bug/syncap.git
```
The requirements can be installed by setting up a conda environment:

```shell
conda env create -f environment.yml
source activate syncap
```

To set up the syntactic taggers, run:

```shell
bash setup_environment.sh
```
Finally, to use M2-Transformer and Improved BERTScore, install their respective environments.
Check out `data/README.md` for links to the preprocessed data and data preparation steps.
We also distribute our final trained models.
Scripts for training and evaluating each model are provided in the corresponding `experiments/` directory (e.g., `experiments/coco_heldout_1_pos_inter/butr_weight/train.sh`). We also provide SLURM wrappers that call the corresponding bash files (e.g., `train.cluster`).
In particular:
- `train.sh`: trains a model
- `val.sh`: generates captions for the validation set
- `score_val.sh`: computes R@5 for compositional generalization and the standard COCO metrics for the generated captions in the validation set
- `bertscore_val.sh`: computes the Improved BERTScore for the generated captions in the validation set (Yi et al., 2020)
- `rank_val.sh`: computes image--text retrieval performance for the generated captions in the validation set (ranking models only)
- `diversity_val.sh`: measures diversity metrics for the captions generated in the validation set (van Miltenburg et al., 2018)
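As a sketch, the per-experiment workflow is a fixed sequence of these scripts. The snippet below only prints the commands rather than executing them, since the real scripts assume a prepared environment and data; the experiment path is the example from above, so swap in your own.

```shell
# Sketch: the per-experiment pipeline, printed as commands.
# EXP is the example experiment directory from the README; adjust as needed.
# Each *.sh script also has a matching *.cluster SLURM wrapper.
EXP=experiments/coco_heldout_1_pos_inter/butr_weight
for step in train val score_val bertscore_val rank_val diversity_val; do
  echo "bash $EXP/$step.sh"   # on a cluster: sbatch $EXP/$step.cluster
done
```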
- `code/`
  - `eval.py`: generate captions for a given COCO split
  - `evalrank.py`: evaluate image--text retrieval performance of ranking modules
  - `options.py`: hyper-parameters that can be used by each model during training
  - `tag_results.py`: annotate captions with a specified type of syntactic tags
  - `train.py`: train captioning models
- `data/`: concept pairs data and data preprocessing (scripts and download links)
- `experiments/`: results for each model we trained and scripts to reproduce them
- `notebooks/`: iPython notebooks to analyze trained models
- `tools/`: third-party software (Improved BERTScore)
This work is licensed under the MIT license. See `LICENSE` for details.
Third-party software and data sets are subject to their respective licenses.
If you find our code/data/models or ideas useful in your research, please consider citing the paper:
```bibtex
@inproceedings{bugliarello-elliott-2021-role,
    title = "The Role of Syntactic Planning in Compositional Image Captioning",
    author = "Bugliarello, Emanuele and
      Elliott, Desmond",
    booktitle = "Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.eacl-main.48",
    pages = "593--607",
}
```
Our code builds on top of the following excellent repositories: