Data Setup

We provide access to our preprocessed data and preprocessing scripts to replicate our setup.

Preprocessed Data

coco2014
compgen

Preprocessing Steps

prepare_data.sh provides sample execution for data preparation. It calls prepare_data.py. Note that the output corpus directory might look different than the final one (see below).

The corpus directory looks as follows:

corpus/
 |-- coco2014/
 |    |-- annotations/
 |    |    |-- captions_train2014.json
 |    |    |-- captions_val2014.json
 |    |    |-- instances_train2014.json
 |    |    |-- instances_val2014.json
 |    |-- images/
 |    |    |-- coco_detections.hdf5 (used by M2-TRM)
 |    |    |-- image_features.hdf5
 |    |    |-- images.hdf5
 |
 |-- compgen/
 |    |-- annotations/
 |    |    |-- captions.json
 |    |    |-- ccg_captions.json
 |    |    |-- chunk_captions.json
 |    |    |-- dep_captions.json
 |    |    |-- head_captions.json
 |    |    |-- idle_captions.json
 |    |    |-- lemma_captions.json
 |    |    |-- pos_captions.json
 |    |-- concepts/
 |    |    |-- occurrences/
 |    |    |    |-- big_bird.json
 |    |    |    |-- big_cat.json
 |    |    |    |-- ...
 |    |    |    |-- white_truck.json
 |    |    |-- synonyms/
 |    |    |    |-- adjectives/
 |    |    |    |    |-- big.json
 |    |    |    |    |-- black.json
 |    |    |    |    |-- ...
 |    |    |    |    |-- young.json
 |    |    |    |-- nouns/
 |    |    |    |-- verbs/
 |    |-- datasets/
 |    |    |-- coco_heldout_1/
 |    |    |    |-- captions.json  
 |    |    |    |-- dataset_splits.json  
 |    |    |    |-- encoded_captions.json  
 |    |    |    |-- word_map.json
 |    |    |-- coco_heldout_1_ccg_inter/
 |    |    |-- coco_heldout_1_ccg_multi/
 |    |    |-- coco_heldout_1_ccg_seq/
 |    |    |-- coco_heldout_1_chunk_inter/
 |    |    |-- ...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Data Setup

Preprocessed Data

Preprocessing Steps

Files

README.md

Latest commit

History

README.md

File metadata and controls

Data Setup

Preprocessed Data

Preprocessing Steps