All images used in CARETS come from the GQA validation set. If you have already downloaded the GQA dataset, you can set the images_root field in the dataset config to your GQA images directory. Otherwise, you have two options: 1) download all images in the GQA dataset from here (20 GB), or 2) download just the subset of images that CARETS uses with the script below (1.3 GB).
cd CARETS
export DATADIR=data          # where to store the images directory
export TARNAME=images.tar.gz
mkdir -p $DATADIR            # make sure the target directory exists
wget --save-cookies cookies.txt 'https://drive.google.com/uc?id=1Yi_Zgbn0rraekBV96Vwmg9kOuv72b1Lt&export=download' -O- \
  | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1/p' > confirm.txt && \
wget --load-cookies cookies.txt -O $TARNAME \
  'https://drive.google.com/uc?id=1Yi_Zgbn0rraekBV96Vwmg9kOuv72b1Lt&export=download&confirm='$(<confirm.txt) && \
tar -xzf $TARNAME -C $DATADIR
rm -f cookies.txt confirm.txt $TARNAME
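To sanity-check the extraction, you can count the unpacked image files. This assumes the archive unpacks into $DATADIR/images, as the default config's images_root suggests; the expected total is not documented here, so treat this as a smoke test rather than a checksum:

find $DATADIR/images -name '*.jpg' | wc -l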
Datasets are defined by a YAML configuration file with the following basic format:
images_root: data/images/     # *.jpg image files
files_root: data/questions/   # *.json files
tests:
  rephrasing_invariance:
    eval_type: invariance     # (invariance | directional_expectation)
    files:
      - rephrasing_file_1.json # filename located in data/questions/
      ...
  ...
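As a concrete illustration, a directional-expectation test (e.g., negation) would be declared the same way, with eval_type set accordingly. The test name and file name below are illustrative, not the actual defaults:

tests:
  negation_directional_expectation:
    eval_type: directional_expectation
    files:
      - negation_file_1.json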
The configuration for the default evaluation of the non-visual perturbations can be found under configs/default.yml.
A CaretsDataset object is a collection of tests ingested from the configuration file. Each test corresponds to a question split containing pairs of questions and evaluates a particular capability (e.g., rephrasing invariance or negation directional expectation). The CaretsDataset object can be used to iterate over the questions and their metadata, including the image id and image_path.
Note: we will soon introduce a TorchCaretsDataset that is more easily compatible with PyTorch DataLoaders.
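In the meantime, a minimal wrapper sketch can bridge a split to a DataLoader. It assumes each split is iterable over question dicts with the keys shown in the example below; the class name is ours, not part of the CARETS API:

from torch.utils.data import Dataset, DataLoader

class CaretsTorchSplit(Dataset):
    """Minimal sketch: wrap one CARETS split for use with a DataLoader."""
    def __init__(self, split):
        self.questions = list(split)  # materialize the iterable split

    def __len__(self):
        return len(self.questions)

    def __getitem__(self, idx):
        q = self.questions[idx]
        return q['question_id'], q['image_path'], q['sent']

# Usage: with the default collation, each batch is a list of ids,
# paths, and question strings.
# loader = DataLoader(CaretsTorchSplit(split), batch_size=32)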
import random
from carets import CaretsDataset

dataset = CaretsDataset('./configs/default.yml')

# Collect a prediction for every question id (a random baseline is shown here).
predictions = dict()
for test_name, split in dataset.splits:
    for question in split:
        question_id = question['question_id']
        img_path = question['image_path']      # available metadata,
        question_text = question['sent']       # unused by the random baseline
        predictions[question_id] = random.choice(['cat', 'yes', 'no', 'red'])

# Score the collected predictions on each split.
for test_name, split in dataset.splits:
    accuracy = split.total_accuracy(predictions)
    consistency = split.evaluate(predictions)
    comprehensive_accuracy = split.comprehensive_accuracy(predictions)
    eval_type = split.eval_type
    print(f'{test_name.ljust(24)}: accuracy: {accuracy:.3f}, {eval_type.ljust(24)}:' + \
          f' {consistency:.3f}, comprehensive_accuracy: {comprehensive_accuracy:.3f}')
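To evaluate a real VQA model instead of the random baseline, fill predictions with the model's answers. The sketch below reuses dataset and predictions from the example above; predict is a hypothetical stand-in for your model's inference call, not part of CARETS:

from PIL import Image

def predict(image, question_text):
    # Hypothetical stand-in: replace with your model's actual inference call.
    return 'yes'

for test_name, split in dataset.splits:
    for question in split:
        image = Image.open(question['image_path']).convert('RGB')
        predictions[question['question_id']] = predict(image, question['sent'])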
Coming soon...
@inproceedings{jimenez2022carets,
  title={CARETS: A Consistency And Robustness Evaluative Test Suite for VQA},
  author={Carlos E. Jimenez and Olga Russakovsky and Karthik Narasimhan},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2022}
}