
# CARETS

## Download image files

All of the images used in CARETS come from the GQA validation set. If you have already downloaded the GQA dataset, you can set the `images_root` element in the dataset config to your GQA images directory. Otherwise, you have two options: (1) download all of the images for the GQA dataset from here (20 GB), or (2) download just the subset of images that CARETS uses with the script below (1.3 GB).

```bash
cd CARETS

export DATADIR=data  # where to store the images directory
export TARNAME=images.tar.gz

mkdir -p $DATADIR

# Fetch the Google Drive confirmation token, then download and extract the archive.
wget --save-cookies pbbxvrf.txt 'https://drive.google.com/uc?id=1Yi_Zgbn0rraekBV96Vwmg9kOuv72b1Lt&export=download' -O- \
     | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1/p' > pbasvez.txt && \
wget --load-cookies pbbxvrf.txt -O $TARNAME \
     'https://drive.google.com/uc?id=1Yi_Zgbn0rraekBV96Vwmg9kOuv72b1Lt&export=download&confirm='$(<pbasvez.txt) && \
tar -xzf $TARNAME -C $DATADIR

rm -f pbbxvrf.txt pbasvez.txt $TARNAME  # clean up the cookie file, token file, and archive
```
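After extraction, a quick sanity check is to count the image files that landed in the data directory. The exact file count for the subset is not stated here, so this hypothetical helper (not part of CARETS) only reports what it finds:

```python
import pathlib


def count_images(images_dir):
    """Count *.jpg files under the extracted images directory."""
    path = pathlib.Path(images_dir)
    if not path.is_dir():
        return 0
    return sum(1 for _ in path.rglob("*.jpg"))


# Usage: count_images("data/images") after running the download script.
```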

## Dataset Config

Datasets are defined by YAML configuration using the following basic format:

```yaml
images_root: data/images/  # *.jpg image files
files_root: data/questions/  # *.json files
tests:
  rephrasing_invariance:
    eval_type: invariance  # (invariance | directional_expectation)
    files:
      - rephrasing_file_1.json  # filename located in data/questions/
      ...
  ...
```

The configuration for the default evaluation of non-visual perturbations can be found in `configs/defaults.yml`.
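Once parsed (e.g. with PyYAML), a config in the format above becomes a nested mapping. As a sketch, each test's question files can be resolved against `files_root` like this; `resolve_test_files` and the inline example dict are hypothetical illustrations, not part of the CARETS API:

```python
import os


def resolve_test_files(config):
    """Map each test name to the full paths of its question files (illustrative helper)."""
    files_root = config["files_root"]
    return {
        name: [os.path.join(files_root, f) for f in test["files"]]
        for name, test in config["tests"].items()
    }


# Example mirroring the YAML structure above (file name is illustrative).
config = {
    "images_root": "data/images/",
    "files_root": "data/questions/",
    "tests": {
        "rephrasing_invariance": {
            "eval_type": "invariance",
            "files": ["rephrasing_file_1.json"],
        },
    },
}
print(resolve_test_files(config))
# {'rephrasing_invariance': ['data/questions/rephrasing_file_1.json']}
```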

## Usage of `CaretsDataset`

A `CaretsDataset` object is a collection of tests ingested from the configuration file. Each test corresponds to a question split made of question pairs and evaluates a particular capability (e.g. rephrasing invariance or negation directional expectation). The `CaretsDataset` object can be used to iterate over the questions and their metadata, including the image ID and image path.

Note: we will soon introduce a `TorchCaretsDataset` that is more easily compatible with PyTorch `DataLoader`s.
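In the meantime, PyTorch's `DataLoader` accepts any map-style object that implements `__len__` and `__getitem__`, so a thin wrapper over a split's questions can be sketched without importing `torch` at all. The class below is a hypothetical sketch, not part of CARETS; the field names (`sent`, `image_path`) follow the usage example in this README:

```python
class QuestionListDataset:
    """Map-style dataset over a list of CARETS question dicts (hypothetical sketch).

    Implements the (__len__, __getitem__) protocol that a PyTorch
    DataLoader expects from a map-style dataset, without importing torch.
    """

    def __init__(self, questions):
        self.questions = list(questions)

    def __len__(self):
        return len(self.questions)

    def __getitem__(self, idx):
        q = self.questions[idx]
        # Field names assumed from the usage example in this README.
        return q["image_path"], q["sent"]
```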

```python
import random

from carets import CaretsDataset

dataset = CaretsDataset('./configs/default.yml')
predictions = dict()

# First pass: produce a (here, random) prediction for every question.
for test_name, split in dataset.splits:
    for question in split:
        question_id = question['question_id']
        img_path = question['image_path']
        question_text = question['sent']
        predictions[question_id] = random.choice(['cat', 'yes', 'no', 'red'])

# Second pass: score the predictions on each split.
for test_name, split in dataset.splits:
    accuracy = split.total_accuracy(predictions)
    consistency = split.evaluate(predictions)
    comprehensive_accuracy = split.comprehensive_accuracy(predictions)
    eval_type = split.eval_type
    print(f'{test_name.ljust(24)}: accuracy: {accuracy:.3f}, {eval_type.ljust(24)}:'
          f' {consistency:.3f}, comprehensive_accuracy: {comprehensive_accuracy:.3f}')
```
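For intuition, an invariance-style consistency check can be sketched as follows: over pairs of related questions, a pair counts as consistent when the model gives the same answer to both, and as comprehensively correct when both answers match the ground truth. This is an illustrative reimplementation under assumed data structures, not the metric code that CARETS ships:

```python
def pair_metrics(pairs, predictions):
    """Illustrative invariance-style metrics over question pairs.

    pairs: list of (q1, q2) dicts, each with 'question_id' and 'answer' keys
           (assumed structure, not the CARETS internal representation).
    predictions: dict mapping question_id -> predicted answer string.
    Returns (consistency, comprehensive_accuracy) over the pairs.
    """
    consistent = comprehensive = 0
    for q1, q2 in pairs:
        p1 = predictions[q1["question_id"]]
        p2 = predictions[q2["question_id"]]
        # Consistent: same answer to both phrasings, right or wrong.
        consistent += p1 == p2
        # Comprehensively correct: both answers match the ground truth.
        comprehensive += (p1 == q1["answer"]) and (p2 == q2["answer"])
    n = len(pairs)
    return consistent / n, comprehensive / n
```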

## Extending CARETS

Coming soon...


## Citation

```bibtex
@inproceedings{jimenez2022carets,
   title={CARETS: A Consistency And Robustness Evaluative Test Suite for VQA},
   author={Carlos E. Jimenez and Olga Russakovsky and Karthik Narasimhan},
   booktitle={60th Annual Meeting of the Association for Computational Linguistics (ACL)},
   year={2022}
}
```
