CLEVR-XAI: A Benchmark Dataset for Evaluating XAI

CLEVR-XAI aims to provide a benchmark dataset for the quantitative evaluation of XAI explanations (aka heatmaps) in computer vision.

It consists of visual question answering (VQA) questions derived from the original CLEVR task. Each question is accompanied by several ground truth (GT) masks, which provide a realistic, selective and controlled testbed for evaluating heatmaps on the input image.

CLEVR-XAI was introduced in a paper published in Information Fusion (see Citation below). In that paper, several XAI methods were evaluated on the CLEVR-XAI benchmark, namely Layer-wise Relevance Propagation (LRP), Integrated Gradients, Guided Backprop, Guided Grad-CAM, SmoothGrad, VarGrad, Gradient, Gradient×Input, Deconvnet and Grad-CAM.

The CLEVR-XAI dataset consists of 39,761 simple questions (CLEVR-XAI-simple) and 100,000 complex questions (CLEVR-XAI-complex), which are based on the same underlying set of 10,000 images (i.e., there are approx. 4 simple questions and 10 complex questions per image).
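The per-image figures above are averages over the shared set of 10,000 images. As a quick arithmetic check, the short Python snippet below recomputes them from the counts stated in this README; the constant and function names are illustrative only and are not part of the repository.

```python
# Question and image counts as stated in this README.
NUM_IMAGES = 10_000
NUM_SIMPLE_QUESTIONS = 39_761
NUM_COMPLEX_QUESTIONS = 100_000


def questions_per_image(num_questions: int, num_images: int = NUM_IMAGES) -> float:
    """Average number of questions per image (illustrative helper, not repo code)."""
    return num_questions / num_images


print(f"simple:  {questions_per_image(NUM_SIMPLE_QUESTIONS):.2f} per image")   # simple:  3.98 per image
print(f"complex: {questions_per_image(NUM_COMPLEX_QUESTIONS):.2f} per image")  # complex: 10.00 per image
```

Since 39,761 is not a multiple of 10,000, the number of simple questions necessarily differs between images; the average only describes the overall ratio.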

CLEVR-XAI-simple contains the following Ground Truths:

GT Single Object (for all questions)
GT All Objects (for all questions)

CLEVR-XAI-complex contains the following Ground Truths:

GT Unique (for 89,873 questions)
GT Unique First-non-empty (for 99,786 questions)
GT Union (for 99,786 questions)
GT All Objects (for all questions)

Note: some GT masks are unavailable for certain complex questions, because for those questions the corresponding mask is undefined or empty.
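To show how many complex questions lack a given mask, the following Python sketch derives the number of missing masks and the coverage of each GT from the counts listed above. The names are hypothetical and only serve this illustration.

```python
from __future__ import annotations

# Number of CLEVR-XAI-complex questions for which each GT mask is defined,
# as listed in this README.
NUM_COMPLEX_QUESTIONS = 100_000
GT_AVAILABLE = {
    "GT Unique": 89_873,
    "GT Unique First-non-empty": 99_786,
    "GT Union": 99_786,
    "GT All Objects": NUM_COMPLEX_QUESTIONS,  # defined for all questions
}


def coverage(available: int, total: int = NUM_COMPLEX_QUESTIONS) -> tuple[int, float]:
    """Return (questions without this mask, fraction of questions with it)."""
    return total - available, available / total


for name, available in GT_AVAILABLE.items():
    missing, fraction = coverage(available)
    print(f"{name:<27} missing={missing:>6}  coverage={fraction:.2%}")
```

With these numbers, GT Unique is missing for 10,127 questions (89.87% coverage), while GT Unique First-non-empty and GT Union are each missing for 214 questions (99.79% coverage).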

Figure (CLEVR-XAI-simple example): input image with the GT Single Object and GT All Objects masks for the question "What is the small yellow sphere made of?" (answer: metal), together with heatmaps from LRP, Integrated Gradients, Guided Backprop and Grad-CAM.

Figure (CLEVR-XAI-complex example): input image with the GT Unique and GT Unique First-non-empty masks for the question "Is there any other thing that has the same size as the shiny sphere?" (answer: yes), together with heatmaps from LRP, Integrated Gradients, Guided Backprop and Grad-CAM.

For the precise definition of each GT, please refer to the paper. In short, a simple question always has exactly one target object, whereas a complex question can involve several objects.

I. Dataset Download

The dataset can be downloaded from the releases section of this repository.

II. Dataset Generation

For completeness and to support future research, we also provide the code to generate the CLEVR-XAI dataset. If you only want to use the released version of the dataset, you do not need to regenerate it yourself: download it directly from the releases section and skip the generation steps below.

Our code to generate CLEVR-XAI is built upon the original CLEVR generator.

To limit the number of prerequisites, all generation steps run inside Singularity containers, so Singularity is the only requirement for running the code. If you are new to Singularity, see the Singularity quick start guide.

Step 1: Image Generation

Please refer to the README in the image_generation folder.

Step 2: Question Generation

Please refer to the README in the question_generation folder.

Step 3: Ground Truth Masks Generation

Please refer to the README in the eval folder.

This last step also resizes the masks, which is useful if your model takes input images of a different size than the CLEVR images (the original CLEVR images are 320x480 pixels, height x width).

In the released version of the CLEVR-XAI benchmark dataset, the masks were resized to 128x128, because the Relation Network model we use to evaluate XAI methods takes 128x128 input images. See Appendix D of our paper for more details on this step.
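Appendix D of the paper describes the resizing actually used for the released masks. Purely as an illustration of the general idea, and not as a reproduction of that procedure, the Python/NumPy sketch below shrinks a boolean 320x480 mask to 128x128 with nearest-neighbor sampling, which keeps the mask binary. The helper `resize_mask_nearest` is hypothetical. Note that mapping 320x480 to 128x128 changes the aspect ratio (horizontal scale 128/480 vs. vertical scale 128/320), so objects become narrower relative to their height.

```python
import numpy as np

SRC_H, SRC_W = 320, 480  # original CLEVR image size (height x width)
DST_H, DST_W = 128, 128  # input size of the Relation Network used in the paper


def resize_mask_nearest(mask: np.ndarray, out_h: int = DST_H, out_w: int = DST_W) -> np.ndarray:
    """Nearest-neighbor resize of a 2D mask (hypothetical helper, not the repo's code).

    Each output pixel copies the input pixel that contains the output pixel's
    center, so a boolean mask stays boolean.
    """
    in_h, in_w = mask.shape
    # Map output pixel centers back to input coordinates and take the enclosing pixel.
    rows = np.floor((np.arange(out_h) + 0.5) * in_h / out_h).astype(int)
    cols = np.floor((np.arange(out_w) + 0.5) * in_w / out_w).astype(int)
    rows = np.clip(rows, 0, in_h - 1)
    cols = np.clip(cols, 0, in_w - 1)
    return mask[np.ix_(rows, cols)]


# Example: a hypothetical rectangular object region in a full-size mask.
mask = np.zeros((SRC_H, SRC_W), dtype=bool)
mask[100:200, 150:300] = True
small = resize_mask_nearest(mask)
print(small.shape, small.dtype)  # (128, 128) bool
```

In this example the object covers the same fraction of the image before and after resizing (25/256 of the pixels), but in general nearest-neighbor sampling only approximates the original area.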

III. Heatmap Generation

The code to generate heatmaps on a Relation Network model trained on the original CLEVR dataset, i.e. the model we used in the paper to evaluate different XAI methods on the CLEVR-XAI benchmark, will be made publicly available (admittedly with some delay, but it will be released).

IV. Heatmap Evaluation

The code to evaluate heatmaps is currently available as a stand-alone gist.

(In the future, we may automate this step and integrate it into the eval folder of this repository for convenience.)
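The gist remains the reference implementation. As a rough, hedged illustration of what such an evaluation involves, the Python/NumPy sketch below scores a 2D heatmap against a boolean GT mask in two ways: the share of positive relevance falling inside the mask, and the share of the K most relevant pixels lying inside the mask, with K the number of mask pixels. We believe these correspond to the relevance mass accuracy and relevance rank accuracy discussed in the paper, but the function names, the clipping of negative relevance and the assumption that heatmaps are already pooled over color channels to 2D are ours and may differ from the gist.

```python
import numpy as np


def mass_accuracy(heatmap: np.ndarray, gt_mask: np.ndarray) -> float:
    """Fraction of the positive relevance that falls inside the GT mask (hypothetical helper)."""
    positive = np.clip(heatmap, 0, None)  # negative relevance is ignored (an assumption)
    total = positive.sum()
    if total == 0:
        return 0.0
    return float(positive[gt_mask].sum() / total)


def rank_accuracy(heatmap: np.ndarray, gt_mask: np.ndarray) -> float:
    """Fraction of the K most relevant pixels inside the GT mask, K = mask size (hypothetical helper)."""
    k = int(gt_mask.sum())
    if k == 0:
        raise ValueError("GT mask is empty")
    top_k = np.argsort(heatmap, axis=None)[::-1][:k]  # flat indices of the K largest values
    return float(gt_mask.ravel()[top_k].mean())


# Toy example: a 4x4 heatmap and a GT mask covering the first two pixels of row 0.
heatmap = np.zeros((4, 4))
heatmap[0, 0] = 2.0   # inside the mask
heatmap[0, 1] = 0.5   # inside the mask
heatmap[3, 3] = 1.5   # outside the mask
heatmap[2, 2] = -1.0  # negative relevance, clipped away by mass_accuracy
gt = np.zeros((4, 4), dtype=bool)
gt[0, :2] = True
print(mass_accuracy(heatmap, gt), rank_accuracy(heatmap, gt))  # 0.625 0.5
```

Ties among relevance values are broken arbitrarily by `np.argsort`, so rank-based scores can vary slightly for heatmaps containing many equal values.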

Citation

If you find our dataset or code useful, please cite our paper:

@article{Arras_etal:2022,
    title     = {{CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations}},
    author    = {Leila Arras and Ahmed Osman and Wojciech Samek},
    journal   = {Information Fusion},
    volume    = {81},
    pages     = {14--40},
    year      = {2022},
    url       = {https://doi.org/10.1016/j.inffus.2021.11.008}
}
