Spawrious

One-to-one Spurious Correlations

Many-to-many Spurious Correlations

Spawrious is a challenging out-of-distribution (OOD) image classification benchmark, described in the paper cited below. It consists of 6 separate OOD challenges of two types: one-to-one and many-to-many spurious correlation challenges.

The dataset contains images of 4 dog breeds, found in 6 locations. The entire dataset consists of ~152,000 images, but each challenge requires only a subset of them. The repo therefore lets users download only the minimal data required for a given Spawrious challenge.

Example script

Datasets take the following names:

entire_dataset
o2o_easy
o2o_medium
o2o_hard
m2m_easy
m2m_medium
m2m_hard

Running the command below retrieves the chosen dataset from a user-specified directory (downloading it first if it is not already there), trains a ResNet-18, and evaluates the result on the OOD test set.

python example.py --data_dir <path to data dir> --dataset <one of the list above>
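The two flags in the command above can be handled with a small argument parser. The following is a minimal sketch and not the actual contents of example.py: the function name `build_parser`, the help strings, and the use of `choices` to reject unknown dataset names are assumptions made for illustration. Only the flag names and the dataset names come from this README.

```python
import argparse

# Dataset names as listed in this README.
DATASET_NAMES = [
    "entire_dataset",
    "o2o_easy", "o2o_medium", "o2o_hard",
    "m2m_easy", "m2m_medium", "m2m_hard",
]


def build_parser():
    """Hypothetical parser mirroring the --data_dir and --dataset flags."""
    parser = argparse.ArgumentParser(description="Spawrious launcher sketch")
    parser.add_argument("--data_dir", required=True,
                        help="directory where the data is stored or downloaded")
    parser.add_argument("--dataset", required=True, choices=DATASET_NAMES,
                        help="which Spawrious challenge to use")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--data_dir", ".data/", "--dataset", "m2m_medium"])
print(args.data_dir, args.dataset)
```

With `choices` set, a misspelled dataset name makes argparse exit with an error message before any download starts.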

Installation

pip install git+https://github.com/aengusl/spawrious.git

HParams

Using the datasets

import torch

from spawrious.torch import get_spawrious_dataset
# use spawrious.tf instead if you work with tensorflow or jax

dataset = "m2m_medium"
data_dir = ".data/"
val_split = 0.2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
spawrious = get_spawrious_dataset(dataset_name=dataset, root_dir=data_dir)
train_set = spawrious.get_train_dataset()
test_set = spawrious.get_test_dataset()
val_size = int(len(train_set) * val_split)
train_set, val_set = torch.utils.data.random_split(
    train_set, [len(train_set) - val_size, val_size]
)
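Since the snippet above passes the training set to `torch.utils.data.random_split`, the resulting splits can presumably be batched with standard PyTorch `DataLoader`s. The sketch below shows one way to do that. The helper `make_loaders`, the batch size, and the random stand-in tensors are assumptions for illustration and are not part of the spawrious package; the stand-in data only lets the sketch run without downloading anything and says nothing about what Spawrious items actually look like.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split


def make_loaders(train_set, val_set, test_set, batch_size=64):
    """Wrap the three splits in DataLoaders; only the training split is shuffled."""
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
    return train_loader, val_loader, test_loader


# Stand-in data (hypothetical): 10 random 3x8x8 tensors with labels 0..3.
fake = TensorDataset(torch.randn(10, 3, 8, 8), torch.randint(0, 4, (10,)))

# Same split arithmetic as above, with val_split = 0.2.
val_size = int(len(fake) * 0.2)
train_part, val_part = random_split(fake, [len(fake) - val_size, val_size])

train_loader, val_loader, test_loader = make_loaders(
    train_part, val_part, fake, batch_size=4
)
```

In real use, `train_set`, `val_set`, and `test_set` from the snippet above would take the place of the stand-in splits.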

Click to download the datasets:

Generate your own data

If you want to generate your own data, or understand how we generated ours, take a look at generate_dataset.py. To run this file, you additionally need to install diffusers and transformers.

Citation

@misc{lynch2023spawrious,
      title={Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases}, 
      author={Aengus Lynch and Gbètondji J-S Dovonon and Jean Kaddour and Ricardo Silva},
      year={2023},
      eprint={2303.05470},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Licensing

This work is licensed under a Creative Commons Attribution 4.0 International License.
