Stellar Dataset

Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods

Authors: Panos Achlioptas, Alexandros Benetatos, Iordanis Fostiropoulos, Dimitris Skourtis

This is the official repository for the dataset of the paper Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods. The repo contains information on how to download and set up the Stellar dataset.

The codebase is maintained by Alexandros Benetatos. For any questions please reach out.

Dataset

The final Stellar dataset consists of 400 subject identities with 2 images each (from CelebAMask-HQ) and 20.4k imaginative prompts: 10k prompts written by humans (Stellar-H) and 10.4k prompts generated semi-automatically (Stellar-T). The dataset is split into a test set and a validation set. The test set contains 200 subjects with 100 unique prompts assigned to each (50 from Stellar-T and 50 from Stellar-H). The validation set contains 200 subjects with 2 prompts each from Stellar-T.
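As a quick sanity check, the per-split counts above can be multiplied out to recover the stated totals. The sketch below uses only the numbers given in this section; the constant and function names are illustrative and not part of the repository, and it assumes each prompt is counted once per subject it is assigned to.

```python
# Split sizes as stated in the dataset description above.
TEST_SUBJECTS = 200
VAL_SUBJECTS = 200
TEST_PROMPTS_T_PER_SUBJECT = 50  # Stellar-T prompts per test subject
TEST_PROMPTS_H_PER_SUBJECT = 50  # Stellar-H prompts per test subject
VAL_PROMPTS_T_PER_SUBJECT = 2    # the validation split uses Stellar-T only


def prompt_totals():
    """Return the number of Stellar-H, Stellar-T, and total prompts."""
    stellar_h = TEST_SUBJECTS * TEST_PROMPTS_H_PER_SUBJECT
    stellar_t = (TEST_SUBJECTS * TEST_PROMPTS_T_PER_SUBJECT
                 + VAL_SUBJECTS * VAL_PROMPTS_T_PER_SUBJECT)
    return {"stellar_h": stellar_h, "stellar_t": stellar_t,
            "total": stellar_h + stellar_t}


totals = prompt_totals()
print(totals)  # {'stellar_h': 10000, 'stellar_t': 10400, 'total': 20400}
```

The result is consistent with the 10k Stellar-H, 10.4k Stellar-T, and 20.4k overall figures quoted above.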

Each image is accompanied by a mask and an attributes file with the corresponding CelebA annotations.

Each of the 10.4k Stellar-T prompts is annotated with the corresponding detectable objects (e.g. car, apple etc.) and prompt categories (e.g. culinary, leisure, city, etc.).

Prompt Categories

There are two types of prompt categories: those related to the objects referenced in the prompt (e.g. city, vehicle, etc.) and those related to the main activity or action in the prompt (e.g. leisure, culinary, sports, arts, etc.). The prompt categories are neither mutually exclusive nor exhaustive. For example, a prompt can be both culinary and leisure at the same time, but either of the two labels may be missing from its annotations. Additionally, the action category refers to the main action and not the context, e.g. driving a car in space would be annotated as everyday because driving a car is an everyday action, despite the context of being in space.
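Because the categories are not exhaustive, a missing label is better read as "unknown" than as "no". The sketch below illustrates that reading on a made-up record. The field names "prompt" and "categories", the example prompt, and both helper functions are hypothetical and chosen only for illustration; they are not the schema of stellar_t.json.

```python
from typing import Optional

# Hypothetical record layout (NOT the actual stellar_t.json schema).
example = {
    "prompt": "cooking dinner with friends",  # made-up prompt for illustration
    "categories": ["culinary"],               # "leisure" could apply but is not annotated
}


def category_label(record: dict, category: str) -> Optional[bool]:
    """Return True if the category is annotated, None (unknown) otherwise.

    The function never returns False: since the annotations are not
    exhaustive, absence of a label does not prove the category is wrong.
    """
    return True if category in record.get("categories", []) else None


def select_annotated(records: list, category: str) -> list:
    """Keep only the records that are positively annotated with the category."""
    return [r for r in records if category_label(r, category) is True]


print(category_label(example, "culinary"))  # True
print(category_label(example, "leisure"))   # None
```

Selecting by category this way yields prompts known to belong to it, but the selection may miss other prompts that would also qualify.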

License

Before downloading or using any part of the code in this repository, please review and acknowledge the terms and conditions set forth in both the "License Terms" and "Third Party License Terms" included in this repository. Continuing to download and use any part of the code in this repository confirms you agree with these terms and conditions.

Preparation and Usage

Summary:

  1. Download CelebAMask-HQ Dataset
  2. Process CelebAMask-HQ Dataset
  3. Process Stellar Prompts
  4. Run

1. Download CelebAMask-HQ

Since the images used in Stellar are selected from CelebAMask-HQ, you need to download that dataset first. We use the link provided in the CelebA-Dialog Repo (the one named CelebA-Dialog (HQ)), which is easier to download with gdown than the link provided in the CelebAMask-HQ Repo. To download the dataset into the output_dir/STELLAR folder using gdown, run the following command:

gdown --folder CELEBA_DIALOG_LINK -O output_dir/STELLAR

2. Process CelebAMask-HQ

To download the ids for the subset of the CelebAMask-HQ dataset we use, you first need to accept the terms of use for the Stellar Prompts Dataset here. You will then be provided with a download URL. After downloading, place celebahq_ids.txt in the ./images/ directory.

Assuming you have downloaded CelebAMask-HQ by following the instructions in step 1, you can extract the images we use for our experiments with Stellar by running the following command:

python scripts/extract_stellar_from_celebamaskhq.py --dataset-dir output_dir/STELLAR/

Where output_dir/STELLAR/ is the path to the directory where you downloaded CelebAMask-HQ. This script will extract the images and masks from CelebAMask-HQ and place them in the output_dir/STELLAR/ directory.

After extracting the images used in Stellar, the directory structure should be:

.
├── ...
├── output_dir                           # The datasets folder
│   ├── STELLAR                          # The Stellar dataset folder
│   │   ├── 000                          # Zeroth subject folder
│   │   │   ├── 0.jpg                    # First image file
│   │   │   ├── 0_bg.png                 # First image mask file
│   │   │   ├── 0_attributes.json        # First image celeba annotations
│   │   │   ├── 1.jpg                    # Second image file
│   │   │   ├── 1_bg.png                 # Second image mask file
│   │   │   ├── 1_attributes.json        # Second image celeba annotations
│   │   │ ...
└── ...

Note that the first 200 subjects correspond to the test split while the next 200 subjects correspond to the validation split.
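The split rule above can be expressed as a small helper. This is a minimal sketch, assuming subject folders are zero-indexed and named with zero-padded numbers as in the tree above (so 000 to 199 would be test and 200 to 399 validation); the function name is illustrative and not part of the repository.

```python
TEST_SPLIT_SIZE = 200  # first 200 subjects, per the note above
TOTAL_SUBJECTS = 400   # 200 test + 200 validation


def split_for_subject(folder_name: str) -> str:
    """Map a subject folder name such as "000" to its split name."""
    idx = int(folder_name)  # "007" -> 7
    if not 0 <= idx < TOTAL_SUBJECTS:
        raise ValueError(f"unexpected subject folder: {folder_name!r}")
    return "test" if idx < TEST_SPLIT_SIZE else "validation"


print(split_for_subject("000"))  # test
print(split_for_subject("200"))  # validation
```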

3. Process Stellar Prompts

To download the prompts, you first need to accept the terms of use for the Stellar Prompts Dataset here. You will then be provided with a download URL.

For details on how we built our prompt dataset and associated the prompts with the underlying human identities, please see our supplemental material.

After downloading, place objects.txt, stellar_t.json and stellar_h.json in the ./prompts/ directory.

Place Stellar Prompts With The Dataset

After completing the above steps, run the place_prompts_with_dataset.py script from the scripts folder to add the prompts for each Stellar subject identity to the corresponding folder:

python scripts/place_prompts_with_dataset.py --dataset-dir output_dir/STELLAR/

Where output_dir/STELLAR/ is the path to the directory that holds the rest of the dataset.

After completing all the steps above, the final directory structure of the Stellar dataset should be:

.
├── ...
├── output_dir                           # The datasets folder
│   ├── STELLAR                          # The Stellar dataset folder
│   │   ├── 000                          # Zeroth subject folder
│   │   │   ├── 0.jpg                    # First image file
│   │   │   ├── 0_bg.png                 # First image mask file
│   │   │   ├── 0_attributes.json        # First image celeba annotations
│   │   │   ├── 1.jpg                    # Second image file
│   │   │   ├── 1_bg.png                 # Second image mask file
│   │   │   ├── 1_attributes.json        # Second image celeba annotations
│   │   │   ├── prompts_h.json           # Prompts-H file (only on test split)
│   │   │   ├── prompts_t.json           # Prompts-T file
│   │   │ ...
└── ...
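To confirm that a subject folder matches the layout shown above, one option is a small check like the following. It is a sketch based only on the file names in the tree; the helper name and its arguments are hypothetical and not part of the repository's scripts.

```python
from pathlib import Path

# Per-image files shown in the tree above, for image indices 0 and 1.
IMAGE_FILES = [
    f"{i}{suffix}"
    for i in (0, 1)
    for suffix in (".jpg", "_bg.png", "_attributes.json")
]


def missing_files(subject_dir, is_test_split):
    """Return the expected file names that are absent from a subject folder."""
    subject_dir = Path(subject_dir)
    expected = list(IMAGE_FILES) + ["prompts_t.json"]
    if is_test_split:
        # prompts_h.json is only present on the test split.
        expected.append("prompts_h.json")
    return [name for name in expected if not (subject_dir / name).is_file()]
```

For example, missing_files("output_dir/STELLAR/000", is_test_split=True) should return an empty list once all preparation steps have completed.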

4. Run

To install the dataset class, run:

pip install git+https://github.com/stellar-gen-ai/stellar-dataset.git

To use the dataset you can simply run:

from stellar_dataset import Stellar

dataset = Stellar(dataset_dir)

Where dataset_dir is the path to the STELLAR dataset directory prepared in the steps above (for example, output_dir/STELLAR/).

Citation

If you use this work please cite:

@article{stellar2023,
  author    = {Achlioptas, Panos and Benetatos, Alexandros and Fostiropoulos, Iordanis and Skourtis, Dimitris},
  title     = {Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods},
  volume    = {abs/2312.06116},
  journal   = {Computing Research Repository (CoRR)},
  year      = {2023},
}
