Best Prompts for Text-to-Image Models and How to Find Them

This repository contains the code and data for the paper "Best Prompts for Text-to-Image Models and How to Find Them".

Code

To run the prompt optimization, you need to implement a class that receives image generation queries and returns images generated with Stable Diffusion. We use the following interface:

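from PIL import Image

class DiffusionApi:
    def generate(self, prompt: str, steps: int = 50, scale: float = 7.5, seed: int = 0, height: int = 512, width: int = 512) -> Image.Image:
        # Generate one image for the prompt with Stable Diffusion and return it as a Pillow Image
        raise NotImplementedError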

Here, prompt is the prompt string, steps is the number of diffusion sampling steps, scale is the guidance scale, seed is the random seed, and height and width set the image size in pixels. The generate method returns a Pillow Image object.
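For illustration only, here is one possible implementation of this interface using the Hugging Face diffusers library. It is a sketch, not the authors' implementation: the class name StableDiffusionApi, the default model ID CompVis/stable-diffusion-v1-4, and the CUDA device are all assumptions. It mirrors the generate signature above, so it can be used wherever an object with that interface is expected.

```python
class StableDiffusionApi:
    """Hypothetical DiffusionApi implementation backed by Hugging Face diffusers."""

    def __init__(self, model_id="CompVis/stable-diffusion-v1-4", device="cuda"):
        # Heavy dependencies are imported here, so defining the class does not require them
        import torch
        from diffusers import StableDiffusionPipeline

        self._torch = torch
        self.device = device
        # Assumed model ID; substitute whichever Stable Diffusion checkpoint you use
        self.pipe = StableDiffusionPipeline.from_pretrained(model_id).to(device)

    def generate(self, prompt, steps=50, scale=7.5, seed=0, height=512, width=512):
        # A seeded generator makes results repeatable for the same prompt, seed, and setup
        generator = self._torch.Generator(device=self.device).manual_seed(seed)
        result = self.pipe(
            prompt,
            num_inference_steps=steps,
            guidance_scale=scale,
            height=height,
            width=width,
            generator=generator,
        )
        # The pipeline returns a list of Pillow images; one prompt yields one image
        return result.images[0]
```

For example, StableDiffusionApi().generate("a castle on a hill", seed=42).save("castle.png") would write a 512x512 image to disk. Loading the pipeline once in the constructor avoids reloading the model weights for every query.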

To run the genetic optimization, use the optimize.py script, which takes the following arguments:

--toloka-token. Toloka API token.
--aws-access-key-id. AWS access key ID. The generated images are uploaded to an AWS bucket so that direct links to them can be embedded into the annotation tasks.
--aws-secret-access-key. AWS secret access key.
--endpoint-url. Base URL at which your images will be stored, i.e., the URL of your AWS bucket.
--bucket. Name of the AWS bucket.
--base-pool-id. ID of a configured Toloka pool that is cloned on every optimization iteration.
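As a sketch of how these arguments fit together, the hypothetical helper below builds the optimize.py command line from Python, assuming the script accepts each flag as a "--flag value" pair. The secrets are read from environment variables so they are not hard-coded; the variable names TOLOKA_TOKEN, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY are assumptions.

```python
import os
import sys


def optimize_command(base_pool_id, endpoint_url, bucket, env=None):
    """Build an optimize.py command line (hypothetical helper).

    Secrets come from environment variables; the variable names are assumptions.
    """
    env = os.environ if env is None else env
    return [
        sys.executable, "optimize.py",
        "--toloka-token", env["TOLOKA_TOKEN"],
        "--aws-access-key-id", env["AWS_ACCESS_KEY_ID"],
        "--aws-secret-access-key", env["AWS_SECRET_ACCESS_KEY"],
        "--endpoint-url", endpoint_url,
        "--bucket", bucket,
        "--base-pool-id", base_pool_id,
    ]
```

The resulting list could then be run with subprocess.run(optimize_command(pool_id, endpoint_url, bucket), check=True), which raises an error if the script exits with a non-zero status.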

Data

annotation.csv contains the results of the pairwise comparisons. It has five columns: prompt_id is the ID of an image description, left_uid is the UID of the four left images (see below), right_uid is the UID of the four right images, worker is the worker's ID, and label is the worker's preference (left or right).
uid_to_keywords.csv maps each image UID to the keywords used to generate it.
prompts.csv contains the image descriptions; the index of a prompt is its prompt_id in annotation.csv.
keywords.csv contains keywords and their number of occurrences in the Stable Diffusion Discord.
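To illustrate how the annotation.csv columns relate, the following sketch tallies a simple per-UID win rate using only the standard library. It assumes the file has a header row with the five column names listed above and that label holds the literal strings left or right. This is a plain win fraction for exploration, not necessarily how the paper aggregates the comparisons.

```python
import csv
import io
from collections import Counter


def win_rates(rows):
    """Map each image-set UID to the fraction of its pairwise comparisons it won."""
    wins = Counter()
    comparisons = Counter()
    for row in rows:
        left, right = row["left_uid"], row["right_uid"]
        comparisons[left] += 1
        comparisons[right] += 1
        # Assumes the label column holds the strings "left" or "right"
        if row["label"] == "left":
            wins[left] += 1
        elif row["label"] == "right":
            wins[right] += 1
    return {uid: wins[uid] / comparisons[uid] for uid in comparisons}


# Tiny in-memory sample with made-up UIDs and workers
sample = (
    "prompt_id,left_uid,right_uid,worker,label\n"
    "0,a,b,w1,left\n"
    "0,a,b,w2,left\n"
    "0,b,a,w3,left\n"
)
rates = win_rates(csv.DictReader(io.StringIO(sample)))
print(rates)  # {'a': 0.6666666666666666, 'b': 0.3333333333333333}
```

For the real data, open the file with open("annotation.csv", newline="") inside a with statement and pass csv.DictReader(f) to win_rates.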

To obtain the generated images, use https://storage.yandexcloud.net/diffusion/ as the base URL and append {UID}_{0-3}.png to it, where the digit from 0 to 3 selects one of the four images for that UID. For example:

https://storage.yandexcloud.net/diffusion/0000298d546d4a6299774ca323fa7f34_0.png
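A small helper that builds all four links for a UID following the scheme above (the function name image_urls is illustrative):

```python
BASE_URL = "https://storage.yandexcloud.net/diffusion/"


def image_urls(uid):
    """Return the URLs of the four images stored for one UID."""
    return [f"{BASE_URL}{uid}_{i}.png" for i in range(4)]


urls = image_urls("0000298d546d4a6299774ca323fa7f34")
print(urls[0])  # https://storage.yandexcloud.net/diffusion/0000298d546d4a6299774ca323fa7f34_0.png
```

Each URL can then be downloaded with, for example, urllib.request.urlretrieve(url, filename).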

Cite

Pavlichenko, N., Ustalov, D.: Best Prompts for Text-to-Image Models and How to Find Them. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 2067–2071. Association for Computing Machinery, Taipei, Taiwan (2023). https://doi.org/10.1145/3539618.3592000

@inproceedings{Pavlichenko:23,
  author    = {Pavlichenko, Nikita and Ustalov, Dmitry},
  title     = {{Best Prompts for Text-to-Image Models and How to Find Them}},
  year      = {2023},
  booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  series    = {SIGIR '23},
  pages     = {2067--2071},
  address   = {Taipei, Taiwan},
  publisher = {Association for Computing Machinery},
  doi       = {10.1145/3539618.3592000},
  isbn      = {978-1-4503-9408-6},
  eprint    = {2209.11711},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  language  = {english},
}
