
Semantic Guidance for Diffusion

Official implementation of the paper SEGA: Instructing Diffusion using Semantic Dimensions.

The implementation of the earlier pre-print The Stable Artist: Interacting with Concepts in Diffusion Latent Space is available under the tag StableArtist.

Interactive Demo

An interactive demonstration is available on Colab and on Hugging Face Spaces.

Examples

Installation

SEGA is fully integrated into the diffusers library as the SemanticStableDiffusionPipeline. Just install diffusers to use it:

pip install diffusers

Alternatively, you can clone this repository and install it locally by running:

git clone https://github.com/ml-research/semantic-image-editing.git
cd ./semantic-image-editing
pip install .

or install it directly from GitHub:

pip install git+https://github.com/ml-research/semantic-image-editing.git

Usage

This repository provides a new diffusion pipeline for semantic image editing built on the diffusers library. The SemanticEditPipeline extends the StableDiffusionPipeline and can therefore be loaded from a Stable Diffusion checkpoint, as shown below.

from semdiffusers import SemanticEditPipeline
device='cuda'

pipe = SemanticEditPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
).to(device)
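
If GPU memory is tight, the pipeline can also be loaded in half precision. This is a minimal sketch using the standard diffusers torch_dtype argument; it is not part of the original example:

import torch
from semdiffusers import SemanticEditPipeline

# Sketch: half-precision weights roughly halve GPU memory usage
# (torch_dtype is a standard diffusers from_pretrained argument).
pipe = SemanticEditPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")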

or load the corresponding pipeline in diffusers:

from diffusers import SemanticStableDiffusionPipeline
device = 'cuda'
pipe = SemanticStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
).to(device)
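
Both pipelines inherit the standard diffusers memory helpers; for example, attention slicing can reduce peak VRAM (a hedged tip, not from the original README):

pipe.enable_attention_slicing()  # compute attention in slices to lower peak memory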

Example usage of the pipeline could look like this:

import torch
gen = torch.Generator(device=device)

gen.manual_seed(21)
out = pipe(prompt='a photo of the face of a woman', generator=gen, num_images_per_prompt=1, guidance_scale=7,
           editing_prompt=['smiling, smile',       # Concepts to apply 
                           'glasses, wearing glasses', 
                           'curls, wavy hair, curly hair', 
                           'beard, full beard, mustache'],
           reverse_editing_direction=[False, False, False, False], # Direction of guidance, i.e. increase all concepts
           edit_warmup_steps=[10, 10, 10, 10], # Warmup period for each concept
           edit_guidance_scale=[4, 5, 5, 5.4], # Guidance scale for each concept
           edit_threshold=[0.99, 0.975, 0.925, 0.96], # Threshold for each concept; equals the percentile of the latent space that is discarded, i.e. threshold=0.99 uses only 1% of the latent dimensions
           edit_momentum_scale=0.3, # Momentum scale that will be added to the latent guidance
           edit_mom_beta=0.6, # Momentum beta
           edit_weights=[1, 1, 1, 1] # Weights of the individual concepts against each other (one weight per editing prompt)
          )
images = out.images
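
To suppress a concept instead of adding it, set the corresponding entry of reverse_editing_direction to True. The following is a sketch based on the call above, not an example from the original README:

# Sketch: remove glasses from the generated face by reversing the guidance direction
gen.manual_seed(21)
out = pipe(prompt='a photo of the face of a woman', generator=gen,
           num_images_per_prompt=1, guidance_scale=7,
           editing_prompt=['glasses, wearing glasses'],
           reverse_editing_direction=[True],  # guide away from the concept
           edit_warmup_steps=[10],
           edit_guidance_scale=[5],
           edit_threshold=[0.975],
           edit_momentum_scale=0.3,
           edit_mom_beta=0.6)
images = out.images
images[0].save('no_glasses.png')  # outputs are standard PIL images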

Citation

If you like or use our work, please cite us:

@inproceedings{brack2023Sega,
  title={SEGA: Instructing Diffusion using Semantic Dimensions},
  author={Manuel Brack and Felix Friedrich and Dominik Hintersdorf and Lukas Struppek and Patrick Schramowski and Kristian Kersting},
  year={2023},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)}
}