
Semantic Guidance for Diffusion

Official implementation of the paper SEGA: Instructing Diffusion using Semantic Dimensions.

The implementation of the earlier pre-print The Stable Artist: Interacting with Concepts in Diffusion Latent Space is available under the tag StableArtist.

Interactive Demo

An interactive demonstration is available on Colab and on Hugging Face Spaces.

Examples

Installation

SEGA is fully integrated into the diffusers library as the SemanticStableDiffusionPipeline. Just install diffusers to use it:

pip install diffusers

Alternatively, you can clone this repository and install it locally by running:

git clone https://github.com/ml-research/semantic-image-editing.git
cd ./semantic-image-editing
pip install .

or install it directly from GitHub:

pip install git+https://github.com/ml-research/semantic-image-editing.git

Usage

This repository provides a new diffusion pipeline for semantic image editing built on the diffusers library. The SemanticEditPipeline extends the StableDiffusionPipeline and can therefore be loaded from a Stable Diffusion checkpoint, as shown below.

from semdiffusers import SemanticEditPipeline
device='cuda'

pipe = SemanticEditPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
).to(device)
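
If GPU memory is tight, the pipeline can also be loaded in half precision. This is a minimal sketch using the standard diffusers torch_dtype argument; it is not part of the original example:

import torch
from semdiffusers import SemanticEditPipeline

# Sketch: half-precision weights roughly halve GPU memory usage
# (torch_dtype is a standard diffusers from_pretrained argument).
pipe = SemanticEditPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")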

or load the corresponding pipeline in diffusers:

from diffusers import SemanticStableDiffusionPipeline
device = 'cuda'
pipe = SemanticStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
).to(device)
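
Both pipelines inherit the standard diffusers memory helpers; for example, attention slicing can reduce peak VRAM (a hedged tip, not from the original README):

pipe.enable_attention_slicing()  # compute attention in slices to lower peak memory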

Example usage of the pipeline could look like this:

import torch
gen = torch.Generator(device=device)

gen.manual_seed(21)
out = pipe(prompt='a photo of the face of a woman', generator=gen, num_images_per_prompt=1, guidance_scale=7,
           editing_prompt=['smiling, smile',       # Concepts to apply 
                           'glasses, wearing glasses', 
                           'curls, wavy hair, curly hair', 
                           'beard, full beard, mustache'],
           reverse_editing_direction=[False, False, False, False], # Direction of guidance, i.e. increase all concepts
           edit_warmup_steps=[10, 10, 10, 10], # Warmup period for each concept
           edit_guidance_scale=[4, 5, 5, 5.4], # Guidance scale for each concept
           edit_threshold=[0.99, 0.975, 0.925, 0.96], # Threshold for each concept; equals the percentile of the latent space that is discarded, i.e. threshold=0.99 uses only 1% of the latent dimensions
           edit_momentum_scale=0.3, # Momentum scale that will be added to the latent guidance
           edit_mom_beta=0.6, # Momentum beta
           edit_weights=[1, 1, 1, 1] # Weights of the individual concepts against each other (one weight per editing prompt)
          )
images = out.images
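
To suppress a concept instead of adding it, set the corresponding entry of reverse_editing_direction to True. The following is a sketch based on the call above, not an example from the original README:

# Sketch: remove glasses from the generated face by reversing the guidance direction
gen.manual_seed(21)
out = pipe(prompt='a photo of the face of a woman', generator=gen,
           num_images_per_prompt=1, guidance_scale=7,
           editing_prompt=['glasses, wearing glasses'],
           reverse_editing_direction=[True],  # guide away from the concept
           edit_warmup_steps=[10],
           edit_guidance_scale=[5],
           edit_threshold=[0.975],
           edit_momentum_scale=0.3,
           edit_mom_beta=0.6)
images = out.images
images[0].save('no_glasses.png')  # outputs are standard PIL images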

Citation

If you like or use our work, please cite us:

@inproceedings{brack2023Sega,
  title={SEGA: Instructing Diffusion using Semantic Dimensions},
  author={Manuel Brack and Felix Friedrich and Dominik Hintersdorf and Lukas Struppek and Patrick Schramowski and Kristian Kersting},
  year={2023},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)}
}