Skip to content

Releases: huggingface/diffusers

v0.8.0: Versatile Diffusion - Text, Images and Variations All in One Diffusion Model

23 Nov 18:23
Compare
Choose a tag to compare

🙆‍♀️ New Models

VersatileDiffusion

VersatileDiffusion, released by SHI-Labs, is a unified multi-flow multimodal diffusion model that is capable of doing multiple tasks such as text2image, image variations, dual-guided(text+image) image generation, image2text.

pip install git+https://github.com/huggingface/transformers

Then you can run:

from diffusers import VersatileDiffusionPipeline
import torch
import requests
from io import BytesIO
from PIL import Image

pipe = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# initial image
url = "https://huggingface.co/datasets/diffusers/images/resolve/main/benz.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")

# prompt
prompt = "a red car"

# text to image
image = pipe.text_to_image(prompt).images[0]

# image variation
image = pipe.image_variation(image).images[0]

# image variation
image = pipe.dual_guided(prompt, image).images[0]

More in-depth details can be found on:

AltDiffusion

AltDiffusion is a multilingual latent diffusion model that supports text-to-image generation for 9 different languages: English, Chinese, Spanish, French, Japanese, Korean, Arabic, Russian and Italian.

Stable Diffusion Image Variations

StableDiffusionImageVariationPipeline by @justinpinkney is a stable diffusion model that takes an image as an input and generates variations of that image. It is conditioned on CLIP image embeddings instead of text.

Safe Latent Diffusion

Safe Latent Diffusion (SLD), released by ml-research@TUDarmstadt group, is a new practical and sophisticated approach to prevent unsolicited content from being generated by diffusion models. One of the authors of the research contributed their implementation to diffusers.

VQ-Diffusion with classifier-free sampling

vq diffusion classifier free sampling by @williamberman #1294

LDM super resolution

LDM super resolution is a latent 4x super-resolution diffusion model released by CompVis.

CycleDiffusion

CycleDiffusion is a method that uses Text-to-Image Diffusion Models for Image-to-Image Editing. It is capable of

  1. Zero-shot image-to-image translation with text-to-image diffusion models such as Stable Diffusion.
    Traditional unpaired image-to-image translation with diffusion models trained on two related domains.
  2. Zero-shot image-to-image translation with text-to-image diffusion models such as Stable Diffusion.
    Traditional unpaired image-to-image translation with diffusion models trained on two related domains.
  • Add CycleDiffusion pipeline using Stable Diffusion by @ChenWu98 #888

CLIPSeg + StableDiffusionInpainting.

Uses CLIPSeg to automatically generate a mask using segmentation, and then applies Stable Diffusion in-painting.

K-Diffusion wrapper

K-Diffusion Pipeline is community pipeline that allows to use any sampler from K-diffusion with diffusers models.

🌀New SOTA Scheduler

DPMSolverMultistepScheduler is the 🧨 diffusers implementation of DPM-Solver++, a state-of-the-art scheduler that was contributed by one of the authors of the paper. This scheduler is able to achieve great quality in as few as 20 steps. It's a drop-in replacement for the default Stable Diffusion scheduler, so you can use it to essentially half generation times. It works so well that we adopted it for the Stable Diffusion demo Spaces: https://huggingface.co/spaces/stabilityai/stable-diffusion, https://huggingface.co/spaces/runwayml/stable-diffusion-v1-5.

You can use it like this:

from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "runwayml/stable-diffusion-v1-5"
scheduler = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler")
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)

🌐 Better scheduler API

The example above also demonstrates how to load schedulers using a new API that is coherent with model loading and therefore more natural and intuitive.

You can load a scheduler using from_pretrained, as demonstrated above, or you can instantiate one from an existing scheduler configuration. This is a way to replace the scheduler of a pipeline that was previously loaded:

from diffusers import DiffusionPipeline, EulerDiscreteScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

Read more about these changes in the documentation. See also the community pipeline that allows using any of the K-diffusion samplers with diffusers, as mentioned above!

🎉 Performance

We work relentlessly to incorporate performance optimizations and memory reduction techniques to 🧨 diffusers. These are two of the most noteworthy incorporations in this release:

  • Enable memory-efficient attention by default if xFormers is installed.
  • Use batched-matmuls when possible.

🎁 Quality of Life improvements

  • Fix/Enable all schedulers for in-painting
  • Easier loading of local pipelines
  • cpu offloading: mutli GPU support

📝 Changelog

Read more

v0.7.2: Patch release

05 Nov 21:45
Compare
Choose a tag to compare

This patch release fixes a bug that broken the Flax Stable Diffusion Inference.
Thanks a mille for spotting it @camenduru in #1145 and thanks a lot to @pcuenca and @kashif for fixing it in #1149

v0.7.1: Patch release

04 Nov 14:22
Compare
Choose a tag to compare

This patch release makes accelerate a soft dependency to avoid an error when installing diffusers with pre-existing torch.

v0.7.0: Optimized for Apple Silicon, Improved Performance, Awesome Community

03 Nov 18:44
Compare
Choose a tag to compare

❤️ PyTorch + Accelerate

⚠️ The PyTorch pipelines now require accelerate for improved model loading times!
Install Diffusers with pip install --upgrade diffusers[torch] to get everything in a single command.

🍎 Apple Silicon support with PyTorch 1.13

PyTorch and Apple have been working on improving mps support in PyTorch 1.13, so Apple Silicon is now a first-class citizen in diffusers 0.7.0!

Requirements

  • Mac computer with Apple silicon (M1/M2) hardware.
  • macOS 12.6 or later (13.0 or later recommended, as support is even better).
  • arm64 version of Python.
  • PyTorch 1.13.0 official release, installed from pip or the conda channels.

Memory efficient generation

Memory management is crucial to achieve fast generation speed. We recommend to always use attention slicing on Apple Silicon, as it drastically reduces memory pressure and prevents paging or swapping. This is especially important for computers with less than 64 GB of Unified RAM, and may be the difference between generating an image in seconds rather than in minutes. Use it like this:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"

# First-time "warmup" pass
_ = pipe(prompt, num_inference_steps=1)

image = pipe(prompt).images[0]
image.save("astronaut.png")

Continuous Integration

Our automated tests now include a full battery of tests on the mps device. This will be helpful to identify issues early and ensure the quality on Apple Silicon going forward.

See more details in the documentation.

💃 Dance Diffusion

diffusers goes audio 🎵 Dance Diffusion by Harmonai is the first audio model in 🧨Diffusers!

Try it out to generate some random music:

from diffusers import DiffusionPipeline
import scipy

model_id = "harmonai/maestro-150k"
pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline = pipeline.to("cuda")

audio = pipeline(audio_length_in_s=4.0).audios[0]

# To save locally
scipy.io.wavfile.write("maestro_test.wav", pipe.unet.sample_rate, audio.transpose())

🎉 Euler schedulers

These are the Euler schedulers, from the paper Elucidating the Design Space of Diffusion-Based Generative Models by Karras et al. (2022). The diffusers implementation is based on the original k-diffusion implementation by Katherine Crowson. The Euler schedulers are fast, often times generating really good outputs with 20-30 steps.

from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

euler_scheduler = EulerDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

euler_ancestral_scheduler = EulerAncestralDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]

🔥 Up to 2x faster inference with memory_efficient_attention

Even faster and memory efficient stable diffusion using the efficient flash attention implementation from xformers

  • Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR #532

To leverage it just make sure you have:

  • PyTorch > 1.12
  • Cuda available
  • Installed the xformers library
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

# optional: You can disable it via
# pipe.disable_xformers_memory_efficient_attention()

🚀 Much faster loading

Thanks to accelerate, pipeline loading is much, much faster. There are two parts to it:

  • First, when a model is created PyTorch initializes its weights by default. This takes a good amount of time. Using low_cpu_mem_usage (enabled by default), no initialization will be performed.
  • Optionally, you can also use device_map="auto" to automatically select the best device(s) where the pre-trained weights will be initially sent to.

In our tests, loading time was more than halved on CUDA devices, and went down from 12s to 4s on an Apple M1 computer.

As a side effect, CPU usage will be greatly reduced during loading, because no temporary copies of the weights are necessary.

This feature requires PyTorch 1.9 or better and accelerate 0.8.0 or higher.

🎨 RePaint

RePaint allows to reuse any pretrained DDPM model for free-form inpainting by adding restarts to the denoising schedule. Based on the paper RePaint: Inpainting using Denoising Diffusion Probabilistic Models by Andreas Lugmayr et al.

from diffusers import RePaintPipeline, RePaintScheduler

# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
scheduler = RePaintScheduler.from_config("google/ddpm-ema-celebahq-256")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
pipe = pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
    generator=generator,
)
inpainted_image = output.images[0]

image

🌍 Community Pipelines

Long Prompt Weighting Stable Diffusion

The Pipeline lets you input prompt without 77 token length limit. And you can increase words weighting by using "()" or decrease words weighting by using "[]". The Pipeline also lets you use the main use cases of the stable diffusion pipeline in a single class.
For a code example, see Long Prompt Weighting Stable Diffusion

  • [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907

Speech to Image

Generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion.
For a code example, see Speech to Image

Wildcard Stable Diffusion

A minimal implementation that allows for users to add "wildcards", denoted by __wildcard__ to prompts that are used as placeholders for randomly sampled values given by either a dictionary or a .txt file.
For a code example, see Wildcard Stable Diffusion

Composable Stable Diffusion

Use logic operators to do compositional generation.
For a code example, see Composable Stable Diffusion

  • Add Composable diffusion to community pipeline examples by @MarkRich in #951

Imagic Stable Diffusion

Image editing with Stable Diffusion.
For a code example, see Imagic Stable Diffusion

Seed Resizing

Allows to generate a larger image while keeping the content of the original image.
For a code example, see Seed Resizing

📝 Changelog

  • [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907
  • [Stable Diffusion] Add components function by @patrickvonplaten in #889
  • [PNDM Scheduler] Make sure list cannot grow forever by @patrickvonplaten in #882
  • [DiffusionPipeline.from_pretrained] add warning when passing unused k… by @patrickvonplaten in #870
  • DOC Dreambooth Add --sample_batch_size=1 to the 8 GB dreambooth example script by @leszekhanusz in #829
  • [Examples] add speech to image pipeline example by @MikailINTech in #897
  • [dreambooth] dont use safety check when generating prior images by @patil-suraj in #922
  • Dreambooth class image generation: ...
Read more

v0.6.0: Finetuned Stable Diffusion inpainting

19 Oct 15:52
Compare
Choose a tag to compare

🎨 Finetuned Stable Diffusion inpainting

The first official stable diffusion checkpoint fine-tuned on inpainting has been released.

You can try it out in the official demo here

or code it up yourself 💻 :

from io import BytesIO

import torch

import PIL
import requests
from diffusers import StableDiffusionInpaintPipeline


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"

output = pipe(prompt=prompt, image=image, mask_image=mask_image)
image = output.images[0]

gives:

image mask_image prompt Output
drawing drawing Face of a yellow cat, high resolution, sitting on a park bench => drawing

⚠️ This release deprecates the unsupervised noising-based inpainting pipeline into StableDiffusionInpaintPipelineLegacy.
The new StableDiffusionInpaintPipeline is based on a Stable Diffusion model finetuned for the inpainting task: https://huggingface.co/runwayml/stable-diffusion-inpainting

Note
When loading StableDiffusionInpaintPipeline with a non-finetuned model (i.e. the one saved with diffusers<=0.5.1), the pipeline will default to StableDiffusionInpaintPipelineLegacy, to maintain backward compatibility ✨

from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

assert pipe.__class__ .__name__ == "StableDiffusionInpaintPipelineLegacy"

Context:

Why this change? When Stable Diffusion came out ~2 months ago, there were many unofficial in-painting demos using the original v1-4 checkpoint ("CompVis/stable-diffusion-v1-4"). These demos worked reasonably well, so that we integrated an experimental StableDiffusionInpaintPipeline class into diffusers. Now that the official inpainting checkpoint was released: https://github.com/runwayml/stable-diffusion we decided to make this our official pipeline and move the old / hacky one to "StableDiffusionInpaintPipelineLegacy".

🚀 ONNX pipelines for image2image and inpainting

Thanks to the contribution by @zledas (#552) this release supports OnnxStableDiffusionImg2ImgPipeline and OnnxStableDiffusionInpaintPipeline optimized for CPU inference:

from diffusers import OnnxStableDiffusionImg2ImgPipeline, OnnxStableDiffusionInpaintPipeline

img_pipeline = OnnxStableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", revision="onnx", provider="CPUExecutionProvider"
)

inpaint_pipeline = OnnxStableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", revision="onnx", provider="CPUExecutionProvider"
)

🌍 Community Pipelines

Two new community pipelines have been added to diffusers 🔥

Stable Diffusion Interpolation example

Interpolate the latent space of Stable Diffusion between different prompts/seeds.
For more info see stable-diffusion-videos.

For a code example, see Stable Diffusion Interpolation

  • Add Stable Diffusion Interpolation Example by @nateraw in #862

Stable Diffusion Interpolation Mega

One Stable Diffusion Pipeline with all functionalities of Text2Image, Image2Image and Inpainting

For a code example, see Stable Diffusion Mega

📝 Changelog

v0.5.1: Patch release

13 Oct 19:24
Compare
Choose a tag to compare

This patch release fixes an bug with Flax's NFSW safety checker in the pipeline.

#832 by @patil-suraj

v0.5.0: JAX/Flax and TPU support

13 Oct 17:54
0679d09
Compare
Choose a tag to compare

🌾 JAX/Flax integration for super fast Stable Diffusion on TPUs.

We added JAX support for Stable Diffusion! You can now run Stable Diffusion on Colab TPUs (and GPUs too!) for faster inference.

Check out this TPU-ready colab for a Stable Diffusion pipeline: Open In Colab
And a detailed blog post on Stable Diffusion and parallelism in JAX / Flax 🤗 https://huggingface.co/blog/stable_diffusion_jax

The most used models, schedulers and pipelines have been ported to JAX/Flax, namely:

  • Models: FlaxAutoencoderKL, FlaxUNet2DConditionModel
  • Schedulers: FlaxDDIMScheduler, FlaxDDIMScheduler, FlaxPNDMScheduler
  • Pipelines: FlaxStableDiffusionPipeline

Changelog:

🔥 DeepSpeed low-memory training

Thanks to the 🤗 accelerate integration with DeepSpeed, a few of our training examples became even more optimized in terms of VRAM and speed:

✏️ Changelog

v0.4.2: Patch release

11 Oct 22:48
Compare
Choose a tag to compare

This patch release allows the img2img pipeline to be run on fp16 and fixes a bug with the "mps" device.

v0.4.1: Patch release

07 Oct 09:01
Compare
Choose a tag to compare

This patch release fixes an bug with incorrect module naming for community pipelines and an incorrect breaking change when moving piplines in fp16 to "cpu" or "mps".

v0.4.0 Better, faster, stronger!

06 Oct 16:37
Compare
Choose a tag to compare

🚗 Faster

We have thoroughly profiled our codebase and applied a number of incremental improvements that, when combined, provide a speed improvement of almost 3x.

On top of that, we now default to using the float16 format. It's much faster than float32 and, according to our tests, produces images with no discernible difference in quality. This beats the use of autocast, so the resulting code is cleaner!

🔑 use_auth_token no more

The recently released version of huggingface-hub automatically uses your access token if you are logged in, so you don't need to put it everywhere in your code. All you need to do is authenticate once using huggingface-cli login in your terminal and you're all set.

- pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
+ pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

We bumped huggingface-hub version to 0.10.0 in our dependencies to achieve this.

🎈More flexible APIs

  • Schedulers now use a common, simpler unified API design. This has allowed us to remove many conditionals and special cases in the rest of the code, including the pipelines. This is very important for us and for the users of 🧨 diffusers: we all gain clarity and a solid abstraction for schedulers. See the description in #719 for more details

Please update any custom Stable Diffusion pipelines accordingly:

- if isinstance(self.scheduler, LMSDiscreteScheduler):
-    latents = latents * self.scheduler.sigmas[0]
+ latents = latents * self.scheduler.init_noise_sigma
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     sigma = self.scheduler.sigmas[i]
-     latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs).prev_sample
- else:
-     latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
  • Pipeline callbacks. As a community project (h/t @jamestiotio!), diffusers pipelines can now invoke a callback function during generation, providing the latents at each step of the process. This makes it easier to perform tasks such as visualization, inspection, explainability and others the community may invent.

🛠️ More tasks

Building on top of the previous foundations, this release incorporates several new tasks that have been adapted from research papers or community projects. These include:

  • Textual inversion. Makes it possible to quickly train a new concept or style and incorporate it into the vocabulary of Stable Diffusion. Hundreds of people have already created theirs, and they can be shared and combined together. See the training Colab to get started.
  • Dreambooth. Similar goal to textual inversion, but instead of creating a new item in the vocabulary it fine-tunes the model to make it learn a new concept. Training Colab.
  • Negative prompts. Another community effort led by @shirayu. The Stable Diffusion pipeline can now receive both a positive prompt (the one you want to create), and a negative prompt (something you want to drive the model away from). This opens up a lot of creative possibilities!

🏃‍♀️ Under the hood changes to support better fine-tuning

Gradient checkpointing and 8-bit optimizers have been successfully applied to achieve Dreambooth fine-tuning in a Colab notebook! These updates will make it easier for diffusers to support general-purpose fine-tuning (coming soon!).

⚠️ Experimental: community pipelines

This is big, but it's still an experimental feature that may change in the future.

We are constantly amazed at the amount of imagination and creativity in the diffusers community, so we've made it easy to create custom pipelines and share them with others. You can write your own pipeline code, store it in 🤗 Hub, GitHub or your local filesystem and StableDiffusionPipeline.from_pretrained will be able to load and run it. Read more in the documentation.

We can't wait to see what new tasks the community creates!

💪 Quality of life fixes

Bug fixing, improved documentation, better tests are all important to ensure diffusers is a high-quality codebase, and we always spend a lot of effort working on them. Several first-time contributors have helped here, and we are very grateful for their efforts!

🙏 Significant community contributions

The following people have made significant contributions to the library over the last release:

  • @Victarry – Add training example for DreamBooth (#554)
  • @jamestiotio – Add callback parameters for Stable Diffusion pipelines (#521)
  • @jachiam – Allow resolutions that are not multiples of 64 (#505)
  • @johnowhitaker – Adding pred_original_sample to SchedulerOutput for some samplers (#614).
  • @keturn – Interesting discussions and insights on many topics.

✏️ Change list

Read more