[Stable Diffusion] VAE Moments to image outputs whited out image. #721

Open
entrpn opened this issue Mar 21, 2024 · 0 comments
entrpn commented Mar 21, 2024

Hi @ahmadki, I'm trying to reproduce the Stable Diffusion training results.

I noticed that when I decode the moments back to images using the VAE's decoder, I get whited-out images. See:

[image: from_latents]

I noticed that in webdataset_images2latents.py, the images are not normalized:

https://github.com/mlcommons/training/blob/master/stable_diffusion/webdataset_images2latents.py#L86
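For context, my understanding (not stated in the repo) is that the stored moments are the VAE encoder's output: the mean and log-variance of a diagonal Gaussian, concatenated along the channel axis. diffusers' `DiagonalGaussianDistribution` splits them and samples with the reparameterization trick. A minimal NumPy sketch of that sampling step, with illustrative shapes:

```python
import numpy as np

def sample_from_moments(moments, rng=None):
    """Sample latents from stored VAE moments (mean and log-variance
    concatenated along the channel axis).

    A NumPy sketch of what diffusers' DiagonalGaussianDistribution.sample()
    does; the clamp range matches diffusers, other details are illustrative.
    """
    rng = rng or np.random.default_rng(0)
    mean, logvar = np.split(moments, 2, axis=1)   # split channels in half
    logvar = np.clip(logvar, -30.0, 20.0)         # clamp for numerical stability
    std = np.exp(0.5 * logvar)
    return mean + std * rng.standard_normal(mean.shape)

# For a 512x512 image, SD moments have shape (1, 8, 64, 64);
# the sampled latent is (1, 4, 64, 64).
moments = np.zeros((1, 8, 64, 64), dtype=np.float32)
latents = sample_from_moments(moments)
```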

If I add normalization as follows:

transforms = transforms.Compose(
  [
    transforms.ToTensor(),
    transforms.Resize(size=512, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(size=512),
    transforms.Normalize([0.5], [0.5])  # map pixel values from [0, 1] to [-1, 1]
  ]
)

The whiteout goes away:

[image: encoded_decoded]

Is there a reason why the images were not normalized, and how does this affect training of the UNet?
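For reference, my understanding of why this matters: `ToTensor()` yields pixels in [0, 1], while the SD VAE was trained on inputs in [-1, 1]. `Normalize([0.5], [0.5])` computes `(x - 0.5) / 0.5` per channel, which is exactly that mapping. A quick sanity check in plain NumPy (illustrative only):

```python
import numpy as np

# ToTensor() scales uint8 pixels into [0, 1].
pixels = np.array([0.0, 0.5, 1.0], dtype=np.float32)

# Normalize(mean=[0.5], std=[0.5]) applies (x - mean) / std per channel,
# mapping [0, 1] onto [-1, 1], the range the SD VAE encoder expects.
normalized = (pixels - 0.5) / 0.5
print(normalized)  # [-1.  0.  1.]
```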

Code to reproduce using HuggingFace diffusers:

from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch
from diffusers.models.autoencoders.vae import DiagonalGaussianDistribution
import numpy as np
from PIL import Image

model_id = "stabilityai/stable-diffusion-2-base"

# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Load precomputed moments and sample latents from them, scaling them
# as they would be stored during training.
moments = torch.from_numpy(np.load("000009999.npy")).type(torch.float16)
moments = moments.to("cuda")
latents = DiagonalGaussianDistribution(moments).sample()
latents = latents * pipe.vae.config.scaling_factor

# Undo the scaling before decoding, then map the VAE output from [-1, 1]
# back to uint8 pixels in [0, 255].
latents = 1 / pipe.vae.config.scaling_factor * latents
image = pipe.vae.decode(latents, return_dict=False)[0]
image = (image / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).detach().float().numpy()
image = (image * 255).round().astype("uint8")
image = Image.fromarray(image[0])
image.save("test.png")