CombinedPipeline fails to accept bfloat16 image tensor as input #7598
Since …, I found this issue in pytorch. It's been around since 2022, and the OP states, "NumPy doesn't support bfloat16, and doesn't plan to do so." So it sounds like a 'can't get there from here' situation. Anyway, my actual goal is doing img2img with Stable Cascade. Any suggestion on how to do it without having to resort to float32 for the model and image would be much appreciated.

---
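The NumPy gap is easy to reproduce outside of diffusers: calling `.numpy()` on a bfloat16 tensor raises a `TypeError`, and upcasting to float32 on the torch side first is the usual workaround. A minimal sketch in pure PyTorch:

```python
import torch

x = torch.ones(2, 2, dtype=torch.bfloat16)

# Direct conversion fails because NumPy has no bfloat16 dtype
try:
    x.numpy()
except TypeError as e:
    print("direct .numpy() failed:", e)

# Workaround: upcast to float32 before crossing into NumPy
arr = x.float().numpy()
print(arr.dtype)  # float32
```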
Could you try this one? …

---
Thanks. I tried:

```python
input_tensor = torch.from_numpy(np.array(image)).permute(2, 0, 1).unsqueeze(0)
# Convert the tensor to 'bfloat16' dtype
input_tensor = input_tensor.to(torch.bfloat16).float().numpy().astype(ml_dtypes.bfloat16)
```

It compiled and loaded OK, but I get a runtime AttributeError trying to send the image to …

---

I previously had …

---
I imagined this modification in …:

```diff
+ import ml_dtypes

  framework_to_numpy = {
-     "pt": lambda obj: obj.detach().cpu().numpy(),
+     "pt": lambda obj: obj.detach().cpu().float().numpy().astype(ml_dtypes.bfloat16),
```

---
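Whether the extra `astype(ml_dtypes.bfloat16)` step is needed depends on the consumer of the array. For feeding the data back into torch, the float32 round trip alone is already lossless, since every bfloat16 value is exactly representable in float32 (bfloat16 is a truncated float32). A sketch of the round trip:

```python
import torch

x = torch.randn(4, 4, dtype=torch.bfloat16)

# bfloat16 -> float32 -> NumPy -> float32 tensor -> bfloat16
arr = x.detach().cpu().float().numpy()
y = torch.from_numpy(arr).to(torch.bfloat16)

# Every bfloat16 value is exactly representable in float32,
# so the round trip recovers the original tensor exactly
print(torch.equal(x, y))  # True
```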
Ahh, sorry for misunderstanding. Apologies for my next dumb question: …

---
I guess we can add a …

---
Thanks! Using a cheaper GPU is appealing; I'll try that after I solve the A10 conundrum :-) Meanwhile, forking the transformers repo and making your suggested changes got me farther down the road. It's no longer choking in …

I guess I understand even less than I thought. My code has:

```python
# Ensure model and scheduler are initialized in GPU-enabled function
if torch.cuda.is_available():
    pipe = StableCascadeCombinedPipeline.from_pretrained(repo, torch_dtype=torch.bfloat16)
    pipe.to("cuda")
```

Does that not convert the weights to CUDA format? Or do I need to somehow get the image to be seen as CPUBFloat16Type? BTW, if I pass None for the image, the code runs correctly and produces an image based strictly on the prompt. So it's clearly doing the right thing with the weights.

---
It seems that `pipe.to('cuda')` leaves the prior image encoder on the CPU. Could you try this?

```diff
  pipe = StableCascadeCombinedPipeline.from_pretrained(repo, torch_dtype=torch.bfloat16)
  pipe.to('cuda')
+ pipe.prior_image_encoder.to('cuda')
```

---
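When debugging this kind of mismatch, it helps to see where each component of a pipeline-like object actually keeps its weights. The helper below is a hypothetical diagnostic (not part of diffusers), demonstrated on a stand-in object whose submodules are plain attributes:

```python
import torch
from torch import nn

def component_devices(obj):
    # Hypothetical diagnostic helper: report the device of every
    # nn.Module attribute held by a pipeline-like object.
    report = {}
    for name, attr in vars(obj).items():
        if isinstance(attr, nn.Module):
            params = list(attr.parameters())
            if params:
                report[name] = str(params[0].device)
    return report

# Stand-in mimicking a pipeline whose components are plain attributes
class FakePipe:
    def __init__(self):
        self.prior = nn.Linear(4, 4)
        self.prior_image_encoder = nn.Linear(4, 4)

pipe = FakePipe()
print(component_devices(pipe))
# e.g. {'prior': 'cpu', 'prior_image_encoder': 'cpu'}
```

Running this on the real pipeline before and after `pipe.to('cuda')` would reveal any component left behind.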
That did it. Thanks! I really appreciate the time and effort you put into resolving this. It also enables sending a PIL image as input without transforming it.

---

I've just verified that the fix eliminates the need for a hacked version of transformers. I will post a cleaned-up and simplified version of the code so others can use it as a starting point.

---
As promised, here's a minimal app.py that's been tested in HF Spaces on an A10G large:

```python
import gradio as gr
import spaces
from diffusers import StableCascadeCombinedPipeline
import torch
import random

# Constants
repo = "stabilityai/stable-cascade"

# Ensure model and scheduler are initialized in GPU-enabled function
if torch.cuda.is_available():
    pipe = StableCascadeCombinedPipeline.from_pretrained(repo, torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    # As of 2024-04-08, pipe.to("cuda") does not move the prior image encoder
    # to the GPU, so we need to do it manually.
    # See https://github.com/huggingface/diffusers/issues/7598
    pipe.prior_image_encoder.to("cuda")

# The generate function
@spaces.GPU(enable_queue=True)
def generate_image(prompt, image):
    seed = random.randint(-100000, 100000)
    results = pipe(
        prompt=prompt,
        images=[image] if image is not None else None,
        # default output size for SC is 1024x1024
        height=1024,
        width=1024,
        num_inference_steps=20,  # 20 steps takes ~17 seconds on an A10G GPU
        generator=torch.Generator(device="cuda").manual_seed(seed),
    )
    return results.images[0]

# ------------- Gradio Interface -----------------------
description = """
A minimal demo using the Stable Cascade combined pipeline for image-to-image generation.
A more useful version would provide UI components for controlling the input parameters
and save the settings with the generated images.
"""

with gr.Blocks(css="style.css") as demo:
    gr.HTML("<h1><center>Stable Cascade Img2Img ⚡</center></h1>")
    gr.Markdown(description)
    with gr.Group():
        with gr.Row():
            prompt = gr.Textbox(label='Enter your prompt', scale=8, value="holding a puppy")
            submit = gr.Button(scale=1, variant='primary')
        imgin = gr.Image(label='Input Image', type='pil', height=1024, width=1024, interactive=True)
        imgout = gr.Image(label='Generated Image', height=1024, width=1024)
    prompt.submit(fn=generate_image, inputs=[prompt, imgin], outputs=imgout)
    submit.click(fn=generate_image, inputs=[prompt, imgin], outputs=imgout)

demo.queue().launch()
```

The requirements.txt file for this app is: …
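The original requirements.txt contents did not survive the page capture. A plausible minimal set, inferred purely from the imports in the app.py above (the package names and the inclusion of accelerate are assumptions; versions unpinned):

```text
torch
diffusers
transformers
accelerate
gradio
spaces
```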
---

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.

---
hey @Michael-F-Ellis …

---

@standardAI thanks for finding the cause of the error!

---
`StableCascadeCombinedPipeline` accepts an `images` argument that can be a PIL image, a torch tensor, or a list of either. Unfortunately, I can't get it to accept a bfloat16 type for the image: it raises a runtime error in CLIP. I tried float32, but HF's A10G Large runs out of memory.

Here's the error I'm seeing when I try to pass an image encoded as `torch.bfloat16`: …

FWIW, here are relevant snippets from the code that produced the above error: …

Originally posted by @Michael-F-Ellis in #7571 (comment)