
Regional prompting / Attention couple proof of concept #639

Open · wants to merge 105 commits into base: regions

Conversation

@Danamir (Contributor) commented Apr 21, 2024

Hello,

Using the custom node cgem156-ComfyUI, I was able to build a functional regional prompting workflow for SDXL.

Here is a proof of concept in krita-ai-diffusion. Using a new "Attention" control layer type, splitting the prompt into lines and parsing for lines starting with ZONE (or BREAK, PROMPT, ATT, plus an optional number... TBD), each zone backed by a new Paint layer in Krita, we are able to greatly influence the rendering.

Here is a step-by-step:

  • Create a new Paint layer in Krita
  • Draw/fill the region you want to control (any color, any opacity; we use the transparency mask directly)
  • Add a new "Attention" control layer in the krita-ai-diffusion interface
  • Assign the new Paint layer to the control layer
  • Repeat for any number of zones
  • Alter the prompt:
    • Add as many lines as zones, each starting with ZONE followed by a description of the zone's content
    • Optionally add an extra ZONE line that will be applied only to the image outside the defined zones
    • The first line of the prompt is copied at the beginning of all prompts
    • The last line of the prompt is copied at the end of all prompts

An example with a single zone:

photography of
ZONE a dog
ZONE a cat
two animals, a city street in the background

[images]

The second ZONE is automatically assigned to the cat prompt. The prompts used for the attention couple are "photography of a dog, two animals, a city street in the background" and "photography of a cat, two animals, a city street in the background".
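
For illustration, here is a minimal sketch of this splitting and merging logic (a hypothetical helper, not the plugin's actual code):

```python
def split_zone_prompt(prompt: str) -> list[str]:
    """Split a ZONE-annotated prompt into one merged prompt per zone."""
    lines = [line.strip() for line in prompt.splitlines() if line.strip()]
    zones = [line[len("ZONE"):].strip() for line in lines if line.startswith("ZONE")]
    others = [line for line in lines if not line.startswith("ZONE")]
    prefix = others[0] if others else ""            # first line, prepended to all zones
    suffix = others[-1] if len(others) > 1 else ""  # last line, appended to all zones
    return [" ".join(filter(None, [prefix, zone + ",", suffix])) for zone in zones]

# split_zone_prompt("photography of\nZONE a dog\nZONE a cat\ntwo animals, a city street in the background")
# -> ["photography of a dog, two animals, a city street in the background",
#     "photography of a cat, two animals, a city street in the background"]
```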

Another example:
[images]

To do:

  • Add a control layer mode "Attention"
  • Get the control layer image opacity as mask
  • Split main prompt by attention zones
  • Apply cond and masks to AttentionCouple
  • Use ETN_AttentionCouple node
  • Handle render methods
    • generate
    • refine
    • inpaint
    • refine_region
  • User switch for attention couple / region prompt finder
  • Load LoRA only once
  • Typechecks
  • Lint

@Danamir Danamir marked this pull request as draft April 21, 2024 00:48
@Danamir (Author) commented Apr 21, 2024

If you want to check out the full ComfyUI Regional Prompting workflow that inspired this PR: https://www.reddit.com/r/StableDiffusion/comments/1c7eaza/comfyui_easy_regional_prompting_workflow_3/

Note: the AttentionCouple node uses a JS trick where you have to right-click and use "add input", so a workflow dumped from krita-ai-diffusion is unusable until you connect the correct mask & cond inputs.

@Danamir (Author) commented Apr 21, 2024

This development may be of interest to: #386 #387 #567

[edit]: Mentioned the PR in those issues to find some potential testers. 😅

@Danamir (Author) commented Apr 21, 2024

An example for #387:

[images]

@illmarzini commented:

Sorry if it's too nooby, but how do I implement this? Thanks in advance.

@Danamir (Author) commented Apr 22, 2024

Improved the coherency by subtracting the upper masks from each control layer's mask. This allows the order of the control layers to be better respected.

This only affects images where two control layers overlap. If you put a control layer below another one that covers it entirely, it will end up with an empty mask and be ignored, as intended.

It could be an option, but in my experience it works better this way; otherwise smaller zones are frequently ignored in favor of bigger covering ones.

I tried manually computing mask intersections to combine prompts, but the difference in results is marginal, and computing it with nodes would be a nightmare.
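
For illustration, a rough sketch of this subtraction, assuming masks are float tensors in [0, 1] ordered from the topmost control layer down (names are hypothetical, not the PR's actual code):

```python
import torch

def subtract_upper_masks(masks: list[torch.Tensor]) -> list[torch.Tensor]:
    """Each layer keeps only the part of its mask not covered by the layers above."""
    covered = torch.zeros_like(masks[0])
    result = []
    for mask in masks:
        result.append((mask - covered).clamp(min=0.0))  # keep the uncovered area only
        covered = torch.maximum(covered, mask)          # union of all masks so far
    return result
```

A layer entirely covered by the layers above it ends up with an all-zero mask, which matches the "empty mask and be ignored" behavior described above.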

@Acly (Owner) commented May 12, 2024

> But using it with "" as first parameter is a sure way to replace {prompt} by nothing.

That was the point. For cases where no regions are used, the placeholder makes no sense.

Regarding transparency masks: you can indicate your region directly with a transparency mask. If you use a paint layer, the transparency mask is created automatically on apply (it is required for correct blending). You have to edit the mask to extend or shrink the region.

Transparency masks are also the only way to make the region bigger after generating. If the alpha were applied directly to the generated result, you wouldn't be able to "unerase" it. A transparency mask allows that.

@Danamir (Author) commented May 13, 2024

> That was the point. For cases where no regions are used the placeholder makes no sense.

I see your point; it works if we remove the prompt merging from workflow.apply_attention and do it entirely in the RegionTree.to_api method.

I simplified the apply_attention method by removing the prompt computations and the unnecessary negative prompt CLIP encoding. Then I altered the to_api method to merge the region positive prompts with the root positive prompt. This should be more robust.

@Danamir (Author) commented May 13, 2024

I had a case where my background region was under 10% coverage and was being ignored, even though it previously had enough of an impact. I would propose setting the region cutoff at 5% coverage.
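
A minimal sketch of the proposed cutoff, assuming float masks in [0, 1] (the helper name is illustrative; only the 5% figure is the actual proposal):

```python
import torch

MIN_COVERAGE = 0.05  # proposed cutoff: ignore regions covering less than 5% of the canvas

def keep_region(mask: torch.Tensor) -> bool:
    # The mean of a [0, 1] mask is the fraction of the canvas it covers,
    # weighting partially opaque pixels by their opacity.
    return mask.mean().item() >= MIN_COVERAGE
```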

@Danamir (Author) commented May 13, 2024

The Conditioning input of apply_attention no longer needs to be in/out. This simplifies the code a little bit.

@Acly (Owner) commented May 24, 2024

So I was a bit distracted, but I finally finished the changes I wanted to try. I just tested it briefly with this PR merged in. There are some small conflicts due to the changes in to_api; I mostly applied your changes directly in the regions branch now.

Regions are now independent from group layers:

  • Creating a group doesn't automatically create a region
  • Removing a group doesn't remove a linked region
  • A region is created with the "Add region" button
  • A region can be linked to / unlinked from (one or more) group layers
  • A region has to be removed explicitly

To keep region setup streamlined, the "Add region" button has some smart behavior:

  • By default, it creates a region, and links it to the active layer
  • If the active layer is already linked, it creates a new group + paint layer and links the new region to that

This essentially still allows: Add region, Paint mask, Enter prompt, Add region, Paint mask, Enter prompt, etc., without any superfluous clicks. But it doesn't hijack groups meant for other purposes, and it allows a lot more flexibility.

The second big change is the addition of "destructive apply". This is a simplified workflow meant for live mode:

  • Regions can also be linked to paint layers directly (no group)
  • Applying a result in Live replaces the content of the active layer
  • This allows maintaining a simple 1:1 relationship between regions and layers and iterating quickly
  • You can take your simple region setup to Generate mode and use it with Attention couple. On apply, all simple region layers will be upgraded to groups (non-destructive apply, new results are inserted as layers)
  • Live mode will use the non-destructive apply if a region is linked to a group, so you can go back and forth

There are a bunch of corner cases to iron out, and the implementation certainly hasn't gotten simpler. I think usage is more predictable though, because the magic is restricted to the explicit actions ("add region" and "apply"), and outside of those everything is very explicit.

@Danamir (Author) commented May 25, 2024

Great stuff!
I did a rough merge, and it seems to be working pretty well as is.

I'll do some more testing next week.

@Acly (Owner) commented May 29, 2024

I did a lot of clean up on the regions branch.

Also I created a branch for comfyui-tooling-nodes: https://github.com/Acly/comfyui-tooling-nodes/tree/preview

It includes a (more or less) copy of the Attention Couple node. I made some changes:

  • Input regions are appended to a list before being passed to the node (see the sketch after this list). That way an exported prompt workflow.json has all the required connections.
  • Input masks are expected to be at the image resolution divided by 8. I'm considering changing the entire region mask pipeline to run at 8× lower resolution; it will speed things up. The full resolution is never used anyway (also in the original Attention Couple).
  • Fixed the multiple-of-64 issue. It should now work with any multiple-of-8 image resolution.
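
For illustration, a hedged sketch of the append-to-list pattern as a ComfyUI custom node; the class name, type strings, and signatures are hypothetical, not necessarily what comfyui-tooling-nodes uses:

```python
class AppendRegion:
    """Accumulate (mask, conditioning) pairs into a plain list via chained nodes."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {"mask": ("MASK",), "conditioning": ("CONDITIONING",)},
            "optional": {"regions": ("REGIONS",)},
        }

    RETURN_TYPES = ("REGIONS",)
    FUNCTION = "append"
    CATEGORY = "external_tooling"

    def append(self, mask, conditioning, regions=None):
        # Every region enters the graph through an ordinary node input, so an
        # exported workflow.json contains all the connections explicitly.
        regions = list(regions) if regions is not None else []
        regions.append((mask, conditioning))
        return (regions,)
```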

Simple example workflow: attention_couple_simple_non64.json

If you have time to test & integrate, that would be great. I will take a look at upscale now.

@Danamir (Author) commented May 29, 2024

Nice, it was such a pain to debug when the workflow.json files were not complete because of the previous node. 😅

I did a quick test of your Attention Couple implementation with a workflow of mine. As long as I divide the mask size by 8, it works perfectly. I'll see if I keep the 0.0 base mask; it seems to help fill possible gaps in the global mask in some cases, but it needs more testing.

@Danamir (Author) commented May 29, 2024

Question: couldn't the divide-by-8 mask scaling be done inside the Attention Couple node to simplify the workflow? If it's a linear relation there is no need to do the computation in the workflow, and this would allow using the mask editor in the ComfyUI image loader directly.

@Acly (Owner) commented May 29, 2024

> I'll see if I keep the 0.0 base mask

Note that the first element in the regions list is special. Its conditioning is ignored; instead, it uses whatever conditioning is passed to KSampler. This is the same behavior as the base_mask input in the original node.

I don't think passing a 0 mask makes sense unless you explicitly don't want to use the conditioning passed to KSampler for attention at all. If that's really what you want to do, then maybe I should change it so the first element in the list uses its linked conditioning like the others. That way it would work without the 0-mask workaround, and be more plausible too. Just not sure if it breaks anything...

> couldn't the mask divide by 8 be done inside the Attention Couple node to simplify the workflow ?

Yes. It's equivalent to doing ScaleBy 1/8 before passing it to the Attention Couple node. The idea is that it's better to do that as early as possible: in the context of the plugin, all the mask preparation happening in the client (subtract & fill) can be done in low resolution, and there is no need to send e.g. full 4K masks for each region over the network to a remote Comfy if it's all downscaled in the end anyway. It's 64× more pixels; that's a lot.

Also, I figured the original node by laksjdjf is better suited for direct use in the ComfyUI web UI, but now that you mention it, it's probably easy to detect whether the mask is full resolution and optionally do the downscale in the node.
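
A minimal sketch of that optional detection and downscale, assuming masks are torch tensors shaped (1, H, W) with values in [0, 1] (the function name and signature are illustrative, not the node's actual code):

```python
import torch
import torch.nn.functional as F

def to_latent_resolution(mask: torch.Tensor, image_h: int, image_w: int) -> torch.Tensor:
    """Downscale a full-resolution mask to latent resolution (1/8) if needed."""
    if mask.shape[-2:] == (image_h, image_w):  # full-res mask detected
        mask = F.interpolate(
            mask.unsqueeze(0),                 # (1, 1, H, W) for interpolate
            size=(image_h // 8, image_w // 8),
            mode="bilinear",
        ).squeeze(0)
    return mask  # already at latent resolution: pass through unchanged
```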

@Acly (Owner) commented May 29, 2024

Made the changes. Text prompts are passed to regions now. The conditioning passed to KSampler can be the same as the first region (previous behavior) or empty, or it can be used to pass ControlNet etc.

[image]

@Danamir (Author) commented May 29, 2024

> I don't think passing a 0 mask makes sense unless you explicitly don't want to use the conditioning that is passed to KSampler for attention at all

That was only when testing my own workflow, where the masks are made programmatically and there is sometimes a one-pixel gap between two masks. Having a 0.0 mask covering all the regions seems to fill the possible gaps and prevent breakage in some fringe cases. This has nothing to do with the Krita plugin, where all the masks are correct; I'll get rid of the 0 mask in the plugin and see if there is any problem.
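
For illustration, a tiny sketch of the kind of gap a full-canvas base mask papers over (hypothetical helper, not workflow code):

```python
import torch

def find_gaps(masks: list[torch.Tensor]) -> torch.Tensor:
    """Boolean map of pixels claimed by no region mask at all."""
    coverage = torch.stack(masks).sum(dim=0)
    return coverage == 0  # True where a stray one-pixel gap would be left unconditioned
```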

> but now that you mention it, it's probably easy to detect if the mask is full res and optionally do the downscale in the node

That would be great, but once again that was me thinking about my own workflow, as it would allow me to get rid of the multiple-of-64 limitation. 😅

For the plugin it makes perfect sense to send as small a mask as possible directly.

> Text prompts are passed to regions now. Conditioning passed to KSampler can be the same as the first region (previous behavior) or empty, or can be used to pass ControlNet etc.

Perfect.

@Danamir (Author) commented May 30, 2024

The code is merged.

I left the masks at full size for now, removed the 0-strength base mask, and got rid of the unnecessary extent scaling to a multiple of 64.

@Acly (Owner) commented May 30, 2024

Thanks! Maybe this would be a good point to clean up this PR and merge it into the regions branch.

@Danamir (Author) commented May 30, 2024

Yes, I think it's in a good state for that.

I'll take a look at lint & type checks, but I think there were some type check errors that were unavoidable. I'll see.

[edit]: All done!

@Danamir Danamir marked this pull request as ready for review May 30, 2024 09:38