
Regional prompting / Attention couple proof of concept #639

Open · wants to merge 105 commits into base: regions

Conversation

@Danamir (Contributor) commented Apr 21, 2024

Hello,

Using the custom node cgem156-ComfyUI, I was able to build a functional regional prompting workflow for SDXL.

Here is a proof of concept in krita-ai-diffusion. Using a new "Attention" control layer type, splitting the prompt into lines and parsing for lines starting with ZONE (or BREAK, PROMPT, ATT, plus an optional number... TBD), each zone backed by a new Paint layer in Krita, we are able to greatly influence the rendering.

Here is a step-by-step:

  • Create a new Paint layer in Krita
  • Draw/fill the region you want to control (any color, any opacity; we use the transparency mask directly)
  • Add a new "Attention" control layer in the krita-ai-diffusion interface
  • Assign the new Paint layer to the control layer
  • Repeat for any number of zones
  • Alter the prompt:
    • Add as many lines as zones, each starting with ZONE followed by a description of the zone's content
    • Optionally add an extra ZONE line that will be applied only to the image outside the defined zones
    • The first line of the prompt is copied at the beginning of all prompts
    • The last line of the prompt is copied at the end of all prompts

An example with a single zone:

photography of
ZONE a dog
ZONE a cat
two animals, a city street in the background

[images]

The second ZONE is automatically assigned to the cat prompt. The prompts used for the attention couple are "photography of a dog, two animals, a city street in the background" and "photography of a cat, two animals, a city street in the background".
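
For illustration, here is a minimal sketch of this splitting and merging logic (a hypothetical helper, not the plugin's actual code):

```python
def split_zone_prompt(prompt: str) -> list[str]:
    """Split a ZONE-annotated prompt into one merged prompt per zone."""
    lines = [line.strip() for line in prompt.splitlines() if line.strip()]
    zones = [line[len("ZONE"):].strip() for line in lines if line.startswith("ZONE")]
    others = [line for line in lines if not line.startswith("ZONE")]
    prefix = others[0] if others else ""            # first line, prepended to all zones
    suffix = others[-1] if len(others) > 1 else ""  # last line, appended to all zones
    return [" ".join(filter(None, [prefix, zone + ",", suffix])) for zone in zones]

# split_zone_prompt("photography of\nZONE a dog\nZONE a cat\ntwo animals, a city street in the background")
# -> ["photography of a dog, two animals, a city street in the background",
#     "photography of a cat, two animals, a city street in the background"]
```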

Another example:
[images]

To do:

  • Add a control layer mode "Attention"
  • Get the control layer image opacity as mask
  • Split main prompt by attention zones
  • Apply cond and masks to AttentionCouple
  • Use ETN_AttentionCouple node
  • Handle render methods
    • generate
    • refine
    • inpaint
    • refine_region
  • User switch for attention couple / region prompt finder
  • Load LoRA only once
  • Typechecks
  • Lint

@Danamir Danamir marked this pull request as draft April 21, 2024 00:48
@Danamir (Author) commented Apr 21, 2024

If you want to check out the full ComfyUI Regional Prompting workflow that inspired this PR: https://www.reddit.com/r/StableDiffusion/comments/1c7eaza/comfyui_easy_regional_prompting_workflow_3/

Note: the AttentionCouple node uses a JS trick where you have to right-click and use "add input", so a workflow dumped from krita-ai-diffusion is unusable until you connect the correct mask & cond inputs.

@Danamir (Author) commented Apr 21, 2024

This development may be of interest to: #386 #387 #567

[edit]: Mentioned the PR in those issues to find some potential testers. 😅

@Danamir (Author) commented Apr 21, 2024

An example for #387:

[images]

@illmarzini commented:

Sorry if it's too nooby, but how do I implement this? Thanks in advance.

@Danamir (Author) commented Apr 22, 2024

Improved the coherency by subtracting the upper masks from each control layer's mask. This allows the order of the control layers to be better respected.

This only affects images where two control layers overlap. If you put a control layer below another one that covers it entirely, it will end up with an empty mask and be ignored, as intended.

It could be an option, but in my experience it works better this way; otherwise smaller zones are frequently ignored in favor of bigger covering ones.

I tried manually computing mask intersections to combine prompts, but the difference in results is marginal, and computing it with nodes would be a nightmare.
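
For illustration, a rough sketch of this subtraction, assuming masks are float tensors in [0, 1] ordered from the topmost control layer down (names are hypothetical, not the PR's actual code):

```python
import torch

def subtract_upper_masks(masks: list[torch.Tensor]) -> list[torch.Tensor]:
    """Each layer keeps only the part of its mask not covered by the layers above."""
    covered = torch.zeros_like(masks[0])
    result = []
    for mask in masks:
        result.append((mask - covered).clamp(min=0.0))  # keep the uncovered area only
        covered = torch.maximum(covered, mask)          # union of all masks so far
    return result
```

A layer entirely covered by the layers above it ends up with an all-zero mask, which matches the "empty mask and be ignored" behavior described above.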

@Acly (Owner) commented May 12, 2024

> But using it with "" as first parameter is a sure way to replace {prompt} by nothing.

That was the point. For cases where no regions are used, the placeholder makes no sense.

Regarding transparency masks: you can indicate your region directly with a transparency mask. If you use a paint layer, the transparency mask is created automatically on apply (it is required for correct blending). You have to edit the mask to extend or shrink the region.

Transparency masks are also the only way to make the region bigger after generating. If the alpha were applied directly to the generated result, you wouldn't be able to "unerase" it. A transparency mask allows that.

@Danamir (Author) commented May 13, 2024

> That was the point. For cases where no regions are used the placeholder makes no sense.

I see your point; it works if we remove the prompt merging from workflow.apply_attention and do it entirely in the RegionTree.to_api method.

I simplified the apply_attention method by removing the prompt computations and the unnecessary negative prompt CLIP encoding. Then I altered the to_api method to merge the region positive prompts with the root positive prompt. This should be more robust.

@Danamir (Author) commented May 13, 2024

I had a case where my background region was under 10% coverage and was being ignored, even though it previously had enough of an impact. I would propose setting the region cutoff at 5% coverage.
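
A minimal sketch of the proposed cutoff, assuming float masks in [0, 1] (the helper name is illustrative; only the 5% figure is the actual proposal):

```python
import torch

MIN_COVERAGE = 0.05  # proposed cutoff: ignore regions covering less than 5% of the canvas

def keep_region(mask: torch.Tensor) -> bool:
    # The mean of a [0, 1] mask is the fraction of the canvas it covers,
    # weighting partially opaque pixels by their opacity.
    return mask.mean().item() >= MIN_COVERAGE
```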

@Danamir (Author) commented May 13, 2024

The Conditioning input of apply_attention no longer needs to be in/out. This simplifies the code a little bit.

@Acly (Owner) commented May 24, 2024

So I was a bit distracted, but I finally finished the changes I wanted to try. I just tested it briefly with this PR merged in. There are some small conflicts due to the changes in to_api; I mostly applied your changes directly in the regions branch now.

Regions are now independent from group layers:

  • Creating a group doesn't automatically create a region
  • Removing a group doesn't remove a linked region
  • A region is created with the "Add region" button
  • A region can be linked to / unlinked from (one or more) group layers
  • A region has to be removed explicitly

To keep region setup streamlined, the "Add region" button has some smart behavior:

  • By default, it creates a region, and links it to the active layer
  • If the active layer is already linked, it creates a new group + paint layer and links the new region to that

This essentially still allows: Add region, Paint mask, Enter prompt, Add region, Paint mask, Enter prompt, etc., without any superfluous clicks. But it doesn't hijack groups meant for other purposes, and it allows a lot more flexibility.

The second big change is the addition of "destructive apply". This is a simplified workflow meant for live mode:

  • Regions can also be linked to paint layers directly (no group)
  • Applying a result in Live replaces the content of the active layer
  • This allows maintaining a simple 1:1 relationship between regions and layers and iterating quickly
  • You can take your simple region setup to Generate mode and use it with Attention couple. On apply, all simple region layers will be upgraded to groups (non-destructive apply, new results are inserted as layers)
  • Live mode will use the non-destructive apply if a region is linked to a group, so you can go back and forth

There are a bunch of corner cases to iron out, and the implementation certainly hasn't gotten simpler. I think usage is more predictable though, because the magic is restricted to the explicit actions ("add region" and "apply"), and outside of those everything is very explicit.

@Danamir (Author) commented May 25, 2024

Great stuff!
I did a rough merge, and it seems to be working pretty well as is.

I'll do some more testing next week.

@Acly (Owner) commented May 29, 2024

I did a lot of clean up on the regions branch.

Also I created a branch for comfyui-tooling-nodes: https://github.com/Acly/comfyui-tooling-nodes/tree/preview

It includes a (more or less) copy of the Attention Couple node. I made some changes:

  • Input regions are appended to a list before being passed to the node (see the sketch after this list). That way an exported prompt workflow.json has all the required connections.
  • Input masks are expected to be at the image resolution divided by 8. I'm considering changing the entire region mask pipeline to run at 8× lower resolution; it will speed things up. The full resolution is never used anyway (also in the original Attention Couple).
  • Fixed the multiple-of-64 issue. It should now work with any multiple-of-8 image resolution.
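
For illustration, a hedged sketch of the append-to-list pattern as a ComfyUI custom node; the class name, type strings, and signatures are hypothetical, not necessarily what comfyui-tooling-nodes uses:

```python
class AppendRegion:
    """Accumulate (mask, conditioning) pairs into a plain list via chained nodes."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {"mask": ("MASK",), "conditioning": ("CONDITIONING",)},
            "optional": {"regions": ("REGIONS",)},
        }

    RETURN_TYPES = ("REGIONS",)
    FUNCTION = "append"
    CATEGORY = "external_tooling"

    def append(self, mask, conditioning, regions=None):
        # Every region enters the graph through an ordinary node input, so an
        # exported workflow.json contains all the connections explicitly.
        regions = list(regions) if regions is not None else []
        regions.append((mask, conditioning))
        return (regions,)
```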

Simple example workflow: attention_couple_simple_non64.json

If you have time to test & integrate, that would be great. I will take a look at upscale now.

@Danamir (Author) commented May 29, 2024

Nice, it was such a pain to debug when the workflow.json files were not complete because of the previous node. 😅

I did a quick test of your Attention Couple implementation with a workflow of mine. As long as I divide the mask size by 8, it works perfectly. I'll see if I keep the 0.0 base mask; it seems to help fill possible gaps in the global mask in some cases, but it needs more testing.

@Danamir (Author) commented May 29, 2024

Question: couldn't the divide-by-8 mask scaling be done inside the Attention Couple node to simplify the workflow? If it's a linear relation there is no need to do the computation in the workflow, and this would allow using the mask editor in the ComfyUI image loader directly.

@Acly (Owner) commented May 29, 2024

> I'll see if I keep the 0.0 base mask

Note that the first element in the regions list is special. Its conditioning is ignored; instead, it uses whatever conditioning is passed to KSampler. This is the same behavior as the base_mask input in the original node.

I don't think passing a 0 mask makes sense unless you explicitly don't want to use the conditioning passed to KSampler for attention at all. If that's really what you want to do, then maybe I should change it so the first element in the list uses its linked conditioning like the others. That way it would work without the 0-mask workaround, and be more plausible too. Just not sure if it breaks anything...

> couldn't the mask divide by 8 be done inside the Attention Couple node to simplify the workflow ?

Yes. It's equivalent to doing ScaleBy 1/8 before passing it to the Attention Couple node. The idea is that it's better to do that as early as possible: in the context of the plugin, all the mask preparation happening in the client (subtract & fill) can be done in low resolution, and there is no need to send e.g. full 4K masks for each region over the network to a remote Comfy if it's all downscaled in the end anyway. It's 64× more pixels; that's a lot.

Also, I figured the original node by laksjdjf is better suited for direct use in the ComfyUI web UI, but now that you mention it, it's probably easy to detect whether the mask is full resolution and optionally do the downscale in the node.
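
A minimal sketch of that optional detection and downscale, assuming masks are torch tensors shaped (1, H, W) with values in [0, 1] (the function name and signature are illustrative, not the node's actual code):

```python
import torch
import torch.nn.functional as F

def to_latent_resolution(mask: torch.Tensor, image_h: int, image_w: int) -> torch.Tensor:
    """Downscale a full-resolution mask to latent resolution (1/8) if needed."""
    if mask.shape[-2:] == (image_h, image_w):  # full-res mask detected
        mask = F.interpolate(
            mask.unsqueeze(0),                 # (1, 1, H, W) for interpolate
            size=(image_h // 8, image_w // 8),
            mode="bilinear",
        ).squeeze(0)
    return mask  # already at latent resolution: pass through unchanged
```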

@Acly (Owner) commented May 29, 2024

Made the changes. Text prompts are passed to regions now. The conditioning passed to KSampler can be the same as the first region (previous behavior) or empty, or it can be used to pass ControlNet etc.

[image]

@Danamir (Author) commented May 29, 2024

> I don't think passing a 0 mask makes sense unless you explicitly don't want to use the conditioning that is passed to KSampler for attention at all

That was only when testing my own workflow, where the masks are made programmatically and there is sometimes a one-pixel gap between two masks. Having a 0.0 mask covering all the regions seems to fill the possible gaps and prevent breakage in some fringe cases. This has nothing to do with the Krita plugin, where all the masks are correct; I'll get rid of the 0 mask in the plugin and see if there is any problem.
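
For illustration, a tiny sketch of the kind of gap a full-canvas base mask papers over (hypothetical helper, not workflow code):

```python
import torch

def find_gaps(masks: list[torch.Tensor]) -> torch.Tensor:
    """Boolean map of pixels claimed by no region mask at all."""
    coverage = torch.stack(masks).sum(dim=0)
    return coverage == 0  # True where a stray one-pixel gap would be left unconditioned
```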

> but now that you mention it, it's probably easy to detect if the mask is full res and optionally do the downscale in the node

That would be great, but once again that was me thinking about my own workflow, as it would allow me to get rid of the multiple-of-64 limitation. 😅

For the plugin it makes perfect sense to send as small a mask as possible directly.

> Text prompts are passed to regions now. Conditioning passed to KSampler can be the same as the first region (previous behavior) or empty, or can be used to pass ControlNet etc.

Perfect.

@Danamir (Author) commented May 30, 2024

The code is merged.

I left the masks at full size for now, removed the 0-strength base mask, and got rid of the unnecessary extent scaling to a multiple of 64.

@Acly (Owner) commented May 30, 2024

Thanks! Maybe this would be a good point to clean up this PR and merge it into the regions branch.

@Danamir (Author) commented May 30, 2024

Yes, I think it's in a good state for that.

I'll take a look at lint & type checks, but I think there were some type check errors that were unavoidable. I'll see.

[edit]: All done!

@Danamir Danamir marked this pull request as ready for review May 30, 2024 09:38