
What's the purpose of bool_matrix1024, bool_matrix4096 in the return of the cal_attn_mask_xl function? #97

Open
parryppp opened this issue May 16, 2024 · 6 comments

Comments

@parryppp

The images of bool_matrix1024 and bool_matrix4096 are shown below.
[image: visualization of bool_matrix1024]
[image: visualization of bool_matrix4096]

@Z-YuPeng
Collaborator

Our method is based on sampling tokens so that images interact with each other through attention operations. The mask identifies the tokens to be sampled. To reduce memory consumption, we switched from using the mask to using indices.
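
As a rough illustration of that switch (a minimal sketch with made-up names such as `tokens` and `keep_prob`, not the repository's exact code):

```python
import torch

# Hypothetical setup: 4 images, each encoded as 1024 spatial tokens of dimension 320.
tokens = torch.randn(4, 1024, 320)
keep_prob = 0.5  # plays the role of the sa32 sampling rate

# Mask-based sampling: a boolean mask marks which tokens may be attended to.
mask = torch.rand(4, 1024) < keep_prob

# Index-based sampling: the same selection expressed as indices, which is cheaper
# to store and apply than carrying a full boolean attention mask around.
indices = mask[0].nonzero(as_tuple=True)[0]  # positions of the sampled tokens in image 0
sampled = tokens[0, indices]                 # roughly (keep_prob * 1024, 320)
```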

@parryppp
Author

[image: reshaped attention mask]
The reshaped attention mask is shown above. Do you mean that, for example, if I want to generate 4 consistent images, the yellow zone in the attention map would not be masked? Then what does 'randsample' in the paper mean?
[image]

@Z-YuPeng
Collaborator

The generated random mask is exactly the means of implementing random sampling: once a mask has been randomly generated, only the tokens where mask = 1 are considered.
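
In other words (a hedged sketch with illustrative shapes, not the repository's code), the boolean mask is consumed by the attention call so that only the positions set to True take part:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, tokens, head_dim)
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)
v = torch.randn(1, 8, 1024, 64)

# True = this query/key pair is considered (mask = 1 in the comment above).
mask = torch.rand(1024, 1024) < 0.5

# A boolean attn_mask in scaled_dot_product_attention keeps positions where it is True.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```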

@Z-YuPeng
Collaborator

The yellow zone corresponds to the concatenation operation below. We found that we cannot drop an image's own tokens, as doing so leads to a significant decline in image quality.
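
Put together, the idea can be sketched as follows (hypothetical shapes and names, not the repository's exact code): each image always keeps its own tokens and additionally attends to randomly sampled tokens from the other images.

```python
import torch

num_images, num_tokens, dim = 4, 1024, 320
tokens = torch.randn(num_images, num_tokens, dim)
sample_rate = 0.5  # plays the role of sa32

banks = []
for i in range(num_images):
    own = tokens[i]  # an image's own tokens are never dropped
    others = torch.cat([tokens[j] for j in range(num_images) if j != i])
    keep = torch.rand(others.shape[0]) < sample_rate
    # Concatenation: the image's own tokens plus the randomly sampled tokens
    banks.append(torch.cat([own, others[keep]]))
```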

@Z-YuPeng
Collaborator

The mask operation combines random sampling and concatenation into a single step: we initially found that doing so was faster and equivalent to random sampling, but it also led to greater memory usage. Later, we reverted to the original approach due to concerns about memory consumption raised in other issues.
https://github.com/HVision-NKU/StoryDiffusion/blob/main/utils/gradio_utils.py#L258
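
Roughly speaking, the memory trade-off looks like this (illustrative shapes only, not the exact tensors built in gradio_utils.py):

```python
import torch

total_length, nums_1024, sa32 = 4, 1024, 0.5

# Fused-mask approach: one boolean mask over every query/key token pair, so random
# sampling and concatenation happen implicitly inside attention. Its size grows
# with (total_length * nums_1024) ** 2.
attn_mask = torch.rand(total_length * nums_1024, total_length * nums_1024) < sa32
print(attn_mask.numel())  # 16,777,216 boolean entries

# Index-based approach: only the positions of the sampled key/value tokens are kept,
# roughly sa32 * total_length * nums_1024 integers.
indices = (torch.rand(total_length * nums_1024) < sa32).nonzero(as_tuple=True)[0]
print(indices.numel())    # about 2,048 indices
```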

@parryppp
Author

Thank you for your explanation, I now understand much more clearly. But I still have a question about the shape of the attention mask: why does the attention mask ensure that the squares on the diagonal remain set to 1, as shown in the figure below? Is that the purpose of the code in L249-L252?
[image: attention mask with the diagonal blocks set to 1]

bool_matrix1024[i:i+1,id_length*nums_1024:] = False

Why not simply generate a random attention mask like the figure below?

bool_matrix1024 = torch.rand((total_length,nums_1024),device = device,dtype = dtype) < sa32

[image: fully random attention mask]
