MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

HelenMao/MAG-Edit


This repository is the official implementation of MAG-Edit.

Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou

Project Website arXiv


Teaser figure panels: (a) Blended Latent Diffusion, (b) DiffEdit, (c) Prompt2Prompt, (d) Plug-and-Play, (e) P2P+Blend, (f) PnP+Blend

🔖 Abstract

TL;DR: MAG-Edit is the first method specifically designed to address localized image editing in complex scenarios without training.

Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions. However, localized editing in complex scenarios has not been well-studied in the literature, despite its growing real-world demands. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop MAG-Edit, a training-free, inference-stage optimization method, which enables localized image editing in complex scenarios. In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints of the edit token, which in turn gradually enhances the local alignment with the desired prompt. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in achieving both text alignment and structure preservation for localized editing within complex scenarios.
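The abstract refers to two mask-based cross-attention constraints of the edit token, and the changelog names them Token Ratio and Spatial Ratio. The toy sketch below shows one plausible reading of these two quantities on a tiny hand-written attention map. The function names, the exact definitions, and the data are assumptions for illustration only and are not taken from the repository code; the actual method maximizes such constraints with respect to the noise latent inside a diffusion model, which this sketch does not attempt.

```python
from typing import Sequence


def token_ratio(attn: Sequence[Sequence[float]], mask: Sequence[int], token_idx: int) -> float:
    """Assumed definition: share of the in-mask attention mass (over all
    tokens) that is assigned to the edit token."""
    in_mask_token = sum(row[token_idx] for row, m in zip(attn, mask) if m)
    in_mask_all = sum(sum(row) for row, m in zip(attn, mask) if m)
    return in_mask_token / in_mask_all if in_mask_all > 0 else 0.0


def spatial_ratio(attn: Sequence[Sequence[float]], mask: Sequence[int], token_idx: int) -> float:
    """Assumed definition: share of the edit token's attention mass (over
    all pixels) that falls inside the mask."""
    in_mask = sum(row[token_idx] for row, m in zip(attn, mask) if m)
    everywhere = sum(row[token_idx] for row in attn)
    return in_mask / everywhere if everywhere > 0 else 0.0


# Toy cross-attention map: 4 pixels (rows) x 3 prompt tokens (columns).
attn = [
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
    [0.7, 0.1, 0.2],
    [0.6, 0.2, 0.2],
]
mask = [1, 1, 0, 0]   # the first two pixels form the edit region
edit_token = 1        # column index of the edit token (e.g. "yellow")

tr = token_ratio(attn, mask, edit_token)    # approximately 0.55
sr = spatial_ratio(attn, mask, edit_token)  # approximately 0.786
print(f"token ratio = {tr:.3f}, spatial ratio = {sr:.3f}")
```

Under this reading, a higher token ratio means the edit token dominates the other tokens inside the mask, while a higher spatial ratio means the edit token's attention is concentrated in the mask rather than leaking elsewhere.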

📝 Changelog

  • 2024.05.24 Release Token Ratio Code!
  • 2023.12.19 Release Project Page and Paper!

💡 TODO:

  • [ ] Release Spatial Ratio Code
  • [x] Release Token Ratio Code
  • [x] Release MAG-Edit paper and project page

🎮 MAG-Edit Implementation

Setup Environment

Our method was tested with CUDA 12.0 on a single A100 or V100 GPU. Setup consists of configuring the environment and downloading the pre-trained model.

conda create -n mag python=3.8
conda activate mag

pip install -r requirements.txt

We use Stable Diffusion v1-4 as the backbone. Download it from Hugging Face and update the model path on line 26 of code_tr/network.py.
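One possible way to fetch the backbone is the `huggingface_hub` package, sketched below. It assumes that `huggingface_hub` is installed and that the `CompVis/stable-diffusion-v1-4` repository on Hugging Face is the intended checkpoint; the helper name `download_backbone` and the default target directory are hypothetical. The returned path is what you would then paste into code_tr/network.py.

```python
from pathlib import Path

# Assumed Hugging Face repository for Stable Diffusion v1-4.
REPO_ID = "CompVis/stable-diffusion-v1-4"


def download_backbone(local_dir: str = "models/stable-diffusion-v1-4") -> Path:
    """Download the model snapshot and return its absolute local path.

    `local_dir` is a hypothetical location; choose any directory you like.
    """
    # Imported lazily so that merely defining this helper does not
    # require huggingface_hub to be installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path).resolve()


# Usage (performs a multi-gigabyte download, so it is left commented out):
# print(download_backbone())
```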

Run MAG-Edit (Token Ratio)

Running MAG-Edit requires a single GPU with at least 32 GB of VRAM. code_tr/edit.sh provides a sample edit command:

CUDA_VISIBLE_DEVICES=0 python edit.py \
    --source_prompt="there is a set of sofas on the red carpet in the living room" \
    --target_prompt="there is a set of sofas on the yellow carpet in the living room" \
    --target_word="yellow" \
    --img_path="examples/1/1.jpg" \
    --mask_path="examples/1/mask.png" \
    --result_dir="result" \
    --max_iteration=15 \
    --scale=2.5

The result is saved to code_tr/result.
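To script several edits, the sample command can be assembled programmatically. The sketch below only uses the flags shown in the sample command; the helper names `build_edit_command` and `run_edit` are hypothetical, and running from the `code_tr` directory is an assumption based on where edit.sh and the results live.

```python
import os
import shlex
import subprocess
from typing import List


def build_edit_command(
    source_prompt: str,
    target_prompt: str,
    target_word: str,
    img_path: str,
    mask_path: str,
    result_dir: str = "result",
    max_iteration: int = 15,
    scale: float = 2.5,
) -> List[str]:
    """Build the argument list for edit.py from the documented flags."""
    return [
        "python", "edit.py",
        f"--source_prompt={source_prompt}",
        f"--target_prompt={target_prompt}",
        f"--target_word={target_word}",
        f"--img_path={img_path}",
        f"--mask_path={mask_path}",
        f"--result_dir={result_dir}",
        f"--max_iteration={max_iteration}",
        f"--scale={scale}",
    ]


def run_edit(cmd: List[str], gpu: str = "0", workdir: str = "code_tr") -> None:
    """Run one edit on the chosen GPU (workdir is an assumed location)."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    subprocess.run(cmd, cwd=workdir, env=env, check=True)


cmd = build_edit_command(
    source_prompt="there is a set of sofas on the red carpet in the living room",
    target_prompt="there is a set of sofas on the yellow carpet in the living room",
    target_word="yellow",
    img_path="examples/1/1.jpg",
    mask_path="examples/1/mask.png",
)
# Print a shell-quoted version for inspection; call run_edit(cmd) to execute.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces.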

Various Editing Types

Other Applications


Qualitative Comparison

Comparison with training-free methods

Columns: Simplified Prompt | Source Image | Ours | Blended LD | DiffEdit | P2P | PnP
Rows (simplified prompts): Green pillow; Denim pants; White bird; Slices of steak

Comparison with training and finetuning methods

Columns: Simplified Prompt | Source Image | Ours | InstructPix2Pix | MagicBrush | SINE
Rows (simplified prompts): Yellow car; Plaid sofa; Tropical fish; Strawberry

Comparison with Inversion methods

Columns: Simplified Prompt | Source Image | Ours | StyleDiffusion | ProxNPI | DirectInversion
Rows (simplified prompts): Jeep; Floral sofa; Yellow shirt

🚩 Citation

@article{mao2023magedit,
      title={MAG-Edit: Localized Image Editing in Complex Scenarios via $\underline{M}$ask-Based $\underline{A}$ttention-Adjusted $\underline{G}$uidance}, 
      author={Qi Mao and Lan Chen and Yuchao Gu and Zhen Fang and Mike Zheng Shou},
      year={2023},
      journal={arXiv preprint arXiv:2312.11396},
}

💞 Acknowledgements

This repository borrows heavily from prompt-to-prompt and layout-guidance. Thanks to the authors for sharing their code and models.
