DiffSketching: Sketch Control Image Synthesis with Diffusion Models

Python 3.6 · PyTorch 1.13 · MIT License

In this repository, you can find the official PyTorch implementation of DiffSketching: Sketch Control Image Synthesis with Diffusion Models (BMVC2022). Supplements can be found here.

Authors: Qiang Wang, Di Kong, Fengyin Lin and Yonggang Qi, Beijing University of Posts and Telecommunications, Beijing, China.

Abstract: Creative sketching is a universal form of visual expression, but synthesizing images from an abstract sketch is very challenging. Traditionally, a deep learning model for sketch-to-image synthesis must cope with distorted input sketches that lack visual details, and it requires collecting large-scale sketch-image datasets. We are the first to study this task with diffusion models. Our model matches sketches through cross-domain constraints and uses a classifier to guide the image synthesis more accurately. Extensive experiments confirm that our method is not only faithful to the user's input sketches, but also maintains the diversity and imagination of the synthesized results. Our model outperforms GAN-based methods in terms of generation quality and human evaluation, and does not rely on massive sketch-image datasets. Additionally, we present applications of our method to image editing and interpolation.


Datasets

ImageNet

Please go to the official ImageNet website to download the complete dataset.

Sketchy

Please go to the Sketchy official website to download the Sketches and Photos datasets.

QuickDraw

Please go to the official QuickDraw website to download the dataset. The original data is stored as vector strokes; please convert it to .png or .jpg before using it.
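For reference, the snippet below is a minimal sketch (not part of this repo) of one way to rasterise the simplified QuickDraw .ndjson strokes into .png files with Pillow; the file names, canvas size, and stroke width are assumptions.

import json
from PIL import Image, ImageDraw

def drawing_to_png(drawing, out_path, size=256, pad=8, width=2):
    # "drawing" is a list of strokes; each stroke is [x_coords, y_coords]
    # in the 0-255 range used by the simplified QuickDraw files.
    img = Image.new("L", (size, size), color=255)
    canvas = ImageDraw.Draw(img)
    scale = (size - 2 * pad) / 255.0
    for xs, ys in drawing:
        pts = [(pad + x * scale, pad + y * scale) for x, y in zip(xs, ys)]
        if len(pts) > 1:
            canvas.line(pts, fill=0, width=width)
    img.save(out_path)

# Example: convert one category file (assumed name) to individual .png files.
with open("cat.ndjson") as f:
    for i, line in enumerate(f):
        sample = json.loads(line)
        drawing_to_png(sample["drawing"], f"cat_{i:06d}.png")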

Installation

The requirements of this repo can be found in requirements.txt.

pip install -r requirements.txt

Train

Train the feature extraction network. Make sure to set image_root before running.

python scripts/feature_train.py 

Train the photosketch network.

python scripts/photosketch_train.py --dataroot [path/to/sketchy-datasets] \
                                    --model pix2pix \
                                    --which_model_netG resnet_9blocks \
                                    --which_model_netD global_np 

Train the diffusion network.

python scripts/image_train.py --data_dir [path/to/imagenet-datasets] \
                              --iterations 1000000 \
                              --anneal_lr True \
                              --batch_size 512 \
                              --lr 4e-4 \
                              --save_interval 10000 \
                              --weight_decay 0.05

Train the classifier network.

python scripts/classifier_train.py --data_dir [path/to/imagenet-datasets] \
                                   --iterations 1000000 \
                                   --anneal_lr True \
                                   --batch_size 512 \
                                   --lr 4e-4 \
                                   --save_interval 10000 \
                                   --weight_decay 0.05 \
                                   --image_size 256 \
                                   --classifier_width 256 \
                                   --classifier_pool attention \
                                   --classifier_resblock_updown True \
                                   --classifier_use_scale_shift_norm True

Inference

python scripts/image_sample.py --model_path [/path/to/model] \
                               --image_root [/path/to/reference-image] \
                               --sketch_root [/path/to/reference-sketch] \
                               --save_path [/path/to/save] \
                               --batch_size 4 \
                               --num_samples 50000 \
                               --timestep_respacing ddim25 \
                               --use_ddim True \
                               --class_cond True \
                               --image_size 256 \
                               --learn_sigma True \
                               --use_fp16 True \
                               --use_scale_shift_norm True

Evaluation

Package the results to be evaluated in .npz format; the evaluator reports FID, IS, Precision, and Recall. A packing sketch follows the command below.

python evaluations/evaluator.py [/path/to/reference-data] [/path/to/generate-data]
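As a rough guide, the snippet below is a minimal sketch (not part of this repo) of one way to pack a folder of generated images into a single .npz file; the folder name, image size, and the arr_0 key (the convention used by OpenAI's guided-diffusion evaluator) are assumptions here.

import glob
import numpy as np
from PIL import Image

# Collect generated images (assumed output folder) into one uint8 array
# of shape (N, 256, 256, 3) and save it as a single .npz file.
paths = sorted(glob.glob("samples/*.png"))
images = np.stack([
    np.array(Image.open(p).convert("RGB").resize((256, 256)))
    for p in paths
]).astype(np.uint8)
np.savez("generated.npz", arr_0=images)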

Citation

@inproceedings{Wang_2022_BMVC,
author    = {Qiang Wang and Di Kong and Fengyin Lin and Yonggang Qi},
title     = {DiffSketching: Sketch Control Image Synthesis with Diffusion Models},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0067.pdf}
}

License

This project is released under the MIT License.

Acknowledgements

Contact

If you have any questions about the code, please contact wanqqiang@bupt.edu.cn
