GitHub - caopulan/GANInverter: A GAN inversion toolbox based on PyTorch library. We design a unified pipeline for inversion methods and conduct a comprehensive benchmark.

GAN Inverter is a GAN inversion toolbox based on PyTorch library.

We collect SOTA inversion methods and construct a uniform pipeline with more features. Different methods and training strategies are convenient to compose and add. We hope that this toolbox could help people in every use.

Features

Implementations of sota inversion methods.
Unified training/evaluation/inference/editing process.
Modular and flexible configuration. Easy to set options by config file (yaml) or command in every use.
Additional training features.
- Distributed training.
- Weight & bias (wandb).
- Automatically resume training.

Recent Updates

We are working for supporting more methods' inference and conducting the benchmark.

2023.02: SAM is supported.

2023.02: V1.1. Re-organized codes: methods' class, inference pipeline. Add our new work DHR "What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion".

2022.11: Add more optimizers and PTI is supported now.

2022.10: GAN Inverter v1.0 is released. Support methods: pSp, e4e, LSAP.

2022.09: LSAP is published on arxiv.

Model Zoo

Although previous works use "encoder-based", "optimization-based" and "hybrid method" to categorize inversion methods, this set of division criteria is no longer appropriate at present. According to the purpose of methods, we divide the inversion process into two steps: Image Embedding and Result Refinement:

Image Embedding aims to embed images into latent code by encoder or optimization.
Result Refinement aims to refine the initial inversion and editing results from the first step by various strategies (e.g., adjusting generator weight or intermediate feature).

1. Image Embedding Methods

	Method	Type	Repo	Paper	Source
☑️	pSp	E	code	paper	CVPR2021
☑️	e4e	E	code	paper	SIGGRAPH2021
☑️	LSAP	E	code	paper	Arxiv2022
◻️	E2Style	E	code	paper	TIP2022
◻️	Style Transformer	E	code	paper	CVPR2022
☑️	StyleGAN2(LPIPS)	O	code	paper	CVPR2020

Note: E/O/H means encoder-based and optimization-based methods.

2. Result Refinement Methods

	Method	Repo	Paper	Source
◻️	HyperStyle	code	paper	CVPR2022
◻️	HFGI	code	paper	CVPR2022
☑️	SAM	code	paper	CVPR2022
☑️	PTI	code	paper	TOG2022
◻️	FeatureStyleEncoder	code	paper	ECCV2022
☑️	Domain-Specific Hybrid Refinement (DHR)	code	paper	Arxiv2023

3. Editing Methods

	Method	Repo	Paper	Source
☑️	InterFaceGAN	code	paper	CVPR2020
☑️	GANSpace	code	paper	NeurIPS2020
◻️	StyleClip	code	paper	ICCV2021

Benchmark

Evaluating...... results will be reported soon

As evaluation settings are different in previous inversion works, we conduct a benchmark to better evaluate inversion methods based on our unified pipeline. See Evaluation for more details.

Evaluation Settings:

Dataset: CelebA-HQ test split (2,824) images;
ID: identity similarity measured by face recognition model;
LPIPS version: VGG;
Images are generated and converted to uint8 to evaluate, except for ID and FID, which are evaluated on saved images (png format).
See scripts/test.py for more details.

Refinement	Embedding	PSNR $\uparrow$	MSE $\downarrow$	LPIPS $\downarrow$	ID $\uparrow$	FID $\downarrow$
-	Optimization-W	-	-	-	-	-
-	Optimization-W+	-	-	-	-	-
-	pSp	18.0348	0.0345	0.1591	-	-
-	e4e	16.6616	0.0472	0.1974	-	-
-	LSAP	17.4958	0.0391	0.1765	-	-
PTI	pivot_w	24.6004	0.0082	0.0820	-	-
PTI	e4e	-	-	-	-	-
HFGI	e4e	-	-	-	-	-
SAM	e4e	-	-	-	-	-
SAM	LSAP	-	-	-	-	-
DHR	e4e	-	-	-	-	-
DHR	LSAP	-	-	-	-	-

Note: The results may be inconsistent with the reported results in our paper because of different implementations.

Unified Pipeline

Configs

We conduct a unified config system in train/inference/test/edit. All options are saved in the config file, which can be conveniently determined for any use.

We define all options in options. And options/base_options.py contains communal options in every phase.

Two-Stage Inference Pipeline

We follow two-stage inference in this repository. The base inference class TwoStageInference is defined in ./inference/two_stage_inference.py. It follows image embedding -> result refinement pipeline.

This uniform inversion process can easily combine two methods. Users can try any combination of methods, not limited to those employed by the original authors. For example, GANInverter makes it possible to connect ReStyle with HyperStyle by --embed_mode restyle --refine_mode hyperstyle or PTI + e4e by --embed_mode e4e --refine_mode pti.

💥 You can run any method combination by setting their config files now. See Inference for more details.

For example:

e4e: --configs configs/e4e/e4e_ffhq_r50.yaml
PTI + e4e: --configs configs/e4e/e4e_ffhq_r50.yaml configs/pti/pti.yaml
DHR + saved latent codes: --embed_mode code --refine_mode dhr --code_path /path/to/code/xxxx.pt

Supported methods:

Option	Methods	Note
`--embed_mode encoder`	pSp, e4e, LSAP
`--embed_mode optim`	Optimization
`--embed_mode code`	Saved codes	Need to set `--code_path`
`--refine_mode pti`	PTI	Using `--embed_mode optim` to attain $w_{pivot}$
`--refine_mode dhr`	DHR

Model checkpoint

You can load a checkpoint by two ways:

--checkpoint_path xxxx.pt: manually set checkpoint path to load. Although model architecture is slightly different from previous repository (e.g., pSp, e4e), the weight will be automatically converted to fit our architecture. You can use their original weight file.
--auto_resume True: automatically load {exp_dir}/checkpoints/last.pt in training phase or {exp_dir}/checkpoints/best_model.pt in the other phase.

Dataset

--train_dataset_path, --batch_size, --workers are used only in training.
--test_dataset_path, --test_batch_size, --test_workers are default test/inference options in every use.

Installation

Please refer to Installation Instructions for the details of installation.

Getting Start

Preparing Data and Generator

Please refer to Dataset Instructions for the details of datasets.

Training

1. Train encoder

Example: Train LSAP on FFHQ

(1) Single-card training

python scripts/train.py -c configs/lsap/lsap_ffhq_r50.yaml

(2) Distributed training on 8 Cards

python -m torch.distributed.launch --nproc_per_node=8 --master_port=12345 -c configs/lsap/lsap_ffhq_r50.yaml --gpu_num 8

Notes:

set --auto_resume True for automatically resume.
Batch size means total size of all gpus. It must be a multiple of gpu num.
In our experiments, distributed training with batch size of 8 may much slower or accelerate marginally. For example, one iteration of e4e cost 50 sec on both one or two cards. However, distributed training can amplify the total batch size (batch size 8 cost 21G gpu memory) and may achieve fast convergence by large batch size and learning rate. If batch size increased to 16 on two cards (8 sample per card), cost per iteration only slightly increase (from 50 to <60 sec). We are glad to receive any suggestions to improve performance of distributed training.

2. [TODO] Train refinement model based on encoder

Example: Train HFGI with LSAP on FFHQ.

Inference

You can infer images by:

python scripts/infer.py -c /path/to/config1 /path/to/config2

or

python scripts/infer.py \
	--embed_mode [embed_mode] \
	--refine_mode [refine_mode] \
	--test_dataset_path [/path/to/dataset]\
	--output_dir [/path/to/output] \
	--save_code [true/false] \
	--checkpoint_path [/path/to/checkpoint]

--save_code: whether to save latent code.
--test_dataset_path: image file or folder.
--output_dir: path to save inversion results. Inverse images will be saved in {output_dir}/inversion/ and latent codes will be saved in {output_dir}/code/. If not set, use {exp_dir}/inference/ by default.
--checkpoint_path: model weight.

Example1: LSAP on CelebA-HQ.

python scripts/infer.py -c configs/lsap/lsap_ffhq_r50.yaml

Example2: Optimization on CelebA-HQ.

python scripts/infer.py -c configs/optim/optim_celeba-hq.yaml

--stylegan_weights: whether to save latent code.

Example3: PTI+e4e

python scripts/infer.py -c configs/e4e/e4e_ffhq_r50.yaml configs/pti/pti.yaml

Editing

We have three embed_mode in editing: encoder, optim, and code.

(1) Edit with encoder or optimization

python scripts/edit.py -c configs/lsap/lsap_ffhq_r50.yaml --edit_mode interfacegan --edit_path editing/interfacegan_directions/age.pt --edit_factor 1.0

(2) Edit with inverse codes

If you have inferred images first and saved the latent codes, you can edit these latent codes without inversion. We recommend "inference->edit" pipeline since editing with various attributes and factors will not cost extra inversion time.

python scripts/edit.py -c configs/optim/optim_celeba-hq.yaml --embed_mode code --test_dataset_path /path/to/latent/codes/ --edit_mode interfacegan --edit_path editing/interfacegan_directions/age.pt --edit_factor 1.0

Evaluation

TODO

Citation

If you use this toolbox for your research, please cite our repo.

@misc{cao2022ganinverter,
  author       = {Pu Cao and Dongxu Liu and Lu Yang and Qing Song},
  title        = {GAN Inverter},
  howpublished = {\url{https://github.com/caopulan/GANInverter}},
  year         = {2022},
}

@article{cao2022lsap,
  title={LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space},
  author={Cao, Pu and Yang, Lu and Liu, Dongxv and Liu, Zhiwei and Wang, Wenguan and Li, Shan and Song, Qing},
  journal={arXiv preprint arXiv:2209.12746},
  year={2022}
}

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
configs		configs
criteria		criteria
datasets		datasets
docs		docs
editing		editing
inference		inference
licenses		licenses
models		models
options		options
scripts		scripts
training		training
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

License

caopulan/GANInverter

Folders and files

Latest commit

History

Repository files navigation

Table of Contents