DM-VTON: Distilled Mobile Real-time Virtual Try-On

[Paper] [Colab Notebook] [Web Demo]

This is the official PyTorch implementation of DM-VTON: Distilled Mobile Real-time Virtual Try-On. DM-VTON is designed to be fast and lightweight while maintaining the quality of the try-on image. It runs at 40 frames per second on a single Nvidia Tesla T4 GPU and takes up only 37 MB of memory.

📝 Documentation

Installation

This source code has been developed and tested with python==3.10, pytorch==1.13.1, and torchvision==0.14.1. We recommend using the conda package manager for installation.

Clone this repo.

git clone https://github.com/KiseKloset/DM-VTON.git

Install dependencies with conda (we provide the script scripts/install.sh).

conda create -n dm-vton python=3.10
conda activate dm-vton
bash scripts/install.sh
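
After installation, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes the packages expose their metadata under the pip distribution names `torch` and `torchvision`.

```python
from importlib import metadata

# Versions the README says the code was developed and tested with.
EXPECTED = {"torch": "1.13.1", "torchvision": "0.14.1"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_or_None, expected, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu117" are ignored when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, matches)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={wanted} [{status}]")
```

A mismatch does not necessarily mean the code will fail; it only means the environment differs from the tested one.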

Data Preparation

VITON

Because of copyright issues with the original VITON dataset, we use a resized version provided by CP-VTON. We followed the work of Han et al. to filter out duplicates and ensure that no data leakage happens (VITON-Clean). You can download the VITON-Clean dataset here.

	VITON	VITON-Clean
Training pairs	14221	6824
Testing pairs	2032	416

Dataset folder structure:

├── VTON-Clean
│   ├── VITON_test
│   │   ├── test_pairs.txt
│   │   ├── test_img
│   │   ├── test_color
│   │   ├── test_edge
│   ├── VITON_traindata
│   │   ├── train_pairs.txt
│   │   ├── train_img
│   │   │   ├── [000003_0.jpg | ...]  # Person
│   │   ├── train_color
│   │   │   ├── [000003_1.jpg | ...]  # Garment
│   │   ├── train_edge
│   │   │   ├── [000003_1.jpg | ...]  # Garment mask
│   │   ├── train_label
│   │   │   ├── [000003_0.jpg | ...]  # Parsing map
│   │   ├── train_densepose
│   │   │   ├── [000003_0.npy | ...]  # Densepose
│   │   ├── train_pose
│   │   │   ├── [000003_0.json | ...] # Openpose
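
Before training, the layout above can be checked programmatically. This is a minimal sketch using only the standard library; the function name `missing_train_entries` is hypothetical, and the list of expected entries is taken directly from the training part of the tree.

```python
from pathlib import Path

# Sub-folders listed in the tree above for the training split.
TRAIN_SUBDIRS = [
    "train_img", "train_color", "train_edge",
    "train_label", "train_densepose", "train_pose",
]


def missing_train_entries(train_root):
    """Return the names of expected entries that are absent under train_root."""
    root = Path(train_root)
    missing = [d for d in TRAIN_SUBDIRS if not (root / d).is_dir()]
    if not (root / "train_pairs.txt").is_file():
        missing.append("train_pairs.txt")
    return missing
```

For example, `missing_train_entries("VTON-Clean/VITON_traindata")` returns an empty list when every folder and the pairs file are in place.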

Inference

test.py runs inference on image folders, then evaluates FID, LPIPS, and runtime, and saves the results to runs/TEST_DIR. See the sample script scripts/test.sh for how to run it. You can download the pretrained checkpoints here.

Note: to run and save separate results for each pair [person, garment], set batch_size=1.

Training

For each dataset, you need to train a Teacher network first to guide the Student network. DM-VTON uses FS-VTON as the Teacher. Each model is trained in two stages: the first stage trains only the warping module, and the second stage trains the entire model (warping module + generator). Check the sample scripts for training both the Teacher network (scripts/train_pb_warp + scripts/train_pb_e2e) and the Student network (scripts/train_pf_warp + scripts/train_pf_e2e). We also provide a Colab notebook as a quick tutorial.
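
The order of the four training steps described above can be written down as a small plan. This sketch only lists the steps and does not launch anything. The script paths are copied as they appear in the text, so the real files may carry an extension, and the names `TRAINING_PLAN` and `describe_plan` are hypothetical.

```python
# Script paths are copied as written in the text above; the real files may
# carry an extension (for example ".sh"), so adjust before running anything.
TRAINING_PLAN = [
    ("teacher", "warp", "scripts/train_pb_warp"),
    ("teacher", "e2e", "scripts/train_pb_e2e"),
    ("student", "warp", "scripts/train_pf_warp"),
    ("student", "e2e", "scripts/train_pf_e2e"),
]


def describe_plan(plan=TRAINING_PLAN):
    """Return one human-readable line per training step, in execution order."""
    lines = []
    for step, (network, stage, script) in enumerate(plan, start=1):
        lines.append(f"{step}. {network} / {stage}: {script}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe_plan()))
```

The Teacher steps come first because the Student is trained under the Teacher's guidance, and within each network the warping stage precedes the end-to-end stage.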

Training Settings

A full list of training settings can be found in opt/train_opt.py. Below are some important settings.

device: GPU device ID(s) used for training (e.g. 0,1,2). DM-VTON needs a GPU because it runs with cupy.
batch_size: Batch size. Customize it for each stage to suit your hardware.
lr: Starting learning rate.
Total number of epochs = niter + niter_decay
- niter: Number of epochs at the starting learning rate.
- niter_decay: Number of epochs over which the learning rate decays linearly to zero.
save_period: A checkpoint is saved every save_period epochs.
resume: Use this if you want to continue training from a previous run.
project and name: The results (checkpoints, logs, images, etc.) will be saved in the project/name folder. Note that if the folder already exists, the code will create a new folder (e.g. project/name-1, project/name-2).
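
The interaction between lr, niter, and niter_decay can be illustrated with a short function. This is a sketch of the schedule as described above (constant, then linear decay to zero), not the repository's code: the name `lr_at_epoch` is hypothetical, epochs are assumed to be 0-indexed, and the exact off-by-one convention used in the actual training scripts may differ.

```python
def lr_at_epoch(base_lr, epoch, niter, niter_decay):
    """Learning rate for a 0-indexed epoch: constant, then linear decay to zero."""
    if epoch < niter:
        return base_lr
    # Fraction of the decay phase already completed, clamped to [0, 1].
    if niter_decay > 0:
        progress = min((epoch - niter) / niter_decay, 1.0)
    else:
        progress = 1.0
    return base_lr * (1.0 - progress)


if __name__ == "__main__":
    # Illustrative values only: 2 constant epochs followed by 2 decay epochs.
    for epoch in range(4):
        print(epoch, lr_at_epoch(1.0, epoch, niter=2, niter_decay=2))
```

Under this convention the rate is still at its starting value in the first decay epoch and would reach zero exactly at epoch niter + niter_decay, that is, just after the last training epoch.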

📈 Result

Results on VITON

Methods	FID $\downarrow$	Runtime (ms) $\downarrow$	Memory (MB) $\downarrow$
ACGPN (CVPR20)	33.3	153.6	565.9
PF-AFN (CVPR21)	27.3	35.8	293.3
C-VTON (WACV22)	37.1	66.9	168.6
SDAFN (ECCV22)	30.2	83.4	150.9
FS-VTON (CVPR22)	26.5	37.5	309.3
OURS	28.2	23.3	37.8
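
To relate the runtime column to frames per second, the numbers in the table can be converted directly. The sketch below copies the FS-VTON (Teacher) and DM-VTON (OURS) rows and assumes the reported runtime is the time per image; the helper `fps` is hypothetical.

```python
# (runtime in ms, memory in MB) copied from the table above.
RESULTS = {
    "FS-VTON": (37.5, 309.3),
    "DM-VTON": (23.3, 37.8),
}


def fps(runtime_ms):
    """Frames per second implied by a per-image runtime in milliseconds."""
    return 1000.0 / runtime_ms


teacher_ms, teacher_mb = RESULTS["FS-VTON"]
student_ms, student_mb = RESULTS["DM-VTON"]

print(f"DM-VTON throughput: {fps(student_ms):.1f} FPS")
print(f"Speed-up over FS-VTON: {teacher_ms / student_ms:.2f}x")
print(f"Memory reduction over FS-VTON: {teacher_mb / student_mb:.2f}x")
```

With these figures, 23.3 ms per image corresponds to roughly 43 FPS, about 1.6 times faster than FS-VTON with about 8 times less memory.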

😎 Supported Models

We also support some parser-free models that can be used as Teacher and/or Student. The methods all have a 2-stage architecture (warping module and generator). For more details, see here.

Methods	Source	Teacher	Student
PF-AFN	Parser-Free Virtual Try-on via Distilling Appearance Flows	✅	✅
FS-VTON	Style-Based Global Appearance Flow for Virtual Try-On	✅	✅
RMGN	RMGN: A Regional Mask Guided Network for Parser-free Virtual Try-on	❌	✅
DM-VTON (Ours)	DM-VTON: Distilled Mobile Real-time Virtual Try-On	✅	✅

ℹ Citation

If our code or paper is helpful to your work, please consider citing:

@inproceedings{nguyen2023dm,
  title        = {DM-VTON: Distilled Mobile Real-time Virtual Try-On},
  author       = {Nguyen-Ngoc, Khoi-Nguyen and Phan-Nguyen, Thanh-Tung and Le, Khanh-Duy and Nguyen, Tam V and Tran, Minh-Triet and Le, Trung-Nghia},
  year         = 2023,
  booktitle    = {IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)},
}

🙏 Acknowledgments

This code is based on PF-AFN.

📄 License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The use of this code is for academic purposes only.
