Real-time style transfer with MGAN

The main problems with the MGAN method are the processing of flat surfaces and the ghosting effect produced around faces. To address them, we tried the following:

  • Adding brown noise to the input images during training

  • Adding a total variation (TV) loss module (a sketch of this term is given after this list)

  • Concatenating additional noise feature maps to the output of VGG (ReLU4_1)

  • Changing the L2 total variation loss to an L1 loss:

    • TV weight > 1e-4 prevents the network from learning initially
    • TV weight < 1e-4 does work but slows down the learning process
  • Pretraining a generator network without the TV loss and then switching the L1 TV loss on:

    • TV weight >= 1e-4 produces monochrome images
    • TV weight = 5e-5 tends to blend colors drastically
    • TV weight ~ 2e-5 tends to slightly smooth images during training, but the images produced during testing are not as good as the ones produced during training
  • Removing the padding of VGG by taking crops of the feature maps (ReLU4_1)

    • We take the centered crop (64x512x16x16) of the feature map tensor of size 64x512x25x25. The generator upscales three times, which gives 128x128 patches; the training visualizations look better, but the generated images do not look as good as those from the default method
  • Using Dmitry Ulyanov's loss function (it uses VGG to compute a content loss and a texture loss); this was not thoroughly explored:

    • learning rate > 1e-3 prevents the generator network from learning
    • learning rate = 1e-3: after 6 epochs of 800 iterations with a batch size of 32, the network reproduces shapes, but all in the same color and with odd artefacts
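
For reference, here is a minimal sketch of the total variation term experimented with above, written against plain Torch tensors. It is an illustration only (the function name tvLoss and the example weight are just placeholders), not the module used in the training scripts:

-- Minimal total variation penalty on a C x H x W image tensor.
-- p = 1 gives the L1 variant, p = 2 the L2 variant discussed above.
require 'torch'

local function tvLoss(img, tvWeight, p)
  -- Differences between vertically and horizontally adjacent pixels.
  local dh = img[{{}, {1, -2}, {}}] - img[{{}, {2, -1}, {}}]
  local dw = img[{{}, {}, {1, -2}}] - img[{{}, {}, {2, -1}}]
  if p == 1 then
    return tvWeight * (dh:abs():sum() + dw:abs():sum())
  else
    return tvWeight * (dh:pow(2):sum() + dw:pow(2):sum())
  end
end

-- Example call with one of the weights mentioned above.
local img = torch.rand(3, 128, 128)
print(tvLoss(img, 2e-5, 1))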

References

  • Dmitry Ulyanov
  • Li and Wand

Original MGANs readme from Li/Wand

Training & testing code (Torch), pre-trained models and supplementary materials for "Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks".

See this video for a quick explanation of our method and results.

Setup

This code is based on Torch. It has only been tested on Mac and Ubuntu.

Dependencies:

For CUDA backend:

Training

Simply cd into the "code/" folder and run the training script.

th train.lua

The current script is an example of training a network from 100 ImageNet photos and a single painting by Van Gogh. The input data are organized in the following way:

  • "Dataset/VG_Alpilles_ImageNet100/ContentInitial": 5 training ImageNet photos to initialize the discriminator.
  • "Dataset/VG_Alpilles_ImageNet100/ContentTrain": 100 training ImageNet photos.
  • "Dataset/VG_Alpilles_ImageNet100/ContentTest": 10 testing ImageNet photos (for later inspection).
  • "Dataset/VG_Alpilles_ImageNet100/Style": Van Gogh's painting.

The training process has three main steps:

  • Use MDAN to generate training images (MDAN_wrapper.lua).
  • Data Augmentation (AG_wrapper.lua).
  • Train MGAN (MGAN_wrapper.lua).

Testing

The testing process has two steps:

  • Step 1: call "th release_MGAN.lua" to concatenate the VGG encoder with the generator.
  • Step 2: call "th demo_MGAN.lua" to test the network with new photos.
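
Conceptually, step 1 chains the VGG encoder and the trained generator into one feed-forward model that can be applied directly to photos. A rough sketch of that idea in plain Torch (the file names below are placeholders, not the paths used by release_MGAN.lua):

-- Sketch only: chain a saved encoder and generator into a single network.
require 'nn'

local encoder   = torch.load('models/encoder.t7')    -- placeholder path
local generator = torch.load('models/generator.t7')  -- placeholder path

local full = nn.Sequential()
full:add(encoder)
full:add(generator)
full:evaluate()  -- inference mode (affects dropout/batch normalization, if present)

torch.save('models/MGAN_release.t7', full)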

Display

You can use the browser-based display package to display the training process for both MDANs and MGANs.

  • Install: luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Call: th -ldisplay.start
  • See results at this URL: http://localhost:8000
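
With the server running, intermediate results can be pushed to the browser from Torch code along these lines (a minimal sketch; the window id and title are arbitrary):

-- Minimal sketch: send an image tensor to the running display server.
require 'torch'
local display = require 'display'

local img = torch.rand(3, 128, 128)  -- stand-in for a generated sample
display.image(img, {win = 1, title = 'MGAN sample'})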

Example

We chose Van Gogh's "Olive Trees with the Alpilles in the Background" as the reference texture.

We then transfer 100 ImageNet photos into the same style with the proposed MDANs method. MDANs take an iterative deconvolutional approach similar to "A Neural Algorithm of Artistic Style" by Leon A. Gatys et al. and our previous work CNNMRF. The difference is that MDANs use adversarial training instead of Gaussian statistics ("A Neural Algorithm of Artistic Style") or nearest-neighbour search (CNNMRF). Here are some transferred results from MDANs:

The results look nice, so we know adversarial training is able to produce results comparable to previous methods. In other experiments we observed that Gaussian statistics work remarkably well for painterly textures but can sometimes be too flexible for photorealistic textures; nearest-neighbour search preserves photorealistic details but can be too rigid for deformable textures. In this sense MDANs offer a relatively more balanced choice through adversarial training. See our paper for more discussion.
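
To make the iterative deconvolutional idea above concrete, here is a heavily simplified sketch of that style of optimization loop: the pixels of the output image are the free variables and are updated by gradient descent on a loss. The styleLoss function here is only a placeholder; MDANs use adversarial and content losses computed on VGG features instead:

-- Heavily simplified sketch of an iterative (deconvolutional) transfer loop.
require 'torch'
require 'optim'

-- Placeholder loss pulling the image towards gray; returns loss and gradient.
-- MDANs replace this with adversarial + content losses on VGG feature maps.
local function styleLoss(img)
  local target = torch.Tensor(img:size()):fill(0.5)
  local diff = img - target
  local loss = torch.cmul(diff, diff):sum()
  return loss, diff:mul(2)
end

local img = torch.rand(3, 384, 384)   -- the pixels are the optimization variables
local state = {learningRate = 0.05}

for iter = 1, 200 do
  optim.adam(function(x) return styleLoss(x) end, img, state)
end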

Like previous deconvolutional methods, MDANs are VERY slow: an Nvidia Titan X takes about one minute to transfer a 384x384 photo. To make it faster, we replace the deconvolutional process with a feed-forward network (MGANs). The feed-forward network takes a long time to train (45 minutes for this example on a Titan X), but offers a significant speed-up at testing time. Here are some results from MGANs:

It is our expectation that MGANs will trade quality for speed. The question is: how much? Here are some comparisons between the results of MDANs and MGANs:

In general MDANs (middle) give more stylized results and do a much better job on homogeneous background areas (the last two cases). But sometimes MGANs (right) are able to produce comparable results (the first two).

And MGANs run at least two orders of magnitude faster.

Final remark

There are concurrent works that try to make deep texture synthesis faster. For example, Ulyanov et al. and Johnson et al. also achieved significant speed-ups and very nice results with a feed-forward architecture. Both methods use the Gaussian statistics constraint proposed by Gatys et al. We believe our method is a good complement: by changing the Gaussian statistics constraint to discriminative networks trained with Markovian patches, it is possible to model more complex texture manifolds (see the discussion in our paper).

Last, here are some preliminary results of training MGANs for photorealistic synthesis. The network learns from 200k face images from CelebA, then transfers the VGG_19 encoding (layer ReLU5_1) of new face images (left) into something interesting (right). The synthesized faces have the same poses/layouts as the input faces, but look like different people :-)
