
Inspecting Neural Style Transfer and Playaround 🎠

In this repository I have implemented the original Neural Style Transfer paper, "Image Style Transfer Using Convolutional Neural Networks", and inspected how the result of transferring the style image onto the content image changes with different weight constants, learning rates, optimizers, etc.

Contents

  1. Introduction
  2. Reconstruct
    1. Noise
    2. Content
    3. Style
    4. Further-Studies
  3. Visualization
    1. Style
    2. Content
    3. Both

Introduction

Style Transfer is the task of composing the style of one image (the style image) over another image (the content image). Before neural networks were applied to this task, the major limiting factor was obtaining feature representations of the content and style images that are good enough for composition. The lack of such representations stood in the way of understanding the semantics of the two images and separating them. With the success ✔️ of VGG networks in the ImageNet Challenge for Object Localization and Object Detection 🔍, researchers gave style transfer a neural approach.

The authors used feature representations from a VGG network to capture high- and low-level features of both the content and style images. Using this implicit information, they iteratively minimize the loss between the content representation and the generated image's representation (MSE loss on the features) and between the style representation and the generated image's representation (MSE loss on their Gram matrices). Unlike supervised learning, Neural Style Transfer has no metric for comparing the quality of the generated image(s). We are not training a model; instead, on every iteration we update the values of the image itself with gradient descent so that it closely matches the content and style images.
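
As a rough sketch of the two loss terms described above (this is not the repository's exact code; names and shapes are illustrative):

import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (batch, channels, height, width) activations from one VGG layer
    b, c, h, w = features.size()
    flat = features.view(b, c, h * w)
    # channel-to-channel correlations, normalized by the number of elements
    return flat.bmm(flat.transpose(1, 2)) / (c * h * w)

def content_loss(gen_feats, content_feats):
    # MSE between canvas and content-image activations at one layer
    return F.mse_loss(gen_feats, content_feats)

def style_loss(gen_feats, style_feats):
    # MSE between the Gram matrices of canvas and style-image activations
    return F.mse_loss(gram_matrix(gen_feats), gram_matrix(style_feats))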

I believe this brief overview of Neural Style Transfer is enough to get us started with experiments and notice some fascinating results.

Note: This is not a blog post on Neural Style Transfer. No explanation of the type of model, training, etc. is provided.

Setting Parameters

For our experiments we will set the parameters to the following values unless explicitly stated otherwise.

iterations: 2500
fps: 30
size: 128
sav_freq: 10
alpha: 5.0
beta: 7000.0
gamma: 1.2
style_weights: [1e3/n**2 for n in [16.0,32.0,128.0,256.0,512.0]]
lr: 0.06

If paths to the content and style images are not provided, the default images inside NeuralStyleTransfer-App/src/data will be used.

For a detailed description of these parameters, run python3 main.py -h
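
Under these defaults, alpha and beta presumably weight the content and style terms, and style_weights scales each style layer's contribution. A hedged sketch of how the total objective is assumed to be assembled, reusing the loss helpers from the Introduction (the per-layer feature lists are illustrative; gamma is left out here, its role is best checked via python3 main.py -h):

# assumed combination of the two loss terms; everything except alpha,
# beta and style_weights is an illustrative placeholder
total_loss = alpha * content_loss(gen_content_feats, target_content_feats)
total_loss = total_loss + beta * sum(
    w * style_loss(g, s)
    for w, g, s in zip(style_weights, gen_style_feats, target_style_feats)
)
total_loss.backward()  # gradients flow into the canvas, not into VGG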

Reconstruct

Neural Style Transfer is like painting an image over a canvas. This canvas has the same size as the content image, since the content is static and the only dynamic change that needs to be composed over the canvas comes from the style image. Although the size is fixed to that of the content image, there are 3-4 ways we can initialize this canvas, after which gradient descent 📉 updates the values of the canvas.

The following shell command generates a canvas by blending the style over the content image. This is the basic bash command for reconstructing the canvas; for more information about the arguments, run python3 main.py --help

python3 main.py --reconstruct --content_layers <num> --style_layers 0 1 2 3 4

Noise

We can initialize the canvas with noise and then update its values so it looks like the content image with the style composed on it. The snippet below generates a noise canvas and sets requires_grad = True, which lets autograd compute gradients with respect to the canvas so its values can be updated.

generated_image = torch.randn(content_image.size())        # noise with the content image's shape
generated_image = generated_image.to(device, torch.float)  # .to() is not in-place, so reassign
generated_image.requires_grad = True                        # let gradients flow into the canvas
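
Since the canvas itself is the only thing being optimized (VGG's weights stay frozen), the image tensor is presumably what gets registered with the optimizer. A minimal sketch with the default learning rate from above:

import torch.optim as optim

# optimize the pixels of the canvas directly, not any model parameters
optimizer = optim.Adam([generated_image], lr=0.06)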

Let's start with some experiments... 🔬

Changing Content Layers

Bash command, e.g.:

python3 main.py --reconstruct --style_layers 0 1 2 3 4 --content_layers 1 --optimizer "Adam"

parameters we are using

optimizer: "Adam" 
init_image: "noise"
Content_Layer: 0 | 1 | 2 | 3 | 4
Generated Canvas: [animated canvas per layer]

On an A4000 GPU it took 33s to generate one canvas with the current configuration.

Early layers composed style over the canvas relatively better than higher layers, but the terminal layers lost the semantics of the content. Mid-level layers preserved the content while focusing less on style composition.

Changing Optimizer

python3 main.py --reconstruct --style_layers 0 1 2 3 4 --content_layers 0 --iterations 2000

parameters we are using

optimizer: "LBFGS"
init_image: "noise"
Content_Layer: 0 | 1 | 2 | 3 | 4
Generated Canvas: [animated canvas per layer]

On an A4000 GPU it took 120s to generate one canvas with the current configuration.

Again, early layers composed style over the canvas better than higher layers, but moving towards higher layers the canvas loses the content representation, possibly due to over-composition of style. The last layer has again lost the semantics to quite some extent.

Content

We can initialize the canvas with the content image itself and then update its values so it looks like the content image with the style composed on it. The line below initializes the canvas with the content image.

generated_image = content_image.clone().requires_grad_(True)  # copy the content image; gradients are tracked on the copy

Let's start with some experiments... 🔬

Changing Optimizer

Bash command, e.g.:

python3 main.py --reconstruct --style_layers 0 1 2 3 4 --content_layers 1 --optimizer "Adam" --init_image "content"
Content_Layers: 0 | 1 | 2 | 3 | 4
Adam: [animated canvas per layer]
LBFGS: [animated canvas per layer]
Adam (different content/style pair): [animated canvas per layer]

In the first two rows the only change is the optimizer, and clearly both optimizers produce comparatively similar canvases except at the last layer. Adam needs more iterations than LBFGS to produce a semantically similar canvas, but it is much faster per iteration, since it is a first-order method and does not estimate the curvature of the parameter space like LBFGS does.

So we used Adam once again on a different pair of content and style images (last row) to generate the canvas and found that in all cases the last layer loses some content information and the style over-composes on the canvas. The first two layers give comparatively better results every time.
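
The speed difference comes down to how the two optimizers are driven in PyTorch: Adam takes one plain step per iteration, while LBFGS re-evaluates the loss through a closure, possibly several times per step, to build its curvature estimate. A minimal sketch, where compute_total_loss is a hypothetical helper that assembles the losses from the Introduction:

# Adam: one forward/backward pass per iteration
optimizer = torch.optim.Adam([generated_image], lr=0.06)
optimizer.zero_grad()
loss = compute_total_loss(generated_image)  # hypothetical helper
loss.backward()
optimizer.step()

# LBFGS: step() takes a closure that it may call multiple times
optimizer = torch.optim.LBFGS([generated_image])
def closure():
    optimizer.zero_grad()
    loss = compute_total_loss(generated_image)
    loss.backward()
    return loss
optimizer.step(closure)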

Style

We can initialize the canvas with the style image itself and then update its values so it looks like the content image with the style composed on it. The line below initializes the canvas with the style image.

generated_image = style_image.clone().requires_grad_(True)  # copy the style image; gradients are tracked on the copy

Let's start with some experiments... 🔬

Changing Optimizer

python3 main.py --reconstruct --style_layers 0 1 2 3 4 --content_layers 1 --optimizer "Adam" --init_image "style"
Content_Layers: 0 | 1 | 2 | 3 | 4
Adam: [animated canvas per layer]

Composing the content representation over a style canvas does not seem like a great idea. The last layers over-composed the style with some noise, while content_layer: 2 smoothed out the background and highlighted the content.

Further Studies

From the experiments above we can infer that with content_layer: 4 the canvas has lost semantics to some extent, due either to over-composition of style or to an under-represented content representation. We can verify this in the Visualization section by looking at what each layer contributes to the generated canvas. The same can be said for content_layer: 3, but with relatively less prominence.

With content_layer: 0 we can see that the style is well composed over the canvas while the content representation is also preserved; the same can be said for content_layer: 1, but with less prominence. So for further experiments let's use content_layer: 0 and Adam for fast computation. So far we have only looked at canvases generated from conv layers; let's experiment with relu now.

Content_Layers: 0 | 1 | 2 | 3 | 4
conv: [animated canvas per layer]
relu: [animated canvas per layer]

Looking at all the canvases from conv and relu, we can infer that the two do not produce very different canvases, and it is safe to use either of them for reconstruction.
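
For reference, tapping both the conv and the relu activations out of torchvision's VGG19 feature stack can look roughly like this; which module indices the repository actually maps to layers 0-4 is not shown here, so treat conv_indices as a placeholder:

import torch
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
for m in vgg:
    if isinstance(m, torch.nn.ReLU):
        m.inplace = False  # keep conv outputs intact so both can be stored

def extract(image, conv_indices):
    # collect activations right after the chosen Conv2d modules and after
    # the ReLU modules that immediately follow them in VGG19
    conv_feats, relu_feats = [], []
    x = image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in conv_indices:
            conv_feats.append(x)
        elif (i - 1) in conv_indices and isinstance(layer, torch.nn.ReLU):
            relu_feats.append(x)
    return conv_feats, relu_feats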

Visualization

Until now we have reconstructed canvases using all the style layers and any one content layer; in this section we visualize the individual and grouped contributions of the style and content layers. There are 3 ways to do so: visualizing only content layer(s), only style layer(s), or both.

The shell command to visualize is

python3 main.py --visualize "content" --content_layers 1 2 --iterations 1500 --fps 30 --sav_freq 5
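
What this presumably amounts to, reusing the loss helpers from the Introduction (a hedged sketch; the feature bookkeeping is illustrative): the canvas is optimized against only the chosen content layers, with no style term at all.

# visualize "content": only the selected content layers drive the update
loss = sum(content_loss(gen_feats[i], target_feats[i]) for i in (1, 2))
loss.backward()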

ContentV

When --visualize "content" is set, we visualize only the content representation of any single layer or a group of layers.

Content_Layers: 0 | 1 | 2 | 3 | 4
Canvas: [animated canvas per layer]

Later layers capture the textures of the content image while not giving much weight to color and low-level feature details. The content_layer: 4 canvas, though, seems to have under-represented the content, possibly because too little gradient signal flows back to the canvas for the update.

Earlier layers captured the shape, and to some extent the texture, really well.

What if we arbitrarily choose some content layers and look at their combined effect on the canvas? Let's check.

python3 main.py --visualize "content" --content_layers 1 3 4 --iterations 700 --fps 2 --sav_freq 5
Content_Layers: 1 3 4 | 0 2 4
Canvas: [animated canvas per group]

StyleV

When --visualize "style" is set, we visualize only the style representation of any single layer or a group of layers.

Style_Layers: 0 | 1 | 2 | 3 | 4
Adam: [animated canvas per layer]

When visualized individually, style layers do not seem to contribute any significant style to the canvas; in fact, moving towards higher layers we see patterns of noise.

What if we arbitrarily choose some style layers and look at their combined effect on the canvas? Let's check.

python3 main.py --visualize "style" --style_layers 1 3 4 --iterations 2000 --fps 25 --sav_freq 8 --optimizer "Adam"
Style_Layers: 0 1 4 | 1 2 3 | 0 1
Adam: [animated canvas per group]
LBFGS: [animated canvas per group]
[canvas output when all the style layers were used]

When we visualize the grouped contributions of layers, we can see some style over the canvas very clearly. LBFGS shows style in every canvas, even where Adam failed to with style_layers: 1 2 3. On looking further into the matter, we found that Adam took at least 4000 iterations to learn the representations and output visually appealing style compared to the others. The reason may be that higher layers focus not on colors but on texture, and Adam finds it harder to extract the color feature information than LBFGS.

Lastly, we can visualize what all the style layers together contribute to the canvas; it looks quite similar to the style image itself.

Both

For fun, we use all the style and content layers to generate the canvas. This configuration worked for the image below, but not for many others.

The original image of the lion was grey.

You can play with the other hyperparameters to generate canvases and enhance your understanding of Neural Style Transfer.
