
Style Transfer using Convolutional Neural Network

Author: Ryan Chan (ryanchankh@berkeley.edu), Last Updated: 30 January 2019

Motivation

Layers in a neural network contain useful information. For example, the convolution operation can be used to reduce the dimension of the data while carrying shared information from layer to layer. The resulting outputs, known as activation maps, contain useful representations that can be processed for further purposes. Artistic style transfer is one of many examples that uses the activations of a convolutional neural network (VGG19) (Simonyan & Zisserman, 2014) to produce useful results. This project sets out to explore activation maps further.

Instructions for Testing and Producing Results

VGG weights

First, download the VGG weights from here and place the file in /style_transfer/vgg/. No renaming is needed.

Model Options

All options for training are located in main.py. The options you can fine-tune are listed below, followed by a hypothetical example configuration:

  1. Dimension of the image
  2. Layers for the style and content image activation maps
  3. Weights for each layer
  4. Trade-off between style and content (alpha for content and beta for style)
  5. File path for content and style image
  6. Initial image (content image, style image, white image, or random image)
  7. Number of steps between each image save (set save_per_step = -1 to disable saving)
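
For reference, these settings roughly correspond to a configuration like the one sketched below. This is only a hypothetical illustration: the actual variable names, default values, and file paths in main.py may differ.

    # Hypothetical sketch of the tunable options; names and values are
    # illustrative, not copied from main.py.
    options = {
        "image_size": (512, 512),              # 1. dimension of the synthesized image
        "content_layers": ["relu4_2"],         # 2. layers for the content representation
        "style_layers": ["relu1_1", "relu2_1", "relu3_1", "relu4_1", "relu5_1"],
        "style_layer_weights": [0.2] * 5,      # 3. weight for each style layer
        "alpha": 1e-6,                         # 4. content weight
        "beta": 1.0,                           #    style weight
        "content_path": "images/content.jpg",  # 5. paths to the content and style images
        "style_path": "images/style.jpg",
        "init_image": "random",                # 6. "content", "style", "white", or "random"
        "save_per_step": 100,                  # 7. -1 disables intermediate saving
    }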

To run the model, run the following in the command line:

python3 main.py

Model Structure and the Flow of Information

Preprocess

  1. The style image is rescaled to the same size as the content image.
  2. When the images are loaded and turned into (height, width, channel) arrays, the mean pixel values are subtracted so that pixel values are centered at 0 (see the sketch after this list). This is due to the properties of the weights in our VGG network, and the Gram matrix computation assumes values centered at 0.
  3. Both images are passed into the VGG network, and activation maps from specific layers are extracted.
  4. For the activation maps of the style image, we pre-compute each layer's Gram matrix.
  5. A random image is generated, ready to be updated at each iteration. This is the only variable being optimized.
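
A minimal sketch of steps 1 and 2, using Pillow and NumPy. The helper name and file paths are hypothetical, and the mean pixel constants are the values commonly used with VGG networks rather than ones read from this repository's vgg module:

    import numpy as np
    from PIL import Image

    # Mean pixel values (RGB) commonly used with VGG networks; assumed here,
    # the repository's vgg module may define its own constants.
    VGG_MEAN = np.array([123.68, 116.779, 103.939], dtype=np.float32)

    def load_and_preprocess(path, size=None):
        """Load an image, optionally resize it, and center its pixel values at 0."""
        img = Image.open(path).convert("RGB")
        if size is not None:                     # size is (width, height)
            img = img.resize(size, Image.LANCZOS)
        arr = np.asarray(img, dtype=np.float32)  # shape (height, width, channel)
        return arr - VGG_MEAN                    # subtract the mean pixel values

    content = load_and_preprocess("content.jpg")
    height, width, _ = content.shape
    style = load_and_preprocess("style.jpg", size=(width, height))  # rescale to content size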

Generating result

  1. At each iteration, we pass the synthesized image through the network to obtain activation maps from the same layers chosen for the content and style images.

  2. We then compute the content loss, which is the mean squared error between the activation maps of the content image and those of the synthesized image.

  3. Similarly, the style loss is the mean squared error between the Gram matrices of the activation maps of the style image and those of the synthesized image. The Gram matrix can be interpreted as computing the correlations between a layer's feature maps. Each layer's style loss is multiplied by a style-layer weight so that the style loss is averaged across the chosen layers (see the sketch after this list).

  4. The content loss and style loss are multiplied by their respective trade-offs and then added together, giving the total loss.

  5. At each iteration, the initially random image is updated so that it converges toward the synthesized result. Our model uses the L-BFGS algorithm to minimize the loss.
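
The losses above can be written out roughly as follows. This is a NumPy sketch for exposition only; the repository presumably builds these losses inside its TensorFlow graph so that gradients can flow back to the synthesized image, and normalization constants may differ from this simplified version.

    import numpy as np

    def gram_matrix(activation):
        """Correlations between feature maps; activation has shape (height, width, channels)."""
        h, w, c = activation.shape
        features = activation.reshape(h * w, c)
        return features.T @ features             # shape (channels, channels)

    def content_loss(content_act, synth_act):
        """Mean squared error between content and synthesized activation maps."""
        return np.mean((content_act - synth_act) ** 2)

    def style_loss(style_acts, synth_acts, layer_weights):
        """Weighted sum over layers of the MSE between Gram matrices."""
        loss = 0.0
        for s, x, w in zip(style_acts, synth_acts, layer_weights):
            loss += w * np.mean((gram_matrix(s) - gram_matrix(x)) ** 2)
        return loss

    def total_loss(alpha, beta, c_loss, s_loss):
        """Trade-off between content and style."""
        return alpha * c_loss + beta * s_loss

In practice the synthesized image is the only variable, and the total loss is minimized with an L-BFGS routine such as SciPy's scipy.optimize.fmin_l_bfgs_b, whose objective function returns the loss and its gradient with respect to the flattened image.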

Replication of Figures in Paper

Figure 1 - Image Representations in a Convolutional Neural Network

Content Reconstruction. The following figures are created with alpha = 1, beta = 0.

[Figure: content reconstructions from layers relu1_1, relu2_1, relu3_1, relu4_1, and relu5_1.]

Style Reconstruction. The following figures are created with alpha = 0, beta = 1.

[Figure: style reconstructions using the cumulative layer sets relu1_1; relu1_1 and relu2_1; relu1_1 through relu3_1; relu1_1 through relu4_1; and relu1_1 through relu5_1.]

Figure 3 - Well-known Artwork examples

The following figures are created with:
Loss weights: alpha = 1e-6, beta = 1
Style layer weights: relu1_1 = 0.2, relu2_1 = 0.2, relu3_1 = 0.2, relu4_1 = 0.2, relu5_1 = 0.2
Style layers: relu1_1, relu2_1, relu3_1, relu4_1, relu5_1
Content layer: relu4_2 (weight 1)
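
Spelled out as a hypothetical configuration (variable names are illustrative; the values are the ones listed above):

    # Figure 3 settings; names are illustrative, values are those listed above.
    alpha, beta = 1e-6, 1.0
    style_layers = ["relu1_1", "relu2_1", "relu3_1", "relu4_1", "relu5_1"]
    style_layer_weights = {layer: 0.2 for layer in style_layers}
    content_layers = {"relu4_2": 1.0}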

[Figure: five style transfer results for the well-known artwork examples.]

Difference from original paper

A subtle difference between Gatys's original implementation and this version is the content-style trade-off used to create the results. In the original paper, alpha / beta = 1e-4, yet I was unable to reproduce the results with that trade-off. Hence, the figures above use alpha / beta = 1e-6. I have not been able to pinpoint where the two implementations differ.

Future Work

Definition of Representation. One advantage of using neural networks on images is that there already exists perhaps the most useful and direct way to represent an image with numbers: pixel values. But this is not necessarily the only way to represent visual content. If there existed a different kind of "embedding" that encodes objects or relationships between pixels in a different way, content and style representations might change the way a style transfer model defines the relationships between objects, or even color.

CNNs to Other Types of Neural Nets. One inspiration for convolutional neural networks is the hierarchical structure of the human visual cortex. Layer by layer, using the convolution operation, each artificial neuron serves as a computing unit that summarizes information from previous layers and compresses it into a smaller space, which is then passed on to later layers. This type of model is one of many ways of compressing data into a more meaningful and less redundant representation. Other models for compression include autoencoders, which force information through a smaller dimension before projecting it back into a larger one. Compression problems might shed insight on how information is embedded efficiently.

Losses and Differences. The current style transfer model uses mean squared error, which measures the difference between values from the content or style image and those of the synthesized image. From a mathematical point of view, this seems logical and reasonable. But a difference in pixel values does not necessarily imply a difference in content or style. For instance, if we wanted a synthesized image that is more invariant to the position of objects, computing the exact pixel difference at each coordinate would not be sensible. In other words, defining a loss that accounts for objects may require a much more expressive function than a simple element-wise error.

Further Readings

  1. Jing et al. 2018. Neural Style Transfer: A Review. Link to Paper Link to Github
    This GitHub repository and paper provide a general overview of other possibilities for style transfer. There are now different branches of style transfer, some focusing more on preserving the content and some more on preserving the style. There are also improvements in other respects, such as training speed or time-varying style transfer.

  2. Johnson et al. 2016. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. Link to Paper
    One potential change to Gatys's model is to use the configuration Johnson used in this paper; similar results can be reproduced.

Other Github references

Throughout this project, I visited a few other implementations that gave me great insight into how to implement the style transfer model more efficiently and cleanly. The following is a list of those I referenced.

  1. https://github.com/hnarayanan/artistic-style-transfer
  2. https://github.com/hwalsuklee/tensorflow-style-transfer
  3. https://github.com/jcjohnson/neural-style
  4. https://github.com/lengstrom/fast-style-transfer
  5. https://github.com/fzliu/style-transfer
  6. https://github.com/machrisaa/tensorflow-vgg
  7. https://github.com/anishathalye/neural-style

As mentioned earlier, my implementation differs slightly from the original. I tried to find one that follows the original implementation exactly, but most of them either change some settings of their own or combine their implementation with other versions of style transfer.

Acknowledgement

I would like to express my sincere gratitude to my mentor Dylan Paiton at UC Berkeley for the support he has given. Much of this would not have been possible without his continual moral and technical support. I have learned a great deal about neural networks and neuroscience through our discussions and weekly meetings, and I look forward to more research in the future.

Paper References

Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2414-2423).

Personal Note

The artistic and imaginative side of humanity is known to be one of the most challenging aspects of life to model. Because of its free form and humanly cultivated experience, art is often appreciated not only for its visual appearance, but also for the history and motivations of the artist. In this project, I attempt to answer this question: "If we were to create a model that creates art, how would it do it, and what separates that from human life?"

This is my first project in which I look in depth into an academic paper and attempt to implement its model from scratch. Because it is widely used to illustrate what neural networks can do, artistic style transfer remains one of the most interesting beginner projects. I am doing this to cultivate my extensive and critical thinking skills, and also to understand the model thoroughly, to the extent that I would have no doubt explaining how it works from zero to a hundred.
