Audio Style Transfer

Introduction

Style transfer has been applied successfully in the image domain; a well-known example is rendering an arbitrary input image in the style of a Van Gogh painting [1]. The aim of this project is to adapt style transfer to the audio domain. Specifically, we take one audio clip (preferably a song) labeled as the "style" and another labeled as the "content", and synthesize a new clip that carries the general characteristics of the "style" while remaining faithful to the "content". Working toward this goal is also a step toward understanding the features of raw music audio signals, such as style, melody, rhythm, and tempo.

Solutions proposed in the literature include using multiple time-frequency representations [2], the short-time Fourier transform with the Griffin-Lim algorithm [3], and shallow convolutional networks [4]. We aim to implement some of these methods, use their results as baselines, and try to improve on them with different features, methods, and models. We hope to contribute to this relatively new research area and produce results that draw more attention to the subject.
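To make the STFT and Griffin-Lim approach mentioned above more concrete, here is a minimal NumPy sketch of recovering a waveform from a magnitude spectrogram. It is not the code used in this project or in [3]. The function names, the Hann window, and the `n_fft`/`hop` values are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sketch only: names and parameter values are assumptions,
# not taken from this repository or from the referenced implementations.

def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(S, n_fft=512, hop=128):
    """Inverse STFT by windowed overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft)
    n_frames = S.shape[0]
    length = n_fft + hop * (n_frames - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(S, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        x[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    # Avoid division by zero where the window sum vanishes (signal edges).
    return x / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase, then alternate between the time domain and
    # the STFT domain, keeping the target magnitude and updating the phase.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        x = istft(magnitude * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(magnitude * phase, n_fft, hop)
```

In a style transfer setting, `magnitude` would be the spectrogram produced by the optimization rather than one computed from an existing recording. Since that spectrogram carries no phase, an iterative estimate like this is one way to turn it back into audio.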

Progress

We have implemented and tried two baselines: one from Mital's paper [4] and the other from Ulyanov's blog post [3].

Papers

Neural Style Transfer for Audio Spectrograms

  • NIPS 2017 Workshop paper

Audio style transfer

  • IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2018

Time Domain Neural Audio Style Transfer (Baseline implementation: Mital)

Blogs

Audio texture synthesis and style transfer (Baseline implementation: Ulyanov)

Neural Style Transfer on Audio Signals

References

[1] A Neural Algorithm of Artistic Style, https://arxiv.org/abs/1508.06576

[2] “Style” Transfer for Musical Audio Using Multiple Time-Frequency Representations, https://openreview.net/forum?id=BybQ7zWCb

[3] Audio texture synthesis and style transfer, https://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/

[4] Time Domain Neural Audio Style Transfer, https://arxiv.org/abs/1711.11160