This repository contains code for the paper "Generative Modelling for Controllable Audio Synthesis of Piano Performance" by Hao Hao Tan, Yin-Jyun Luo and Dorien Herremans.
We utilize Gaussian Mixture VAEs in neural audio synthesis models to allow temporal conditioning of two essential style features for piano performances: articulation and dynamics.
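The core idea of a Gaussian mixture prior is that each discrete style class (e.g. a degree of articulation or dynamics) owns one Gaussian component, so conditioning amounts to choosing a component before sampling the latent. A minimal sketch of that sampling step, with toy (randomly initialized) parameters rather than the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GM-VAE prior: K mixture components, one per discrete
# style class. Means/variances are random stand-ins for learned values.
K, D = 4, 8                        # number of components, latent dimension
means = rng.normal(size=(K, D))    # component means (learned in practice)
log_vars = np.zeros((K, D))        # component log-variances

def sample_prior(k):
    """Draw a latent z from mixture component k (reparameterised)."""
    eps = rng.normal(size=D)
    return means[k] + np.exp(0.5 * log_vars[k]) * eps

# Conditioning on a style class = picking its component before sampling.
z = sample_prior(2)
print(z.shape)  # (8,)
```

Varying `k` over time is what enables temporal conditioning: each frame's latent can be drawn from a different style component.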
- Download the MAESTRO v2.0.0 dataset.
- Modify the training configurations in `nms_latent_config.json`.
- Run `python trainer_nms_latent_dynamic.py`.
- The trained model weights and logs can be found in the `params/` and `logs/` folders respectively.
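To adjust hyperparameters before launching the trainer, edit the JSON config. A small sketch of that step; the key names below are illustrative only, so check the shipped `nms_latent_config.json` for the actual fields:

```python
import json

# A hypothetical slice of nms_latent_config.json -- the real file ships
# with the repo; the key names here are assumptions for illustration.
cfg_text = '{"batch_size": 32, "learning_rate": 0.001, "epochs": 100}'
cfg = json.loads(cfg_text)

# Tweak hyperparameters before running trainer_nms_latent_dynamic.py.
cfg["batch_size"] = 16
cfg["learning_rate"] = 1e-4
print(json.dumps(cfg, indent=2))
```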
After training completes, follow `visualize.ipynb` to observe the controllable generation of spectrograms under different degrees of articulation and dynamics.
For details on training WaveGlow, please refer to: https://github.com/yjlolo/constant-memory-waveglow
This research work is published at the ICML ML4MD Workshop, 2020.
```
@inproceedings{tan20generative,
  author    = {Tan, Hao Hao and Luo, Yin-Jyun and Herremans, Dorien},
  booktitle = {ICML Workshop on Machine Learning for Music Discovery Workshop (ML4MD), Extended Abstract},
  title     = {Generative Modelling for Controllable Audio Synthesis of Piano Performance},
  year      = {2020}
}
```