This is a personal project that implements Convolutional Neural Networks (CNNs) and a Variational Autoencoder (VAE) for sound generation.

dchen376/ML---Sound-Generation

FSDD

The audio dataset comes from the Free Spoken Digit Dataset (FSDD) repository: https://github.com/Jakobovski/free-spoken-digit-dataset

MNIST

analysis.py uses MNIST (Modified National Institute of Standards and Technology), a dataset of handwritten digits, for preliminary analysis.

YouTube reference

YouTube tutorials on music generation: https://youtube.com/playlist?list=PL-wATfeyAMNpEyENTc-tVH5tfLGKtSWPp&si=53DtJN6I_OKJFAr-

Steps to follow in this project

step 0 - Understand the vanilla autoencoder, which consists of an encoder and a decoder.

  • build an encoder
  • build a decoder
  • combine and make the autoencoder
  • train the autoencoder
  • test the autoencoder with the MNIST dataset
  • plot the testing results
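The steps above can be sketched in a few lines. This is only a minimal illustration, not the project's code: it assumes plain NumPy instead of Keras, linear layers instead of convolutions, and synthetic low-rank data instead of MNIST. The class name `LinearAutoencoder` and all hyperparameters are hypothetical.

```python
import numpy as np


class LinearAutoencoder:
    """Hypothetical minimal autoencoder: one linear encoder, one linear decoder."""

    def __init__(self, input_dim, latent_dim, rng):
        # Small random initial weights for both components.
        self.w_enc = rng.normal(0.0, 0.1, (input_dim, latent_dim))
        self.w_dec = rng.normal(0.0, 0.1, (latent_dim, input_dim))

    def encode(self, x):
        # Compress the input into the bottleneck (latent) representation.
        return x @ self.w_enc

    def decode(self, z):
        # Map the latent representation back to input space.
        return z @ self.w_dec

    def reconstruct(self, x):
        # The autoencoder is the decoder applied to the encoder output.
        return self.decode(self.encode(x))

    def loss(self, x):
        # Mean squared reconstruction error.
        return np.mean((self.reconstruct(x) - x) ** 2)

    def train_step(self, x, learning_rate):
        # One step of gradient descent on the mean squared error.
        z = self.encode(x)
        error = self.decode(z) - x
        grad_output = 2.0 * error / error.size
        grad_dec = z.T @ grad_output
        grad_enc = x.T @ (grad_output @ self.w_dec.T)
        self.w_dec -= learning_rate * grad_dec
        self.w_enc -= learning_rate * grad_enc


rng = np.random.default_rng(0)
# Synthetic stand-in for a dataset: 200 samples of 8 features lying in a 2-D subspace.
data = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 8))

model = LinearAutoencoder(input_dim=8, latent_dim=2, rng=rng)
initial_loss = model.loss(data)
for _ in range(2000):
    model.train_step(data, learning_rate=0.01)
final_loss = model.loss(data)
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```

A Keras version would follow the same structure, with the encoder and decoder built as separate models and then chained.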

step 1 - Implement Variational Autoencoder (VAE)

  • modify the encoder component (modify the bottleneck -> z = mu + sigma * epsilon, where epsilon is sampled from a standard normal distribution)
  • modify the loss function: RMSE + KL (Kullback-Leibler divergence, closed form)
  • train the VAE
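The two VAE changes can be written out numerically. The sketch below uses NumPy rather than Keras and assumes the encoder outputs a mean and a log-variance per latent dimension. The function names and the `reconstruction_weight` parameter are illustrative assumptions, not taken from the project.

```python
import numpy as np


def sample_latent(mu, log_variance, rng):
    """Reparameterization trick: z = mu + sigma * epsilon, epsilon ~ N(0, I)."""
    epsilon = rng.standard_normal(mu.shape)
    sigma = np.exp(0.5 * log_variance)
    return mu + sigma * epsilon


def kl_divergence(mu, log_variance):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_variance - mu ** 2 - np.exp(log_variance), axis=1)


def reconstruction_rmse(x, x_hat):
    """Root mean squared error per sample, averaged over all non-batch axes."""
    axes = tuple(range(1, x.ndim))
    return np.sqrt(np.mean((x - x_hat) ** 2, axis=axes))


def vae_loss(x, x_hat, mu, log_variance, reconstruction_weight=1.0):
    """Per-sample loss: weighted RMSE plus KL. The weight is a hypothetical knob."""
    return reconstruction_weight * reconstruction_rmse(x, x_hat) + kl_divergence(mu, log_variance)


rng = np.random.default_rng(0)
mu = np.zeros((4, 2))
log_variance = np.zeros((4, 2))
z = sample_latent(mu, log_variance, rng)  # four draws from a 2-D standard normal
print(z.shape, kl_divergence(mu, log_variance))
```

When the encoder output already matches the standard normal prior (mu = 0, log-variance = 0), the KL term is zero, so only reconstruction error contributes to the loss.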

step 2 - Preprocessing Audio Datasets

  • use the Free Spoken Digit Dataset (FSDD), an audio dataset of spoken digits
  • implement Loader and Padder for file processing
  • implement LogSpectrogramExtractor to preprocess audio files as spectrograms
  • implement MinMaxNormaliser
  • implement the Preprocessing Pipeline
  • implement Saver
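As a rough illustration of what the padding, log-spectrogram, and min-max normalisation stages do, here is a NumPy-only sketch. It is not the project's implementation: the function names, the frame size, the hop length, and the synthetic test tone are all assumptions, and a real pipeline would load audio files with an audio library rather than generate a sine wave.

```python
import numpy as np


def right_pad(signal, num_expected_samples):
    """Zero-pad a 1-D signal on the right up to the expected length."""
    num_missing = max(0, num_expected_samples - len(signal))
    return np.pad(signal, (0, num_missing), mode="constant")


def log_spectrogram(signal, frame_size=512, hop_length=256):
    """Magnitude STFT in decibels, shape (frame_size // 2 + 1, num_frames)."""
    window = np.hanning(frame_size)
    num_frames = 1 + (len(signal) - frame_size) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_size] * window
        for i in range(num_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Floor the magnitude so silent frames do not produce log(0).
    return 20.0 * np.log10(np.maximum(magnitude, 1e-10))


def min_max_normalise(array, new_min=0.0, new_max=1.0):
    """Rescale to [new_min, new_max]; assumes the array is not constant.

    Also returns the original min and max, which are needed later to
    undo the normalisation.
    """
    original_min, original_max = array.min(), array.max()
    scaled = (array - original_min) / (original_max - original_min)
    return scaled * (new_max - new_min) + new_min, original_min, original_max


# Synthetic stand-in for a recording: 0.5 s of a 440 Hz tone at a hypothetical 8 kHz rate.
sample_rate = 8000
t = np.arange(int(0.5 * sample_rate)) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)

padded = right_pad(tone, num_expected_samples=6000)
spectrogram = log_spectrogram(padded)
normalised, spec_min, spec_max = min_max_normalise(spectrogram)
print(padded.shape, spectrogram.shape, normalised.min(), normalised.max())
```

Keeping the original minimum and maximum for every file is what makes it possible to map generated spectrograms back to decibels later.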

step 3 - Training a VAE with speech data in Keras

  • load the Free Spoken Digit Dataset (FSDD)
  • reshape the data
  • train the VAE
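The load and reshape steps might look like the sketch below. It assumes the preprocessed spectrograms were saved as equally sized `.npy` files in a single directory; that layout and the function names are assumptions. A convolutional model expects a trailing channel axis, so each 2-D spectrogram is treated as a single-channel image.

```python
from pathlib import Path

import numpy as np


def load_spectrograms(directory):
    """Load every .npy file in a directory into one (num_samples, bins, frames) array.

    Assumes all saved spectrograms share the same shape.
    """
    paths = sorted(Path(directory).glob("*.npy"))
    return np.stack([np.load(path) for path in paths])


def add_channel_axis(spectrograms):
    """(num_samples, bins, frames) -> (num_samples, bins, frames, 1) as float32."""
    x = np.asarray(spectrograms, dtype=np.float32)
    return x[..., np.newaxis]


# Example with in-memory data: three fake spectrograms of 256 bins by 64 frames.
fake_batch = np.zeros((3, 256, 64))
x_train = add_channel_axis(fake_batch)
print(x_train.shape)
```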

step 4 - Sound Generation with VAE

  • build a SoundGenerator class
  • implement a generate.py script
  • generate sound from spectrograms
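Turning a generated spectrogram back into sound starts by undoing the preprocessing. The sketch below covers only the first two stages: denormalising with the stored minimum and maximum, then converting decibels to linear amplitude. The function names are hypothetical, and it assumes the spectrograms were min-max normalised to [0, 1] and stored with a trailing channel axis.

```python
import numpy as np


def denormalise(norm_array, original_min, original_max, norm_min=0.0, norm_max=1.0):
    """Invert min-max normalisation using the stored original min and max."""
    scaled = (norm_array - norm_min) / (norm_max - norm_min)
    return scaled * (original_max - original_min) + original_min


def db_to_amplitude(spectrogram_db):
    """Convert decibels back to linear amplitude (inverse of 20 * log10)."""
    return 10.0 ** (spectrogram_db / 20.0)


def spectrogram_to_amplitude(generated, original_min, original_max):
    """Map one generated (bins, frames, 1) spectrogram to a linear-amplitude one."""
    log_spectrogram = generated[:, :, 0]  # drop the channel axis
    return db_to_amplitude(denormalise(log_spectrogram, original_min, original_max))


# A constant 0.5 with a stored range of [-40 dB, 0 dB] maps to -20 dB, i.e. amplitude 0.1.
generated = np.full((4, 3, 1), 0.5)
amplitude = spectrogram_to_amplitude(generated, original_min=-40.0, original_max=0.0)
print(amplitude.shape, amplitude[0, 0])
```

The remaining stage, an inverse short-time Fourier transform that produces a waveform, is not shown here. An audio library function such as `librosa.istft` could fill that role, with the hop length matching the one used during preprocessing.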

Run this Sound Generative Model

  • the parameters and weights of the model trained in step 3 are saved in the model folder
  • use the FSDD as the data for sound generation
  • download and use the step 4 folder, which is the final step and contains the final version of the files needed to run the model
