
letter_digit_generator_VAE

This project builds a conditional variational autoencoder (CVAE) that generates arbitrary handwritten letters and digits from keyboard input. The CVAE is trained on the EMNIST dataset to encode handwritten letters/digits into a latent vector space; new, imaginary letters and digits are then generated by random sampling or interpolation in that space.

EMNIST data examples
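Because the model is conditional, each keyboard character must first be turned into a one-hot label over EMNIST's 62 "byclass" classes (digits 0-9, uppercase A-Z, lowercase a-z). A minimal sketch of such a mapping, assuming this class ordering (the ordering actually used in this repository may differ):

```python
import string
import numpy as np

# Assumed EMNIST ByClass label order: 0-9, A-Z, a-z (62 classes in total).
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase

def char_to_one_hot(ch):
    """Map a single keyboard character to a one-hot vector of length 62."""
    one_hot = np.zeros(len(CLASSES), dtype=np.float32)
    one_hot[CLASSES.index(ch)] = 1.0
    return one_hot

print(char_to_one_hot('A').argmax())  # 10 under the assumed ordering
```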

LDG Version 3

  • Loss: binary crossentropy
  • Optimizer: Adam
  • Latent dimension: 6
  • Image normalization: [0, 1]
  • Last activation function of the decoder: sigmoid
  • Convolutional CVAE layers (encoder // decoder): [784,62]-[784]-[(28,28,1)]-[(14,14,16)]-[(7,7,32)]-[1568]-[64]-[6] // [6,62]-[64]-[1568]-[(7,7,32)]-[(14,14,32)]-[(28,28,16)]-[(28,28,1)]-[784] (see the Keras sketch after this list)
  • Multi-layer CVAE layers: [784,62]-[256]-[128]-[6] // [6,62]-[128]-[256]-[784]
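The convolutional layer list above (encoder before the "//", decoder after it) can be read as the following Keras sketch. Layer sizes follow the list; kernel sizes, strides, activations and the sampling layer are assumptions, not the repository's exact code:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim, n_classes = 6, 62

# Encoder: [784,62] -> [784] -> (28,28,1) -> (14,14,16) -> (7,7,32) -> [1568] -> [64] -> [6]
img_in = layers.Input(shape=(784,))
label_in = layers.Input(shape=(n_classes,))
x = layers.Concatenate()([img_in, label_in])
x = layers.Dense(784, activation='relu')(x)
x = layers.Reshape((28, 28, 1))(x)
x = layers.Conv2D(16, 3, strides=2, padding='same', activation='relu')(x)  # (14,14,16)
x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(x)  # (7,7,32)
x = layers.Flatten()(x)                                                    # 1568
x = layers.Dense(64, activation='relu')(x)
z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)

def sample_z(args):
    """Reparameterization trick: z = mean + sigma * eps."""
    mean, log_var = args
    eps = tf.random.normal(tf.shape(mean))
    return mean + tf.exp(0.5 * log_var) * eps

z = layers.Lambda(sample_z)([z_mean, z_log_var])
encoder = Model([img_in, label_in], [z_mean, z_log_var, z])

# Decoder: [6,62] -> [64] -> [1568] -> (7,7,32) -> (14,14,32) -> (28,28,16) -> (28,28,1) -> [784]
z_in = layers.Input(shape=(latent_dim,))
dec_label_in = layers.Input(shape=(n_classes,))
d = layers.Concatenate()([z_in, dec_label_in])
d = layers.Dense(64, activation='relu')(d)
d = layers.Dense(7 * 7 * 32, activation='relu')(d)
d = layers.Reshape((7, 7, 32))(d)
d = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(d)  # (14,14,32)
d = layers.Conv2DTranspose(16, 3, strides=2, padding='same', activation='relu')(d)  # (28,28,16)
d = layers.Conv2DTranspose(1, 3, padding='same', activation='sigmoid')(d)           # (28,28,1)
out = layers.Reshape((784,))(d)
decoder = Model([z_in, dec_label_in], out)
```

Training would add the usual binary-crossentropy reconstruction term plus the KL divergence on (z_mean, z_log_var).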

A command-line letter/digit generator based on the ldg_v3 Conv-CVAE model (details below). It loads the Conv-CVAE model and the corresponding best weights to produce results; a minimal sketch of this workflow appears after the bullet below.

  • label inputs to both encoder and decoder
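A sketch of that workflow, assuming hypothetical file names for the saved decoder and its best weights (the actual files in this repository may differ) and the one-hot label mapping shown earlier:

```python
import string
import numpy as np
from tensorflow.keras.models import load_model

# Hypothetical file names; the repository's actual model/weight files may differ.
decoder = load_model('ldg_v3_conv_cvae_decoder.h5')
decoder.load_weights('ldg_v3_conv_cvae_best_weights.h5')

CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 classes

def generate(text, latent_dim=6):
    """Generate one 28x28 image per character by sampling z from the prior."""
    images = []
    for ch in text:
        z = np.random.normal(size=(1, latent_dim))
        label = np.zeros((1, len(CLASSES)), dtype=np.float32)
        label[0, CLASSES.index(ch)] = 1.0
        images.append(decoder.predict([z, label]).reshape(28, 28))
    return np.stack(images)

samples = generate('Hello123')
```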

Training

Dataset reconstruction

Generating new letters/digits (with/without arbitrary binary threshold filter)
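The binary threshold filter mentioned above can be as simple as clamping the decoder's [0, 1] outputs to pure black/white; a sketch with an assumed cut-off of 0.5:

```python
import numpy as np

def binary_threshold(images, cutoff=0.5):
    """Force generated pixel intensities in [0, 1] to pure black/white.
    The 0.5 cut-off is an assumption; any value in (0, 1) can be used."""
    return (images > cutoff).astype(np.float32)

# e.g. filtered = binary_threshold(samples, cutoff=0.5)
```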

LDG Version 2

  • Loss: MSE (see the loss/normalization sketch after this list)
  • Optimizer: Adam
  • Latent dimension: 10
  • Image normalization: [-1, 1]
  • Last activation function of the decoder: tanh
  • Convolutional CVAE layers: [784,62]-[784]-[(28,28,1)]-[(28,28,16)]-[(28,28,32)]-[(28,28,64)]-[12544]-[128]-[10] // [10,62]-[128]-[12544]-[(14,14,64)]-[(28,28,32)]-[(28,28,16)]-[(28,28,1)]-[784]
  • Multi-layer CVAE layers: [784,62]-[512]-[256]-[10] // [10,62]-[256]-[512]-[784]
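Version 2 scales images to [-1, 1] to match the decoder's tanh output and trains with an MSE reconstruction term. A sketch of that preprocessing and a standard MSE + KL CVAE loss (the exact loss weighting used in this repository is an assumption):

```python
import tensorflow as tf

def normalize_to_minus1_1(x):
    """Scale raw pixel values in [0, 255] to [-1, 1] for the tanh decoder."""
    return (tf.cast(x, tf.float32) / 127.5) - 1.0

def cvae_loss_v2(x_true, x_recon, z_mean, z_log_var):
    """Per-example MSE reconstruction + KL divergence, averaged over the batch."""
    recon = tf.reduce_sum(tf.square(x_true - x_recon), axis=-1)
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
    return tf.reduce_mean(recon + kl)
```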

A command-line letter/digit generator based on the ldg_v2 Conv-CVAE model (details below). It loads the Conv-CVAE model and the corresponding best weights to produce results.

  • label inputs to both encoder and decoder

Training (a direct comparison with v3 is difficult because the versions were trained for different numbers of epochs)

Dataset reconstruction

Generating new letters/digits (with/without arbitrary binary threshold filter)

LDG Version 1

Initial convolutional conditional variational autoencoder model.

  • label inputs only to decoder
  • training/test data reconstructions were satisfactory, but generating images for a specific input string was somewhat difficult.

VAE interpolation from image 1 to image 2
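Interpolation works by encoding the two images, linearly blending their latent vectors, and decoding each intermediate point. A sketch assuming a v1-style interface where only the decoder receives the label (the function signatures and flattened 784-pixel inputs are assumptions):

```python
import numpy as np

def interpolate(encoder, decoder, img1, img2, label, steps=10):
    """Linearly interpolate between the latent codes of two flattened images
    and decode each intermediate point under the same one-hot label."""
    # Take the mean latent code, assuming the encoder returns [z_mean, z_log_var, z].
    z1 = encoder.predict(img1[None, :])[0]
    z2 = encoder.predict(img2[None, :])[0]
    frames = []
    for t in np.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * z1 + t * z2
        frames.append(decoder.predict([z, label[None, :]]).reshape(28, 28))
    return np.stack(frames)
```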

While the model architecture appears sound, the Stanford Dogs dataset may not be suitable for training a VAE.
