
SMiLEConverter

Introduction

This repo implements an Auxiliary Classifier GAN (ACGAN) model, a Wasserstein ACGAN (WACGAN) model, and an image encoder to build and train a system for converting human smiling expressions to non-smiling expressions and vice versa.
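
As a rough illustration of how the pieces compose at inference time, here is a minimal sketch, assuming the encoder maps an image to a latent vector z and the ACGAN generator maps (z, class label) to an image. The function names and label layout are illustrative assumptions, not the exact interfaces in models/:

```python
import numpy as np

def convert_expression(image, encode, generate, make_smile):
    """Convert a face's expression by re-generating it under the
    opposite smiling label.

    encode:   image -> latent vector z (the trained image encoder)
    generate: (z, class_label) -> image (the trained ACGAN generator)
    """
    z = encode(image)
    # One-hot class label over [smiling, not smiling] -- an assumed layout.
    label = np.array([1.0, 0.0]) if make_smile else np.array([0.0, 1.0])
    # z carries the identity/appearance; the label carries the expression.
    return generate(z, label)
```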

Dataset

The system is trained on the Large-scale CelebFaces Attributes (CelebA) Dataset. A quick exploration of image samples and attributes is shown in notebook/ExploreCelebA
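
For reference, here is a minimal sketch of loading the CelebA attribute file and splitting images on the Smiling attribute; the file path is an assumption about where the unzipped dataset lives, so adjust it to your local layout:

```python
import pandas as pd

# list_attr_celeba.txt: row 0 is the image count, row 1 holds the 40
# attribute names, and each following row is "filename attr1 ... attr40"
# with values of 1 or -1. Pandas uses the filename column as the index.
attrs = pd.read_csv('celebA/list_attr_celeba.txt',
                    delim_whitespace=True, header=1)

smiling = attrs[attrs['Smiling'] == 1].index
not_smiling = attrs[attrs['Smiling'] == -1].index
print('%d smiling / %d not smiling images' % (len(smiling), len(not_smiling)))
```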

Programming language

Python 2.7

Library Dependencies

  • TensorFlow 1.0
  • Numpy 1.12.0
  • Scipy 0.18.1
  • Pandas 0.19.2
  • Matplotlib 2.0.0
  • tqdm 4.11.2

Credits

The initial vanilla GAN and initial Wasserstein GAN implementations are adapted from and credited to Sarath Shekkizhar. The ACGAN and WACGAN model frameworks are built on top of the initial vanilla GAN.

Prerequisites & Experiments Explained

  • The model was trained on a Paperspace Linux system with an Nvidia GPU
  • The CelebA dataset must be downloaded and unzipped manually
  • A large portion of the implementation, settings, and training follows the improved GAN training techniques and the "How to Train a GAN" talk at NIPS 2016
  • Various experiments with different hyperparameter settings were explored and published in the bash_file/ directory:
    • ACGAN with feature matching - L1 distance (manually replace tf.nn.l2_loss with tf.abs at line 548 in the models/GAN_model.py file; see the loss sketch after this list)
    • ACGAN with feature matching - L2 distance
    • WACGAN without feature matching
    • WACGAN with feature matching - L1 distance (manually replace tf.nn.l2_loss with tf.abs at line 698 in the models/GAN_model.py file)
    • WACGAN with feature matching - L2 distance
    • Train an image encoder against a frozen generator using L1 distance as the loss metric
    • Train an image encoder against a frozen generator using L2 distance as the loss metric (manually replace tf.abs with tf.nn.l2_loss at line 183 in the models/Encoder.py file)
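
For readers who want to make the L1/L2 swaps above without digging into the files first, here is a minimal sketch of a feature-matching objective in TensorFlow 1.0 style. real_feats and fake_feats stand in for intermediate discriminator activations, and the exact layer choice and reductions in models/GAN_model.py may differ:

```python
import tensorflow as tf

# Intermediate discriminator activations on real and generated batches
# (illustrative shape; the actual layer and width come from the model).
real_feats = tf.placeholder(tf.float32, [None, 1024])
fake_feats = tf.placeholder(tf.float32, [None, 1024])

# Feature matching compares batch statistics rather than single samples.
diff = tf.reduce_mean(real_feats, axis=0) - tf.reduce_mean(fake_feats, axis=0)

fm_loss_l2 = tf.nn.l2_loss(diff)           # L2 variant: 0.5 * sum(diff ** 2)
fm_loss_l1 = tf.reduce_sum(tf.abs(diff))   # L1 variant: the tf.abs swap
```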

Observations

  • ACGAN with feature matching (both L1 and L2 distance) is able to generate human face images of decent quality with relatively pure class-based generation
  • WACGAN takes longer to train and its generated image quality is not as good as ACGAN's; however, it produces purer class-based generation
  • Neither L1 distance nor L2 distance turns out to be a good metric for training an image encoder to perform the identity-preserving task
  • Without regulating the distribution of the image encoder's output, mode collapse can happen at the second stage even though it did not happen during the first stage of training the ACGAN and the generator

Results

More detailed information on the model architectures and the results of the trained models is aggregated in this demo presentation

Future Work & Improvement

  • Modify the image encoder to use a perceptual loss / a distance in an image embedding space
  • Regulate the distribution of the latent Z vectors produced by the image encoder to match the Generator's input noise vector distribution (see the sketch below)
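
A minimal sketch of one way to implement the second item, assuming the generator is fed z ~ N(0, I) (an assumption; match this to the actual noise distribution used): have the encoder emit a mean and log-variance and add a VAE-style KL penalty pulling its output distribution toward that prior.

```python
import tensorflow as tf

z_dim = 100  # assumed latent size; match the generator's noise dimension

# Encoder outputs for a batch of images (illustrative placeholders).
z_mean = tf.placeholder(tf.float32, [None, z_dim])
z_logvar = tf.placeholder(tf.float32, [None, z_dim])

# KL( N(z_mean, exp(z_logvar)) || N(0, I) ), summed over latent dims.
kl = -0.5 * tf.reduce_sum(
    1.0 + z_logvar - tf.square(z_mean) - tf.exp(z_logvar), axis=1)
kl_loss = tf.reduce_mean(kl)

# Encoder objective: reconstruction loss + lambda * kl_loss, where
# lambda is a hyperparameter weighing the distribution penalty.
```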
