Conditional Cycle-Consistent Generative Adversarial Networks (CCycleGAN)

Generative adversarial networks have been widely explored for generating photorealistic images, but their capabilities for multimodal image-to-image translation in a conditional generative setting remain only vaguely explored. Moreover, applying GANs to facial expression generation, conditioned on the emotion of the facial expression and in the absence of paired examples, is to our knowledge almost a green field. The novelty of this study therefore lies in experimenting with the synthesis of conditional facial expressions: we present a novel approach (CCycleGAN) for learning to translate an image from a domain (e.g. the face images of a person) conditioned on a given emotion (e.g. joy) to the same domain conditioned on a different emotion (e.g. surprise), in the absence of paired examples. Our goal is to learn a mapping such that the distribution of generated images is indistinguishable from the distribution of real images, using an adversarial loss together with a cycle-consistency loss. Qualitative results are presented for this setting, where paired training data does not exist, with a quantitative justification of the optimal hyperparameters.
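
To make the objective concrete, here is a minimal sketch of the generator loss (hypothetical names and a NumPy stand-in, not the repository's actual Keras code; the default weights mirror the `-g_gan_loss_w`, `-g_cl_loss_w` and `-rec_loss_w` flags shown under Train):

```python
import numpy as np

def g_loss(G, D, x, c_src, c_tgt, gan_w=2.0, cls_w=2.0, rec_w=1.0):
    """Toy CCycleGAN generator objective for one batch (illustrative only).

    G(images, labels) -> translated images
    D(images)         -> (realness in [0, 1], per-emotion probabilities)
    """
    x_fake = G(x, c_tgt)          # translate to the target emotion
    x_rec = G(x_fake, c_src)      # translate back to the source emotion
    realness, probs = D(x_fake)
    adv = np.mean((realness - 1.0) ** 2)                             # least-squares GAN loss
    cls = -np.mean(np.log(probs[np.arange(len(x)), c_tgt] + 1e-8))   # expression classification
    rec = np.mean(np.abs(x - x_rec))                                 # L1 cycle-consistency
    return gan_w * adv + cls_w * cls + rec_w * rec
```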

Paper

Note: this is an unpaired image-to-image translation problem.

Installation

$ git clone https://github.com/gtesei/ccyclegan.git
$ cd ccyclegan/
$ sudo pip3 install -r requirements.txt

Train

$ python ccyclegan_t26.py

# Defaults
$ python ccyclegan_t26.py \
    -d_gan_loss_w 1 \
    -d_cl_loss_w 1 \
    -g_gan_loss_w 2 \
    -g_cl_loss_w 2 \
    -rec_loss_w 1  \
    -adam_lr 0.0002 \
    -adam_beta_1 0.5 \
    -adam_beta_2 0.999 \
    -epochs 170 \
    -batch_size 64   \
    -sample_interval 200

# Usage
$ python ccyclegan_t26.py -h

Dataset

FER2013 consists of 28,709 training and 7,178 test 48x48-pixel grayscale images of faces, each annotated with one of seven emotion categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral). The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in every image. Thanks to its low resolution, the dataset offers a good trade-off between accuracy and model complexity, allowing many quick iterations. You need to download the dataset from Kaggle and put fer2013.csv under the datasets folder.

Note: this is a dataset of unpaired images, i.e. for a given person/facial expression there are NO other images of the same person with different facial expressions.
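
A minimal loading sketch, assuming the standard Kaggle fer2013.csv layout (emotion, pixels and Usage columns); the load_fer2013 helper below is illustrative, not the repository's own loader:

```python
import numpy as np
import pandas as pd

def load_fer2013(path="datasets/fer2013.csv"):
    """Parse fer2013.csv into train/test (images, labels) arrays."""
    df = pd.read_csv(path)
    # Each row stores 48*48 = 2304 space-separated pixel values.
    pixels = np.stack([np.array(p.split(), dtype=np.float32) for p in df["pixels"]])
    images = pixels.reshape(-1, 48, 48, 1) / 127.5 - 1.0  # scale to [-1, 1], a common GAN convention
    labels = df["emotion"].values
    train = (df["Usage"] == "Training").values
    return (images[train], labels[train]), (images[~train], labels[~train])
```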

Exemplar Results

[T26] Reconstruction loss weight = 1/2 × adversarial loss weight = 1/2 × facial-expression classification loss weight

[T25/26] Reconstruction loss weight = adversarial loss weight = facial-expression classification loss weight; more stable training procedure (G)

[T24] Reconstruction loss weight = adversarial loss weight = facial-expression classification loss weight

Experiment Log

| Id | Code | Description | Notes |
|----|------|-------------|-------|
| T1 | ccyclegan_t1.py | Baseline. The GAN loss (negative log-likelihood objective) is replaced by a least-squares loss [X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley. Least squares generative adversarial networks. In CVPR. IEEE, 2017]. We also adopt the technique of [Y. Taigman, A. Polyak, and L. Wolf. Unsupervised cross-domain image generation. In ICLR, 2017] and regularize the generator to be near an identity mapping when real samples of the target domain are provided as input to the generator. Weights are the same as in the CycleGAN paper, i.e. identity loss weight = 0.1 × cycle-consistency loss weight, G-loss = 1. | G-loss too high compared to D-loss. Let's try incrementing the weight of the G-loss. |
| T2 | ccyclegan_t2.py | Like T1 but the G-loss weight is set to 20. | No reconstruction in 200 epochs; discriminator has 100% accuracy. |
| T3 | ccyclegan_t3.py | Like T1 but the identity loss weight is set to 0 and the G-loss is binary cross-entropy instead of least squares (weight set to 7). | G-loss too high vs. D-loss. |
| T4 | ccyclegan_t4.py | Let's simplify the problem: only from domain "Neutral" to domain "Happy" and from "Happy" to "Neutral"; no other transformations. | Discriminator at ~100% accuracy. This may be because reducing the problem this way also reduces the training data, so the generator does not benefit; this is a situation where multi-task learning should be applied. Let's restore the problem to its original terms! |
| T5 | ccyclegan_t5.py | Like T1 but we concatenate the encoded label after the convolutions as shown in https://arxiv.org/ftp/arxiv/papers/1708/1708.09126.pdf. Identity loss is removed. | G-loss too high vs. D-loss. |
| C1 | classifier.py | D of ccyclegan_t4.py. | Accuracy-train ~100%, accuracy-test ~70%, which is comparable with the winning model of the Kaggle competition, i.e. 0.71161. |
| T6 | ccyclegan_t6.py | Let G predict on train images to see whether G can generate realistic images. | Images not very realistic. |
| C2 | classifier2.py | ResNet50 pre-trained on ImageNet (RGB images). Note: the grayscale images are converted to RGB. | After 15 epochs, accuracy-train ~95% and accuracy-test ~80%, which is better than the Kaggle competition winner (ResNet was released in 2015, while the competition was organized in 2013). |
| T7 | ccyclegan_t7.py | Use ResNet50 pre-trained on ImageNet as D. | Generated images not very realistic and D accuracy ~100%. |
| T8 | ccyclegan_t8.py | Change the training procedure: remove shuffling and train D only on real samples with their true labels, i.e. remove real samples paired with false labels. | Discriminator has ~100% accuracy on the training set, reconstruction loss is low, and G-loss is high compared with D-loss. |
| T9 | ccyclegan_t9.py | ResNet50 pre-trained on ImageNet as D, like T7, but frozen. | Generated images not very realistic and reconstruction loss is low. |
| T10 | ccyclegan_t10.py | 2× GAN loss weight (G). | No significant changes from the previous model. |
| T11 | ccyclegan_t11.py | Class label encoded after the convolutional layers (G). | Some improvements; facial expressions are better but need to be more realistic. |
| T12 | ccyclegan_t12.py | Dropout as a regularization technique. | Not very helpful. |
| T13 | ccyclegan_t13.py | Like T11, just trained longer (300 epochs). | Some improvements; facial expressions are better but need to be more realistic. |
| T14 | ccyclegan_t14.py | Class label encoded after the convolutional layers (D). | Not very helpful. |
| T15 | ccyclegan_t15.py | Like T14, but the batch of real images with wrong labels fed to the discriminator is removed. | Not very helpful. |
| T16 | ccyclegan_t16.py | Like T11, but the batch of real images with wrong labels fed to the discriminator is removed. | Not very helpful. |
| T17 | ccyclegan_t17.py | Generator split into G_enc (responsible for encoding the image into the latent space) and G_dec (responsible for decoding the latent vector into an image). | Conceptually correct change, kept in the following models, but results here are not very different. |
| T18 | ccyclegan_t18.py | Like T17 but with the same weights for all loss functions. | Results are not very different. |
| T19 | ccyclegan_t19.py | Like T17 but the adversarial loss (G/D) is 1/7 of the facial-expression classification loss (G/D). | Results are not very different. |
| T20 | ccyclegan_t20.py | Like T17 but with a transformation layer added between G_enc and G_dec (concatenation of the class label to the latent vector + dense block + LeakyReLU block + 1x1 convolution to obtain the correct number of channels); see the first sketch after this table. | Conceptually correct change, kept in the following models, but results here are not very different. |
| T21 | ccyclegan_t21.py | Like T20 but with sigmoid instead of softmax as the last block for facial-expression classification (G/D). | Conceptually correct change, kept in the following models, but results here are not very different. |
| T22 | ccyclegan_t22.py | Like T21 but, during D training on generated images, a new 0-label is introduced for fake facial expressions: the sigmoid outputs are forced to zero for fake expressions in D instead of the desired class label. This in turn forces G to learn better, as it would otherwise be penalized twice (GAN real/fake loss + GAN facial-expression loss). | Conceptually correct change, kept in the following models, but results here are not very different. |
| T23 | ccyclegan_t24.py | Like T22, just trained longer (400 epochs). | Results are not very different. |
| T24 | ccyclegan_t22.py | Like T21 but the training procedure of D/G is significantly changed: for each sample, all the other 7−1=6 possible facial expressions are generated (G) and used to train D/G; see the second sketch after this table. | Much better results. |
| T25 | ccyclegan_t25.py | Code refactoring; sample shuffling is added to G training. | Still good results, perhaps even better. G-loss looks better after 200 epochs (shuffling seems to stabilize the training procedure). |
| T26 | ccyclegan_t26.py | Code refactoring; experimented with different combinations of adversarial loss weight (G) vs. facial-expression classification loss weight (G): 1:1, 1:2, 2:1, 3:9, 10:100. Also experimented with a lower learning rate (0.0001 vs. 0.0002). | Confirmed the best hyperparameters. |
| T27 | ccyclegan_t27__hyper_params.py | Fréchet Inception Distance (FID) used to find the best mix of hyperparameters; see the FID sketch after this table. | Confirmed the best hyperparameters. |
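
Sketch 1 (T20's transformation layer): a hedged Keras illustration of injecting the target-emotion label between G_enc and G_dec. Layer sizes and names are assumptions, not the repository's exact architecture.

```python
from tensorflow.keras import layers

def transformation_block(latent, label, channels=128):
    """Concatenate the class label to the latent feature map, then apply
    a dense block + LeakyReLU + 1x1 conv to restore the channel count.
    (Illustrative sketch; sizes are assumptions.)"""
    h, w = latent.shape[1], latent.shape[2]
    lbl = layers.RepeatVector(h * w)(label)             # (batch, h*w, 7)
    lbl = layers.Reshape((h, w, 7))(lbl)                # broadcast the label spatially
    z = layers.Concatenate(axis=-1)([latent, lbl])      # concat along channels
    z = layers.Dense(channels)(z)                       # dense block on the channel axis
    z = layers.LeakyReLU(0.2)(z)
    return layers.Conv2D(channels, kernel_size=1)(z)    # 1x1 conv -> correct channel count
```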
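
Sketch 2 (T24's batch expansion): each real sample is paired with every one of the other 6 emotion labels before training D/G. The helper name is hypothetical; the actual loop lives in the experiment script.

```python
import numpy as np

def expand_batch(x, y, n_classes=7):
    """Pair every image with each of the other n_classes - 1 = 6 target labels."""
    xs, c_src, c_tgt = [], [], []
    for img, lbl in zip(x, y):
        for c in range(n_classes):
            if c != lbl:                 # skip the sample's own expression
                xs.append(img)
                c_src.append(lbl)
                c_tgt.append(c)
    return np.stack(xs), np.array(c_src), np.array(c_tgt)
```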
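
Sketch 3 (T27's FID): the standard Fréchet Inception Distance between activation statistics; the activations would come from an Inception network run on real and generated images.

```python
import numpy as np
from scipy import linalg

def fid(act_real, act_fake):
    """FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2))."""
    mu1, mu2 = act_real.mean(axis=0), act_fake.mean(axis=0)
    c1 = np.cov(act_real, rowvar=False)
    c2 = np.cov(act_fake, rowvar=False)
    covmean = linalg.sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):    # numerical noise can produce tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))
```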
