Slightly modified code from this repository.
Also referenced this repository.
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
"This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation."
Basically, you append some latent codes, a set of random integers (categorical) or random floats (continuous), to the noise vector, and the network tries to maximize the mutual information between each code and the generated image. By doing so, each latent code gets mapped to some salient attribute of the input data, such as the tilt or thickness of characters (MNIST). As a result, you can control certain features of the generated image by manipulating the codes.
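A minimal numpy sketch of how such an input vector can be assembled (the dimensions 62/10/2 follow the paper's MNIST setup; the function name and batch size are illustrative, not from this repo):

```python
import numpy as np

rng = np.random.default_rng(0)

def build_latent(batch, noise_dim=62, n_cat=10, n_cont=2):
    """Concatenate incompressible noise, a one-hot categorical code,
    and uniform continuous codes into one generator input."""
    z = rng.standard_normal((batch, noise_dim))        # noise vector
    cat = rng.integers(0, n_cat, size=batch)           # categorical code (random int)
    c_cat = np.eye(n_cat)[cat]                         # one-hot encoding of the category
    c_cont = rng.uniform(-1, 1, size=(batch, n_cont))  # continuous codes (random float)
    return np.concatenate([z, c_cat, c_cont], axis=1), cat, c_cont

latent, cat, c_cont = build_latent(4)  # latent has shape (4, 62 + 10 + 2)
```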
Loss function: min_{G,Q} max_D V_InfoGAN(D, G, Q) = V(D, G) − λ I(c; G(z, c))
Mutual information (variational lower bound): I(c; G(z, c)) = H(c) − H(c | G(z, c)) ≥ E_{c∼P(c), x∼G(z,c)}[log Q(c|x)] + H(c)
Maximizing the mutual information
<=> dropping the constant term H(c) and maximizing E[log Q(c|x)], i.e. the likelihood of the code c given the generated image x (maximum likelihood estimation)
<=> minimizing the NLL (negative log-likelihood) of Q(c|x)
Q shares the discriminator's layers and branches into two heads:
G(z, c) -> D -> Qcat -> score for each category
G(z, c) -> D -> Qcont -> statistics of the estimated distribution: mean and variance
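A numpy sketch of the two Q heads (layer sizes, weight names, and the linear-head architecture are illustrative assumptions, not taken from this repo): both heads read the features shared with the discriminator; Qcat outputs a softmax over categories, Qcont outputs a mean and log-variance per continuous code.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, n_cat, n_cont = 128, 10, 2

# Hypothetical weights for the two Q heads on top of shared D features.
W_cat = rng.standard_normal((feat_dim, n_cat)) * 0.01
W_mu = rng.standard_normal((feat_dim, n_cont)) * 0.01
W_logvar = rng.standard_normal((feat_dim, n_cont)) * 0.01

def q_heads(features):
    """features: (batch, feat_dim) activations shared with the discriminator."""
    logits = features @ W_cat                      # Qcat: unnormalized category scores
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)      # softmax -> score for each category
    mu = features @ W_mu                           # Qcont: mean of the Gaussian
    logvar = features @ W_logvar                   # Qcont: log-variance of the Gaussian
    return probs, mu, logvar

probs, mu, logvar = q_heads(rng.standard_normal((4, feat_dim)))
```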
Q loss = cross_entropy(categorical code, Qcat(G(z, c))) + λ · gaussian_NLL(continuous code, Qcont(G(z, c))), with λ = 0.1
Discriminator loss = original D loss + Q loss
Generator loss = original G loss + Q loss
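The Q loss above can be sketched numerically (pure numpy; shapes and the λ = 0.1 weighting follow the notes above, everything else is illustrative):

```python
import numpy as np

def q_loss(probs, cat_target, mu, logvar, c_cont, lam=0.1):
    """Categorical cross-entropy plus lambda-weighted Gaussian NLL,
    averaged over the batch."""
    batch = probs.shape[0]
    # Cross-entropy between the true categorical code and Qcat's softmax output.
    ce = -np.log(probs[np.arange(batch), cat_target] + 1e-12).mean()
    # Gaussian NLL of the true continuous code under Qcont's (mu, exp(logvar)),
    # dropping the constant 0.5 * log(2 * pi) term.
    nll = 0.5 * (logvar + (c_cont - mu) ** 2 / np.exp(logvar)).mean()
    return ce + lam * nll

# Toy check: a uniform Qcat gives CE = log(10); a perfect Qcont mean with
# unit variance (logvar = 0) zeroes out the continuous term.
probs = np.full((4, 10), 0.1)
cat_target = np.array([0, 1, 2, 3])
mu = np.zeros((4, 2)); logvar = np.zeros((4, 2))
c_cont = np.zeros((4, 2))
loss = q_loss(probs, cat_target, mu, logvar, c_cont)
```

This Q loss is then added to both the original discriminator and generator losses, as listed above.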
Thickness and tilt change as the codes vary.
I added another continuous latent code, but it doesn't seem to map to any salient attribute.