How to convert gan_mnist example from tensorflow to kur? #72

Open

EmbraceLife opened this issue May 6, 2017 · 1 comment

EmbraceLife (Contributor) commented May 6, 2017

I want to convert a gan_mnist example in tensorflow to a gan_mnist in kur.

At this moment, all I know about models in kur is the following life cycle:

  1. model as a dict in the kurfile -->
  2. model as a list of kur containers or layers -->
  3. model as keras layers -->
  4. a Func defined in keras used to train (which I cannot manage to inspect from kur, but would love to be able to) -->
  5. the training process (forward pass to get the loss, backward pass to update weights) is sealed inside keras like a black box (I can't see it in the kur source code), so I can only access the loss and weights at the end of each batch of training, but nothing from individual layers.

It seems to me that I can't access each layer's output directly, such as the logits of each model below. I hope I am wrong. Is there a way to access each layer's output directly with kur? Or can I write some additional functions in kur to access the outputs of each layer of the models?

Another difficulty I have is writing the models in a kurfile. Does the kurfile below make sense, and is it valid in logic and style? I prefer a kurfile over using the kur api directly, but a lot of the time I don't know what to put in it. At the moment I am confused about when to use `-` and when not to; I have marked the places where I am particularly confused with ????.

There are two sections below: 1. parts of the kurfile; 2. the corresponding parts in tensorflow.

Section 1: some key sections of the gan_mnist pseudo-kurfile

How would you write this gan kurfile? I would like to see what it would look like (it need not be working code; I just want to see the proper pseudo-kurfile you might write).

model:  # see model code in tensorflow below
  generator:
    - input: input_z # shape (?, 100)
    - dense: 128 # g_hidden_size = 128
    - activation:
        name: leakyrelu
        alpha: 0.01
    - dense: 784 # out_dim (generator) = input_size (real image) = 784
    - activation:
        name: tanh
    - output: # output of the last layer
        name: g_out # shape (?, 784)

  discriminator_real:
    - input: input_real # or images # shape (?, 784)
    - dense: 128 # d_hidden_size
    - activation:
        name: leakyrelu
        alpha: 0.01
    - dense: 1 # shrink nodes from 128 to 1, for 2-label classification with sigmoid (not softmax)
      logits: d_logits_real # can I output logits here?
# do I need to output logits?
    - activation:
        name: sigmoid
    - output: # output of the last layer
# can the logits in the layer before the last layer be accessed from here?
        name: d_out_real # not used at all?

  discriminator_fake:
    - input: g_out # shape (?, 784)
    - dense: 128 # d_hidden_size
    - activation:
        name: leakyrelu
        alpha: 0.01
    - dense: 1 # shrink nodes from 128 to 1, for 2-label classification with sigmoid (not softmax)
      logits: d_logits_fake # can I output logits here?
# do I need to output logits?
    - activation:
        name: sigmoid
    - output: # output of the last layer
# can the logits in the layer before the last layer be accessed from here?
        name: d_out_fake # not used at all?

# https://kur.deepgram.com/specification.html?highlight=loss#loss
loss:  # see loss code in tensorflow below
  generator:
    - target: labels_g   # labels=tf.ones_like(d_logits_fake), it can be defined as one input data 
    - logits: d_logits_fake # when to use `-`, when not????
      name: categorical_crossentropy
      g_loss: g_loss
  discriminator_real:
    - target: labels_d_real # labels=tf.ones_like(d_logits_real) * (1 - smooth)
    - logits: d_logits_real
      name: categorical_crossentropy
      d_loss_real: d_loss_real
  discriminator_fake:
    - target: labels_d_fake # labels=tf.zeros_like(d_logits_fake)
    - logits: d_logits_fake
      name: categorical_crossentropy
      d_loss_fake: d_loss_fake

train:
  optimizer: # see the optimizers tensorflow code below
    - opt_discriminator:
        name: adam
        learning_rate: 0.001
        d_loss: d_loss #  d_loss = d_loss_real + d_loss_fake
        d_trainable: d_vars
    - opt_generator:
        name: adam
        learning_rate: 0.001
        g_loss: g_loss
        g_trainable: g_vars

Section 2: the key parts (d_model, g_model, losses, optimizers, ...) in tensorflow

Inputs for generator and discriminator

def model_inputs(real_dim, z_dim):
    # real_dim is 784 for sure
    inputs_real = tf.placeholder(tf.float32, (None, real_dim), name='input_real')

    # z_dim is set to 100, but can be almost any number
    inputs_z = tf.placeholder(tf.float32, (None, z_dim), name='input_z')

    return inputs_real, inputs_z

Generator model

def generator(z, out_dim, n_units=128, reuse=False, alpha=0.01):
    with tf.variable_scope('generator', reuse=reuse):
        # Hidden layer
        h1 = tf.layers.dense(z, n_units, activation=None)
        # Leaky ReLU
        h1 = tf.maximum(alpha * h1, h1)

        # Logits and tanh output
        logits = tf.layers.dense(h1, out_dim, activation=None)
        out = tf.tanh(logits)

        return out

Discriminator model

def discriminator(x, n_units=128, reuse=False, alpha=0.01):
    with tf.variable_scope('discriminator', reuse=reuse):
        # Hidden layer
        h1 = tf.layers.dense(x, n_units, activation=None)
        # Leaky ReLU
        h1 = tf.maximum(alpha * h1, h1)

        logits = tf.layers.dense(h1, 1, activation=None)
        out = tf.sigmoid(logits)

        return out, logits

Hyperparameters

# Size of input image to discriminator
input_size = 784
# Size of latent vector to generator
# The latent sample is a random vector the generator uses to construct its fake images. As the generator learns through training, it figures out how to map these random vectors to recognizable images that can fool the discriminator.
z_size = 100 # not 784! so it can be any number?
# Sizes of hidden layers in generator and discriminator
g_hidden_size = 128
d_hidden_size = 128
# Leak factor for leaky ReLU
alpha = 0.01
# Label smoothing: real labels become 1 - smooth = 0.9
smooth = 0.1

Build network

tf.reset_default_graph()

# Create our input placeholders
input_real, input_z = model_inputs(input_size, z_size)

# Build the model
g_out = generator(input_z, input_size)
# g_out is the generator output, not a model object

# discriminate on real images, get output and logits
d_out_real, d_logits_real = discriminator(input_real)
# discriminate on generated images, get output and logits
d_out_fake, d_logits_fake = discriminator(g_out, reuse=True)

Calculate losses

# get loss on how well the discriminator works on real images
d_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_real,
        # labels are all true (1s), with label smoothing *(1 - smooth)
        labels=tf.ones_like(d_logits_real) * (1 - smooth)))

# get loss on how well the discriminator works on generated images
d_loss_fake = tf.reduce_mean(  # take the mean over all images in the batch
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_fake,
        # labels are all false (0s)
        labels=tf.zeros_like(d_logits_fake)))

# get the total loss by adding the two up
d_loss = d_loss_real + d_loss_fake

# get loss on how well the generator makes images that look as real as possible
g_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_fake,
        # the generator wants its images to be judged real, so the labels are 1s
        labels=tf.ones_like(d_logits_fake)))

Optimizers

# Optimizers
learning_rate = 0.002

# Get the trainable_variables, split into G and D parts
t_vars = tf.trainable_variables()
g_vars = [var for var in t_vars if var.name.startswith('generator')]
d_vars = [var for var in t_vars if var.name.startswith('discriminator')]

# update only the selected weights, i.e. the discriminator's
d_train_opt = tf.train.AdamOptimizer(learning_rate).minimize(d_loss, var_list=d_vars)

# update only the selected weights, i.e. the generator's
g_train_opt = tf.train.AdamOptimizer(learning_rate).minimize(g_loss, var_list=g_vars)

ajsyp (Collaborator) commented May 10, 2017

I'm not certain what you mean by, "It seems to me that I can't access each layer's output directly, such as logits of each model below.... Is there a way to access each layer's output directly with kur?" What does "directly" mean? You can reference other layers by name, and you can cause any layer to be outputted as part of the model output.
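
For example, a sketch along these lines names a layer and exposes it as an additional output (untested, and the `name` and `inputs` keys here are illustrative spellings rather than verified syntax, so treat the exact attribute names as assumptions):

model:
  - input: images          # shape (?, 784)
  - dense: 1
    name: logits           # give the layer a name so it can be referenced
  - activation: sigmoid
  - output: scores         # the usual model output (the sigmoid scores)
  - output: raw_logits     # a second output fed from the named layer
    inputs: logits         # assumed wiring key; illustrative only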

I think the bigger question is multi-modal architectures, like GANs. These are not currently supported in Kur, but they are something on the horizon that I've been thinking about adding. Your Kurfile is logically consistent, I think, and stylistically good, but it isn't a valid Kurfile because, well, Kur doesn't support multiple models.

P.S. When to use "-" or not is a YAML thing. If you indent and use "-", you are starting a list:

grocery_list:
  - apples
  - oranges

If you indent without using "-", you are starting a map/dictionary/key-value pairs:

movie_ratings:
  harry_potter: good
  twilight: bad

You can nest these things: you can have list items which are dictionaries, you can have dictionaries whose values are lists, you can have dictionaries whose values are dictionaries, you can have list items which are themselves lists, etc. If you are ever in doubt, look at the YAML spec or, if you are more comfortable in JSON, just use JSON Kurfiles or a YAML/JSON converter to see what is going on.
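
In particular, the case you marked with ???? (a key without "-" directly below a key with "-") is a single list item that happens to be a dictionary with two keys:

movie_ratings_list:
  - title: harry_potter  # a list item that is a dictionary...
    rating: good         # ...with a second key on the same item (no "-")
  - title: twilight
    rating: bad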
