
Moving average and moving variance in Batchnorm aren't updated #11965

Closed
idofr opened this issue Aug 2, 2017 · 13 comments
Labels
stat:community support (Status - Community Support), type:support (Support issues)

Comments

@idofr

idofr commented Aug 2, 2017

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): 1.2.1
  • Python version: 3.5.3
  • Bazel version (if compiling from source): None
  • CUDA/cuDNN version: 8/5.1
  • GPU model and memory: GeForce 1080
  • Exact command to reproduce:

Describe the problem

I'm using the slim wrapper, which in turn returns an instance of BatchNormalization from layers/normalization.py. All parameters are left at their defaults except for scale, which is set to True (i.e. the gamma scaler is added). After training, when looking at the learned parameters, I notice that all the moving means in the network are still 0 and all the moving variances are still 1, i.e. they were never updated.

Neither variable shows up in tf.trainable_variables(), which might explain the lack of updates. However, since these are not actually learned but rather calculated from the batch statistics, I'm not sure whether the optimiser is supposed to update them at all.
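
For reference, a minimal sketch of this kind of check (the tiny network below is illustrative only, not the actual model):

    import tensorflow as tf
    slim = tf.contrib.slim

    x = tf.placeholder(tf.float32, [None, 8])
    net = slim.fully_connected(x, 16,
                               normalizer_fn=slim.batch_norm,
                               normalizer_params={'scale': True})

    # The moving statistics are created as (non-trainable) global variables ...
    moving_stats = [v for v in tf.global_variables()
                    if 'moving_mean' in v.name or 'moving_variance' in v.name]
    print([v.name for v in moving_stats])

    # ... but they are absent from the trainable set, so optimizer.minimize()
    # alone will never touch them:
    print([v.name for v in tf.trainable_variables() if 'moving' in v.name])  # []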

@idofr
Author

idofr commented Aug 2, 2017

I can't edit my original message, so I'll just add a comment.
I tried running the test function with is_training=False (but with the exact same checkpoint as before). The accuracy dropped from ~98% to roughly 12%.

My theory is that the batchnorm layer keeps the mean and variance variables somewhere other than the place it reports to the collection.

poxvoculi assigned mrry and unassigned mrry Aug 2, 2017
@ppwwyyxx
Contributor

ppwwyyxx commented Aug 3, 2017

You probably missed this note, which is in the documentation of batch_norm:

  Note: when training, the moving_mean and moving_variance need to be updated.
  By default the update ops are placed in `tf.GraphKeys.UPDATE_OPS`, so they
  need to be added as a dependency to the `train_op`. For example:

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
      train_op = optimizer.minimize(loss)

  One can set updates_collections=None to force the updates in place, but that
  can have a speed penalty, especially in distributed settings.
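
Put together with slim, a minimal end-to-end sketch (the model, loss and optimiser below are placeholders for illustration):

    import tensorflow as tf
    slim = tf.contrib.slim

    x = tf.placeholder(tf.float32, [None, 8])
    labels = tf.placeholder(tf.float32, [None, 2])

    logits = slim.fully_connected(x, 2,
                                  normalizer_fn=slim.batch_norm,
                                  normalizer_params={'scale': True,
                                                     'is_training': True})
    loss = tf.losses.softmax_cross_entropy(labels, logits)

    # The assign ops for moving_mean / moving_variance sit in UPDATE_OPS;
    # making train_op depend on them guarantees they run on every step.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)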

@idofr
Author

idofr commented Aug 3, 2017

Ok, this was indeed the problem. Many thanks.

Do I need to merge this collection with any other one for normal training? Why does it make sense to have it set up like this?

Are the statistics (mean and variance) also updated without this change to the optimiser setup?

@ppwwyyxx
Contributor

ppwwyyxx commented Aug 3, 2017

Other layers don't have similar caveats, AFAIK.

It makes sense because the moving averages are not updated by gradient descent, so there has to be a separate mechanism for updating them.
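
As for the last question above: with the default settings they are not; the moving statistics only change when the ops in UPDATE_OPS are actually run. The in-place alternative mentioned in the doc excerpt looks roughly like this (minimal sketch):

    import tensorflow as tf
    slim = tf.contrib.slim

    x = tf.placeholder(tf.float32, [None, 8])
    # With updates_collections=None the moving-statistics updates are added as
    # control dependencies of the layer output, so any train_op built on top of
    # it runs them automatically -- at the speed cost mentioned above.
    net = slim.batch_norm(x, is_training=True, scale=True,
                          updates_collections=None)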

poxvoculi added the type:support (Support issues) and stat:community support (Status - Community Support) labels Aug 3, 2017
@idofr
Author

idofr commented Aug 7, 2017

Many thanks for the help and the info.
Yet I fear I'll have to ask for a re-open, as the problem only seems to be half solved.
The mean and variance are properly saved and loaded now (why does it save the variance, by the way, when the standard deviation is what's needed?). However, when evaluating the model with is_training=False the accuracy is still around 35%, while the same script with is_training=True gets around 97%.

I checked that all the weights and parameters are loaded properly, and everything seems to be in place.

@idofr
Author

idofr commented Aug 8, 2017

Same as #1122

https://stackoverflow.com/questions/42770757/tensorflow-batch-norm-does-not-work-properly-when-testing-is-training-false

https://stackoverflow.com/questions/39353503/tensorflow-tf-slim-model-with-is-training-true-and-false?rq=1

https://stackoverflow.com/questions/44211371/tensorflow-batch-norm-breaks-network-when-is-training-false?rq=1

I'm currently training again with a lower decay to confirm #1122 and will update tomorrow.

Update: the lower decay rate (0.9) together with updates_collections=None seemed to do the trick.

@keven425
Contributor

keven425 commented Aug 28, 2017

I am experiencing the same issue. My validation accuracy on CIFAR-10 is lower with batchnorm than without. I have added the tf.GraphKeys.UPDATE_OPS dependencies to the optimizer and set is_training=False during validation. I'm on TensorFlow 1.3.

Why is a particular decay rate required for batch_norm to work? Is there a bug in the batch_norm implementation?

@shahar-scopio

Make sure that you collect tf.GraphKeys.UPDATE_OPS with the right name scope:

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, name_scope)
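
For example (the scope name here is purely illustrative):

    import tensorflow as tf
    slim = tf.contrib.slim

    x = tf.placeholder(tf.float32, [None, 8])
    with tf.variable_scope('tower_0'):
        net = slim.batch_norm(x, is_training=True)

    # Only the update ops created under 'tower_0' are returned:
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'tower_0')
    print(update_ops)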

@huosan0123

Changing decay=0.999 to 0.9 works fine for me.
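
This makes sense given how the moving statistics are accumulated; a plain-Python sketch of the exponential moving average (the numbers are illustrative):

    # Per step: moving_stat <- decay * moving_stat + (1 - decay) * batch_stat
    def ema(moving, batch_stat, decay):
        return decay * moving + (1.0 - decay) * batch_stat

    moving_mean = 0.0                       # initial value used by batch norm
    for _ in range(1000):                   # e.g. 1000 training steps
        moving_mean = ema(moving_mean, 5.0, decay=0.999)
    print(moving_mean)                      # ~3.2: still far from the batch mean of 5.0

    moving_mean = 0.0
    for _ in range(1000):
        moving_mean = ema(moving_mean, 5.0, decay=0.9)
    print(moving_mean)                      # ~5.0: effectively converged

So with the default decay=0.999 the statistics only become reliable after many thousands of update steps; for shorter runs a lower decay brings is_training=False much closer to the training-time behaviour.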

@dishank-b

@ppwwyyxx I read https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/layers/normalization.py to see the implementation of tf.layers.batch_normalization().
But in that code I could not find any control dependency being added for the moving mean and variance, and there is no line that puts the moving-average updates into the tf.GraphKeys.UPDATE_OPS collection.

@ppwwyyxx
Contributor

ppwwyyxx commented Jun 5, 2018

    self.add_update(mean_update, inputs=inputs)
    self.add_update(variance_update, inputs=inputs)

@dishank-b

dishank-b commented Jun 5, 2018

Can you please point me to the add_update function?

@facaiy
Member

facaiy commented Jun 6, 2018

I think its add_update method is inherited from base.Layer:

    def add_update(self, updates, inputs=None):
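
So the update ops are registered on the layer object via add_update rather than written into the collection inside normalization.py itself; with the functional tf.layers.batch_normalization wrapper they still end up in tf.GraphKeys.UPDATE_OPS, which a quick sketch can confirm:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 4])
    y = tf.layers.batch_normalization(x, training=True)

    # The moving_mean / moving_variance assign ops registered via add_update
    # show up here (as AssignMovingAvg ops):
    print(tf.get_collection(tf.GraphKeys.UPDATE_OPS))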
