Moving average and moving variance in Batchnorm aren't updated #11965
Comments
I can't edit my original message, so I'll just add a comment. My theory here is that the batchnorm layer is keeping the mean and variance variables in a different place than the one it reports to the collection.
You probably forgot this, which is written in the documentation of batch_norm:
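A minimal sketch of the pattern the batch_norm docs describe, assuming `optimizer` and `total_loss` are already defined elsewhere:

```python
# batch_norm places the ops that update moving_mean / moving_variance
# into tf.GraphKeys.UPDATE_OPS; the train op must depend on them:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(total_loss)
```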
Ok, this was indeed the problem. Many thanks. Do I need to concatenate this collection with any other one for normal training? Why does it make sense to have it like this? Are the statistics (mean and variance) also updated without the optimiser settings?
Other layers don't have similar caveats, AFAIK. It makes sense because the moving averages are not updated by gradient descent, so there must be another way to update them.
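Schematically, each training step runs an assign op of this form (a sketch, not the actual TF source; `decay` is the batch_norm `decay` argument):

```python
def moving_average_update(moving_stat, batch_stat, decay=0.999):
    # how moving_mean / moving_variance are maintained: an exponential
    # moving average of the per-batch statistics, applied via assign ops
    return decay * moving_stat + (1.0 - decay) * batch_stat
```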
Many thanks for the help and the info. I checked whether all the weights and parameters are properly loaded, and everything seems to be in place.
I am experiencing the same issue. My validation accuracy on CIFAR-10 is lower with batchnorm than without, and I have already added the update ops as described above. Why is a decay rate required for batch_norm to work? Is there a bug in the batch_norm implementation?
Make sure that you collect tf.GraphKeys.UPDATE_OPS with the right name scope:
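For example (a sketch; `'tower_0'` is a placeholder for whatever scope the model was built under):

```python
# collect only the update ops created under the model's scope
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope='tower_0')
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(total_loss)
```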
Changing decay=0.999 to decay=0.9 works fine for me.
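With decay=0.999 the moving statistics barely move per step, so after a modest number of updates they are still close to their initial values (0 and 1); a smaller decay tracks the batch statistics faster. A sketch with slim (`net` and `is_training` are placeholders):

```python
# a smaller decay lets moving_mean / moving_variance track the batch
# statistics faster; useful when training for relatively few steps
net = slim.batch_norm(net, decay=0.9, scale=True, is_training=is_training)
```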
@ppwwyyxx I read the code at https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/layers/normalization.py to see the implementation of tf.layers.batch_normalization():
tensorflow/tensorflow/python/layers/normalization.py, lines 416 to 417 in 23c2187
Can you please refer to the add_update function?
I think it's tensorflow/tensorflow/python/layers/base.py, line 237 in 23c2187.
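The end-to-end behaviour is easy to check: with the functional tf.layers API, the assign ops land in tf.GraphKeys.UPDATE_OPS. A small sketch (TF 1.x; op names may differ by version):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32])
y = tf.layers.batch_normalization(x, training=True)

# the assign ops that update moving_mean / moving_variance are collected
# here; without a control dependency on them, the statistics never change
print(tf.get_collection(tf.GraphKeys.UPDATE_OPS))
```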
System information
Describe the problem
I'm using the slim wrapper, which in turn returns an instance of BatchNormalization from layers/normalization.py. All parameters are set to their defaults, except for scale, which is set to True (i.e. adding the gamma scaler). After training, when looking at the learned parameters, I notice that all the moving means in the network are still 0 while all the moving variances are 1, i.e. they weren't updated.
Neither variable shows up in tf.trainable_variables(), which might explain the lack of updates. However, since these are not actually learned but rather calculated, I'm not sure whether they would be updated by the optimiser anyway.
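A quick way to inspect them (a sketch; variable names follow the TF 1.x defaults):

```python
# moving_mean / moving_variance live in global_variables()
# but not in trainable_variables(), so the optimiser never touches them
moving_stats = [v for v in tf.global_variables() if 'moving_' in v.name]
print(moving_stats)
```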