Easy to use batch norm layer. #1122
Comments
I'm working on some parts of that.

There is now a `batch_norm` function in `tf.contrib.layers`.
I think something is wrong with this layer. During training everything is OK and the loss decreases nicely, but at test time I get zero accuracy.
Same here, I have experienced some unexpected behavior with `is_training=False`. What is the correct way to change this flag? I am currently using a
@pawni You have to use a Python boolean for `is_training`.
@ppwwyyxx well I am doing

Oh, I thought you were doing. To do this the

I am using the same scope and

@sguada FYI
Currently `batch_norm` requires a Python boolean, but we are working on adding the option of passing a Tensor.
@pawni If you don't want to worry about updating `moving_mean` and `moving_variance`, set `updates_collections=None` to make sure they are updated in place; otherwise you need to make sure the update ops added to `tf.GraphKeys.UPDATE_OPS` are run during training.
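To illustrate the second option, here is a minimal sketch (not from this thread; `optimizer` and `loss` are assumed to already exist) of running the collected update ops alongside the train step:

```python
import tensorflow as tf

# Collect the moving-average update ops that batch_norm registered, and make
# the train op depend on them so they run on every training step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)  # optimizer/loss assumed defined
```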
I think TensorFlow needs two helper methods that switch the model state, something like Torch's `training()` and `evaluate()`. I think it would be very straightforward.
Is there a small script with a very simple NN that shows the proper way of using this "official" BN layer? I'd really appreciate it.

Sorry if this is a little repetitive, but the API documents BN with a different interface: https://www.tensorflow.org/versions/r0.9/api_docs/python/nn.html#batch_normalization. Is that not the official way to use BN? I am confused about how to use it: the SO answer seems outdated, and there is a layer at a different link in the API. How exactly does one do this? I am unclear whether to go to SO or ask here.
Sorry for the spamming, but what is wrong with just using something like this:

Then it's simple to tell TensorFlow which one to use with a feed dictionary, as in:

Since it's unclear whether the implementation will change, I wanted to give a suggestion (note it's easy to extend to convolutions and such, I just didn't paste that code). A sketch of the pattern is below.
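The code blocks from this comment did not survive extraction; what follows is an assumed reconstruction of the kind of pattern being proposed, not the commenter's original code: batch statistics during training, population statistics at test time, selected by a boolean placeholder via `tf.cond`.

```python
import tensorflow as tf

def batch_norm_layer(x, is_training, decay=0.999, epsilon=1e-3):
    """Batch norm for a fully connected layer; is_training is a tf.bool placeholder."""
    size = x.get_shape().as_list()[-1]
    beta = tf.Variable(tf.zeros([size]))   # learned offset
    gamma = tf.Variable(tf.ones([size]))   # learned scale
    pop_mean = tf.Variable(tf.zeros([size]), trainable=False)
    pop_var = tf.Variable(tf.ones([size]), trainable=False)

    def train_branch():
        # Use batch statistics and update the population statistics.
        batch_mean, batch_var = tf.nn.moments(x, [0])
        update_mean = tf.assign(pop_mean,
                                pop_mean * decay + batch_mean * (1 - decay))
        update_var = tf.assign(pop_var,
                               pop_var * decay + batch_var * (1 - decay))
        with tf.control_dependencies([update_mean, update_var]):
            return tf.nn.batch_normalization(x, batch_mean, batch_var,
                                             beta, gamma, epsilon)

    def test_branch():
        # Use the accumulated population statistics.
        return tf.nn.batch_normalization(x, pop_mean, pop_var,
                                         beta, gamma, epsilon)

    return tf.cond(is_training, train_branch, test_branch)

# Selecting the branch with a feed dictionary, as the comment describes:
# sess.run(train_op, feed_dict={x: batch_x, is_training: True})
# sess.run(accuracy, feed_dict={x: test_x, is_training: False})
```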
@brando90 Currently I am doing something like:

However, I think that #3265 would basically want to implement it like this. A reference could be the dropout implementation here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py#L433-L435
When `updates_collections=None`, the updates happen in place and it is easier to use a `tf.cond()` to allow `is_training` to be a Tensor. It is a bit more complicated when the updates are delayed and the `update_ops` are run later.
@brando90 @pawni His code works well, but you have to change it like below:

```python
def BatchNorm(inputT, is_training=True, scope=None):
    # Note: is_training is a tf.placeholder(tf.bool) type
    return tf.cond(is_training,
                   lambda: batch_norm(inputT, is_training=True,
                                      center=False, updates_collections=None, scope=scope),
                   lambda: batch_norm(inputT, is_training=False,
                                      updates_collections=None, center=False,
                                      scope=scope, reuse=True))
```

And when you run it at training or test time:

```python
# when training
sess.run([opt, loss], feed_dict={x: bx, y: by, is_training: True})
# when testing
sess.run([opt, loss], feed_dict={x: bx, y: by, is_training: False})
```

This code works, but as #3265 says it would be great if `batch_norm` accepted a Tensor for `is_training` directly.
@nmhkahn @pawni Thanks for the code snippets. They were very useful in adding batch normalization to my convolutional network. Training seems to work very well; testing does not. In some versions of the code, training accuracies are much higher than testing accuracies, which probably means I am not sharing the batch normalization parameters. In other versions of the code I get "ValueError: Variable conv1/beta already exists, disallowed. Did you mean to set reuse=True in VarScope?", which seems to indicate that I am trying to relearn the parameters when I was trying to reuse them. Can someone provide an example of how to call the `BatchNorm` function during training and testing so that variable sharing happens correctly? Thanks for any help.

UPDATE July 25, 2016: @nmhkahn @pawni Thanks for your comments. After taking a closer look at the code in contrib I realized what my problem was. During training and testing we are either updating or reusing four variables (beta, gamma, moving_mean and moving_variance). To make those unique I had to set a scope per layer. I did it like this:

```python
conv1 = tf.nn.relu(batch_norm_layer(conv2d_stride2_valid(data, W_conv1) + b_conv1,
                                    train_phase, scope="conv1"))
```

where `batch_norm_layer` is similar to the examples from @nmhkahn and @pawni, `conv2d_stride2_valid` is just a def to define a convolutional layer, and `W_conv1` and `b_conv1` are variables holding the weights and biases. I could probably remove the bias term because we are using batch normalization.

The net is working well now. I noticed, after plotting accuracies in training and test mode, that the testing accuracies start climbing after the training accuracies. In retrospect it makes sense, since we are collecting dataset statistics for testing. But it appeared as if I was doing something wrong during my initial tests. Thanks for your comments and for making batch normalization available to the community.
@nmhkahn how is it different from pawni's suggestion? |
@brando90 I had a small error in my version, which was fixed by nmhkahn (changing

@diegoAtAlpine I found the same problems - not sure why this is the case, though. However, the ValueError should be resolved by the code snippet. Not sure what you want to see, as nmhkahn's example seems to do the job?
@nmhkahn @pawni When you do:

doesn't that mean that you're using

Is that not correct?
I have already extended `tf.contrib.layers.batch_norm` to allow passing a Tensor or a Placeholder for `is_training`. It will be merged in TF contrib soon.

Now available in
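With that change, a single graph should suffice. Here is a sketch of the assumed usage (not an official example; `net`, `train_op`, and `accuracy` are assumed to exist):

```python
import tensorflow as tf

# The phase is resolved at run time instead of graph-construction time.
is_training = tf.placeholder(tf.bool, name='is_training')
net = tf.contrib.layers.batch_norm(net, is_training=is_training,
                                   updates_collections=None)

# sess.run(train_op, feed_dict={..., is_training: True})
# sess.run(accuracy, feed_dict={..., is_training: False})
```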
Is it just me, or does adding this BN layer noticeably slow down training of a single epoch?
@sguada Hi, I have a question. If I use `tf.contrib.layers.batch_norm(input, scale=False)`, does `scale=False` mean that gamma is not learned in `y = gamma*x + beta` during training? Thank you very much.
When `scale=False`, `gamma` is a constant 1.
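For instance (a minimal sketch; `net` is assumed):

```python
# scale=False: gamma is fixed at 1; center=True (the default): beta is still learned.
net = tf.contrib.layers.batch_norm(net, center=True, scale=False)
```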
@ppwwyyxx Thank you very much for your help. I use `tf.contrib.layers.batch_norm(input, scale=False)` in TensorFlow, and now I am converting the batchnorm of TensorFlow to Caffe. How should I set the parameters of BatchNormLayer and ScaleLayer in Caffe?
@MisayaZ I was having the same behavior using batch norm with a placeholder for `is_training`. I see in the trace that the moments are being calculated even at test time, so I decided to go into the source code, and I found this:

```python
# If `is_training` doesn't have a constant value, because it is a `Tensor`,
# a `Variable` or `Placeholder` then is_training_value will be None and
# `needs_moments` will be true.
is_training_value = utils.constant_value(is_training)
need_moments = is_training_value is None or is_training_value
if need_moments:
    # here it defines the moments
```

It looks like when `is_training` is a Variable or a Placeholder, the moments get defined and also get calculated at runtime, even when you set the placeholder to False. I would have preferred to leave it as a placeholder, because that way I can do periodic testing during training without redefining the graph, but I decided to use it as a constant and define different behaviors for train vs. test, and now the moments are not calculated at test time.
@tano297 Thank you. I now also use `is_training` as a constant. Leaving it as a placeholder and doing periodic testing would change the value of the moving mean and moving variance, and inference would also take longer, because it would calculate the mean and variance of the inputs and update the moving mean and moving variance. The right way to do testing is to define different behaviors for train and test, as you mentioned.
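A sketch of that "constant per mode" approach (assumed usage; `x` is the input tensor): build the test path with a constant `is_training=False` and reuse the training variables.

```python
import tensorflow as tf
from tensorflow.contrib.layers import batch_norm

# Training graph: batch statistics, moving averages updated in place.
train_out = batch_norm(x, is_training=True,
                       updates_collections=None, scope='bn')

# Test graph: same variables (reuse=True), moving statistics, no moment ops.
test_out = batch_norm(x, is_training=False,
                      updates_collections=None, scope='bn', reuse=True)
```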
@tano297 @MisayaZ But doesn't the `utils.smart_cond(is_training, ...)` call
make sure that the updates are only calculated and applied if `is_training` evaluates to True?
@abred Yes indeed, but you are referring to line 391, where it does the update of the moving average within `_fused_batch_norm()`:

```python
# If `is_training` doesn't have a constant value, because it is a `Tensor`,
# a `Variable` or `Placeholder` then is_training_value will be None and
# `need_updates` will be true.
is_training_value = utils.constant_value(is_training)
need_updates = is_training_value is None or is_training_value
if need_updates:
    ...
    outputs = utils.smart_cond(is_training, _force_updates, no_updates)
    ...
```

I am talking about line 753 within `batch_norm()`:

```python
# If `is_training` doesn't have a constant value, because it is a `Tensor`,
# a `Variable` or `Placeholder` then is_training_value will be None and
# `needs_moments` will be true.
is_training_value = utils.constant_value(is_training)
need_moments = is_training_value is None or is_training_value
if need_moments:
    ...
    mean, variance = utils.smart_cond(is_training,
                                      _force_updates,
                                      moving_vars_fn)
    ...
```

The smart condition in that case (as far as I am concerned) decides whether or not to update the moving averages, but the moments still get calculated.
@tano297 You're right about that, I was in the wrong place. But still:

should be equivalent to line 804:

if `is_training` evaluates to False, so the "moments" part of the graph is never used and thus shouldn't be executed. But I haven't tested it, so I might be wrong about that :)
@tano297 @abred You're right. The moving mean and moving variance are changed when I use batchnorm like this:

If you use it like the following:

the moving mean and moving variance will not be changed during testing, but the speed is very slow.
Hi @zhongyuk, I also met the problem that I could get good results when using `is_training=True` for both training and inference, but got bad results when setting `is_training=False` during inference (worse than using `is_training=True`). According to your analysis, if I understand correctly, simply setting `decay=0.9` in BN can solve this problem. Am I right? BTW, do I need to retrain the model with `decay=0.9` from scratch, or is resuming training from the checkpoint (i.e., trained with decay=0.999) also OK? Thanks!
@tyshiwo I just set decay=0.9 for batch_norm and it works well so far. |
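For reference, this is the kind of call being discussed (a sketch; `net` and `is_training` are assumed to exist):

```python
# decay=0.9 makes moving_mean/moving_variance converge faster than the
# default 0.999, which helps on small datasets or short training runs.
net = tf.contrib.layers.batch_norm(net, decay=0.9, is_training=is_training,
                                   updates_collections=None, scope='bn')
```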
I was confused after all these comments about how to properly use batch norm, so here is what I have. Please correct me if I'm wrong:

where `phase_train_py` is a Python boolean variable and `is_training` is a placeholder taking a boolean value. I guess using `tf.cond` is wrong; otherwise the function would have come with boolean parameters. In other words, if
It seems there are still problems with TF v1.3. I'm sure I noted the following details, but I still failed to use the official `tf.contrib.layers.batch_norm`:

It seems the only way to use the official batch_norm is to build two graphs, one for training and one for evaluation, with `is_training=True` and `is_training=False` respectively.

Finally, I wrote a moving average by myself, and I found it worked! It's as follows (based on code from the web and modified by myself; a sketch of this kind of implementation appears below).

Just use the custom layer to build your model. Hope it helps the community.
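The commenter's code did not survive extraction; the following is an assumed reconstruction of that kind of hand-rolled moving-average batch norm, using `tf.train.ExponentialMovingAverage` (names like `phase_train` are illustrative):

```python
import tensorflow as tf

def batch_norm_conv(x, n_out, phase_train, decay=0.9):
    """Batch norm for a conv layer; phase_train is a tf.bool placeholder."""
    beta = tf.Variable(tf.zeros([n_out]), name='beta')
    gamma = tf.Variable(tf.ones([n_out]), name='gamma')
    batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name='moments')
    ema = tf.train.ExponentialMovingAverage(decay=decay)

    def mean_var_with_update():
        # Track the batch statistics and use them directly while training.
        ema_apply_op = ema.apply([batch_mean, batch_var])
        with tf.control_dependencies([ema_apply_op]):
            return tf.identity(batch_mean), tf.identity(batch_var)

    # At test time, fall back to the tracked moving averages.
    mean, var = tf.cond(phase_train, mean_var_with_update,
                        lambda: (ema.average(batch_mean), ema.average(batch_var)))
    return tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
```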
This commit solves the bug observed in all previous versions of this code, in which validation loss/accuracy are approximately or exactly constant for every epoch and all validation predictions are of the same class. The problem was due to an incorrect implementation of batch normalization, in two ways. First, TensorFlow does not automatically collect the update ops for updating the moving_mean and moving_variance. This is now being done by using slim.learning.create_train_op() instead of the native tf.train.Optimizer().minimize() to create the train op. Second, the decay parameter has been decreased from the default 0.999 to 0.95, as with too high a value batch_norm takes too long to converge on a small dataset. For more information, see tensorflow/tensorflow#1122.
When you use `slim.batch_norm`, be sure to use `slim.learning.create_train_op` instead of `tf.train.GradientDescentOptimizer(lr).minimize(loss)` or another optimizer. Try it and see if it works!
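A minimal sketch of that advice (`learning_rate` and `total_loss` are assumed to exist):

```python
import tensorflow as tf
import tensorflow.contrib.slim as slim

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# create_train_op wires the UPDATE_OPS collection (the moving mean/variance
# updates registered by batch_norm) into the train step for you.
train_op = slim.learning.create_train_op(total_loss, optimizer)
```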
@vincentvanhoucke You wrote in another post in this thread:

By the "slim batch_norm wrapper", do you mean the function `tf.contrib.layers.batch_norm`?
@ZahlGraf I'll happily consider a PR that clarifies the documentation. We've been at this for so long that I no longer have a good sense of what's obvious or not, and would welcome clarifying documentation for someone with a fresh perspective on the topic. |
@vincentvanhoucke
Please remove the assignee, as this issue is inviting external contributions. Otherwise, remove the
Closing this bug since the original request to add a batch norm layer has been addressed. Some of the more recent issues with documentation seem to have their own PRs.
Many non-experts are using the following code: http://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow?answertab=votes#tab-top
It would be nice to have an official batch norm layer given its importance in training DNNs.