garder14/byol-tensorflow2 (batch-norm & softmax/cross-entropy) #35

Open
evolu8 opened this issue Apr 6, 2021 · 1 comment

evolu8 commented Apr 6, 2021

Running TF 2.4.1 with seeds and environment variables set, I'm getting different results on each run for this repo:

https://github.com/garder14/byol-tensorflow2

I currently suspect it's the gradient tape. Not sure how to handle that. Would downgrading TF version help?

Thoughts welcome.

@duncanriach duncanriach changed the title Possible GradientTape issue? garder14/byol-tensorflow2 (batch-norm & softmax/cross-entropy) Apr 12, 2021
duncanriach (Collaborator) commented Apr 12, 2021

Sorry for the delay in responding, Phil; I was on vacation.

I have not run this code or gotten into debugging it, but just from looking at it I can see a couple of likely sources of nondeterminism:

  1. tf.keras.layers.BatchNormalization is instantiated in five places in models.py. This layer uses fused batch-norm functionality, which is nondeterministic when used for fine-tuning. I don't know exactly what circumstances expose that through the Keras layer and, since I wasn't aware of this exposure until now, I have yet to document it.
  2. tf.nn.sparse_softmax_cross_entropy_with_logits is used in linearevaluation.py on the output of the ClassificationHead. This op will introduce nondeterminism, and there is a work-around for it (see the sketch after this list).
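
Roughly, the work-around amounts to avoiding the fused op and computing the loss from tf.nn.log_softmax yourself. A minimal sketch of that idea (the function name is mine, and I haven't run it against this repo):

```python
import tensorflow as tf

def sparse_crossentropy_from_log_softmax(labels, logits):
    # labels: int tensor of shape [batch]; logits: float tensor of shape [batch, num_classes].
    log_probs = tf.nn.log_softmax(logits, axis=-1)
    one_hot = tf.one_hot(labels, depth=tf.shape(logits)[-1], dtype=log_probs.dtype)
    # Per-example negative log-likelihood of the true class; reduce (mean/sum) as needed.
    return -tf.reduce_sum(one_hot * log_probs, axis=-1)
```

Autodiff through log_softmax and reduce_sum should give mathematically equivalent gradients to the fused op, so something like this can be dropped in where the loss is computed.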

Answering your specific questions/comments:

  1. "Would downgrading TF version help?": No, and downgrading is very unlikely to ever help. We're trying hard to avoid regressions regarding determinism.
  2. "I currently suspect it's the gradient tape": Gradient tape just means something in the backprop. Both of the above-mentioned sources would lead to an introduction of noise in the backprop.
