Just making a note for future reference: training the UNet2DS model on the GPU with the TensorFlow backend produces non-deterministic gradient updates, and therefore non-deterministic final results. Final submissions are typically within 2% of each other in terms of mean F1 score, but this still adds a confounding factor when comparing changes to the architecture or training strategy.
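One rough way to keep this noise from confounding a comparison is to repeat each configuration a few times and check whether the gap between mean scores is larger than the run-to-run spread. The sketch below is only an illustration: the helper names `summarize_runs` and `exceeds_noise`, the threshold `k`, and all score values are hypothetical placeholders, not measurements from this project.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of scores from repeated runs."""
    return mean(scores), stdev(scores)


def exceeds_noise(baseline_scores, candidate_scores, k=2.0):
    """Crude heuristic: is the gap between means larger than k times
    the larger of the two run-to-run standard deviations?"""
    base_mean, base_sd = summarize_runs(baseline_scores)
    cand_mean, cand_sd = summarize_runs(candidate_scores)
    return abs(cand_mean - base_mean) > k * max(base_sd, cand_sd)


# Placeholder mean F1 scores from three hypothetical runs per configuration.
baseline = [0.50, 0.51, 0.49]
small_change = [0.505, 0.495, 0.51]   # gap is well inside the spread
large_change = [0.60, 0.61, 0.59]     # gap is far outside the spread

print(exceeds_noise(baseline, small_change))
print(exceeds_noise(baseline, large_change))
```

This is a heuristic, not a proper significance test; with only a handful of runs per configuration the spread estimate is itself noisy.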
There is a lot of material online about TF's non-determinism. Most of it points to the underlying cuDNN implementation using non-deterministic reductions for convolutions: floating-point addition is not associative, so accumulating the same values in a different order can give slightly different results. The best, most recent insight I could find was in this pull-request, with comments indicating there is supposedly a forthcoming fix to address this issue.
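The floating-point effect itself is easy to reproduce on the CPU. The sketch below is plain Python, not cuDNN or TensorFlow code, and the helper `sum_in_order` is a hypothetical name used only for illustration. It shows that summing the same values in a different order can change the result, which is what happens when a parallel reduction does not fix its accumulation order.

```python
def sum_in_order(values, order):
    """Accumulate values[i] for i in order, left to right, in double precision."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Same three numbers, two groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)

# Same four numbers, two accumulation orders. With mixed magnitudes,
# the small terms can be absorbed by the large one before it cancels.
values = [1e16, 1.0, -1e16, 1.0]
in_given_order = sum_in_order(values, [0, 1, 2, 3])
cancel_first = sum_in_order(values, [0, 2, 1, 3])
print(in_given_order, cancel_first)
```

In a real training run the per-step differences are far smaller than in this contrived case, but they can compound over many gradient updates.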
This also seems to make a non-trivial difference when training UNet1D. Most newer libraries now seem to use cuDNN, so I'm not sure there's a way around this without a fix in cuDNN itself.
I have the same issue now with a U-Net for segmentation: the Dice coefficient differs (+3) on every run with the same seed. Were you able to find a solution for this?