Lack of reproducibility when using Huggingface transformers library (TensorFlow version) #14

dmitriydligach · 2020-04-28T18:26:29Z

Dear developers,

I included in my code all the steps listed in this repository but still could not achieve reproducibility with either TF 2.1 or TF 2.0. Here's the link to my code:

https://github.com/dmitriydligach/Thyme/blob/master/Keras/et.py

Please help.
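For readers following along, here is a minimal sketch of the kind of TensorFlow setup this repository's steps describe. It assumes TF 2.1 or later, where the `TF_DETERMINISTIC_OPS` environment variable is available, and does not cover TF 2.0. The helper name `enable_tf_determinism` is hypothetical, and the NumPy and TensorFlow imports are guarded so the sketch also runs where those packages are absent.

```python
import os
import random


def enable_tf_determinism(seed: int) -> None:
    """Hypothetical helper: request deterministic GPU ops and fix all seeds.

    Call it at the very start of the script, before the model is built.
    """
    # TF 2.1+ reads this variable to select deterministic implementations
    # of the GPU ops that support them (e.g. cuDNN convolutions).
    os.environ["TF_DETERMINISTIC_OPS"] = "1"

    # Seed every random number generator the training code may touch.
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # global seed for TF ops
    except ImportError:
        pass  # TensorFlow not installed; only stdlib/NumPy seeded
```

Seeding alone is not sufficient if the model uses an op that is still non-deterministic on the GPU; the environment variable only affects ops that have a deterministic implementation.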

MFreidank · 2020-06-15T12:59:32Z

@dmitriydligach Did you ever get this resolved?

dmitriydligach · 2020-06-16T16:52:38Z

@MFreidank Nope. I switched to PyTorch, which has a more reliable way to enforce determinism.
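The thread does not show the PyTorch code dmitriydligach used. As a point of comparison, this is a minimal sketch of the settings commonly used in PyTorch at the time: seed the generators and switch cuDNN to deterministic algorithms. The helper name is hypothetical, and the `torch` import is guarded so the sketch runs without PyTorch installed.

```python
import random


def enable_torch_determinism(seed: int) -> bool:
    """Hypothetical helper: returns True if PyTorch settings were applied."""
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return False  # PyTorch not installed; only Python's RNG was seeded
    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    # Ask cuDNN for deterministic algorithms only.
    torch.backends.cudnn.deterministic = True
    # Disable autotuning, which may pick different algorithms between runs.
    torch.backends.cudnn.benchmark = False
    return True
```

These flags cover cuDNN; other ops can still be non-deterministic, which is consistent with the occasional differences reported later in this thread.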

MFreidank · 2020-06-16T17:13:16Z

@dmitriydligach Just to verify: your code becomes fully reproducible with PyTorch?

duncanriach · 2020-06-16T17:23:58Z

PyTorch potentially has a different set of non-deterministic ops than TensorFlow, and it does not yet have a general mechanism for enabling deterministic op functionality. Both PyTorch and TensorFlow can now enable deterministic cuDNN functionality.

This code may use an op that happens to be non-deterministic in TensorFlow but deterministic in PyTorch.

I hope to look at this code in detail soon, possibly today.
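To illustrate what makes an op non-deterministic (an editorial addition, not from the thread): many GPU ops accumulate floating-point values in an order that can change from run to run, for example through atomic adds, and floating-point addition is not associative. This sketch uses plain Python floats and two fixed orders to stand in for that varying GPU accumulation; the function name is hypothetical.

```python
def sum_in_order(values, order):
    """Add values[i] for i in order, left to right, like one accumulation order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Large and small magnitudes mixed together expose rounding differences.
values = [1e16, 1.0, -1e16, 1.0]

# Order A: 1e16 + 1.0 rounds back to 1e16, so one 1.0 is lost -> 1.0
print(sum_in_order(values, [0, 1, 2, 3]))
# Order B: the two 1.0s are combined first and survive -> 2.0
print(sum_in_order(values, [1, 3, 0, 2]))
```

With the order fixed, the result is identical on every run; deterministic op implementations generally avoid this variation by accumulating in a fixed order, usually at some cost in speed.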

dmitriydligach · 2020-06-16T17:40:34Z

@MFreidank In most cases, I get exactly the same results every time I run my PyTorch code (including loss and accuracy for each epoch). In some (relatively infrequent) cases there's still a difference, but it's not nearly as large as with TensorFlow.

MFreidank · 2020-06-16T17:48:01Z

@duncanriach Thanks for your blazingly fast response! :)
I'm still interested in resolving this issue in TF 2.2 and would greatly appreciate your help investigating it.

A helpful starting point could be my Colab example.

@dmitriydligach Thanks for those additional details. It sounds like there is still slight non-determinism in PyTorch as well, but it might not affect loss/accuracy as strongly. This is valuable information for me, thank you for sharing your experience :)

duncanriach · 2020-06-16T17:58:54Z

@dmitriydligach: I'm sorry that I didn't sort this out in time for you to benefit from determinism in TensorFlow.

@MFreidank: I'll prioritize looking into these issues. They could share the same underlying cause, or they could have different sources. Often in these kinds of problems there is a setup issue that is easy to resolve; I intend to add better step-by-step instructions to the README for that. Sometimes a known (and not-yet-fixed) non-deterministic op is being used, and sometimes there is a new discovery: a non-deterministic op that we didn't know about. We'll figure this out.

MFreidank · 2020-06-16T18:15:24Z

@duncanriach Thanks a lot for taking the time to look into this and for your encouragement.
I feel much more confident about this now, knowing that someone with your experience will be having a look.

duncanriach · 2020-06-17T03:07:31Z

Hey @dmitriydligach, it looks like we have reproducibility on issue #19 (Huggingface Transformers BERT for TensorFlow); @MFreidank is confirming. Looking at your code, I don't see any reason for there to be non-determinism. I want to reproduce what you're seeing so that I can debug it. I have the code running, but it looks like I need to set DATA_ROOT and provide data there. Can you give me instructions for reproducing the problem with the data you're using?

MFreidank · 2020-06-17T10:36:47Z

@duncanriach Non-reproducibility of the code of @dmitriydligach may be related to him training for multiple epochs, see my update on issue #19.
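Since the non-reproducibility may only show up after several epochs, it can help to compare the per-epoch metrics of two runs and find where they first differ. This is a minimal sketch under the assumption that each run's losses are available as a list, for example from the `History` object returned by Keras `model.fit` (`history.history["loss"]`); the helper name is hypothetical.

```python
from typing import Optional, Sequence


def first_divergent_epoch(run_a: Sequence[float],
                          run_b: Sequence[float]) -> Optional[int]:
    """Return the 0-based index of the first epoch whose metric differs, or None."""
    if len(run_a) != len(run_b):
        raise ValueError("runs must cover the same number of epochs")
    for epoch, (a, b) in enumerate(zip(run_a, run_b)):
        # Exact comparison on purpose: reproducibility means bit-identical values.
        # Note that NaN != NaN, so NaN losses are reported as divergent.
        if a != b:
            return epoch
    return None
```

Usage, with `history_a` and `history_b` standing for two hypothetical training runs: `first_divergent_epoch(history_a.history["loss"], history_b.history["loss"])`. A result of `None` means the recorded losses matched exactly; otherwise the returned epoch narrows down where to start debugging.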

dmitriydligach · 2020-06-17T20:38:41Z

@duncanriach Thank you very much for looking into this issue.

Unfortunately, I'm not able to provide the data (this is medical data that can only be distributed via a data use agreement). However, perhaps it would help you to know that the data consists of relatively short text fragments (max_len ~ 150 word pieces)...

MFreidank mentioned this issue on Jun 16, 2020 in #19, "Reproducibility issue with transformers (BERT) and tf2.2" (Open)

duncanriach added the debugging and waiting for code labels on Jun 17, 2020

duncanriach removed the waiting for code label on Jun 17, 2020

MFreidank mentioned this issue on Jun 22, 2020 in #21, "[WIP] Feature/patch/softmax cross entropy with logits" (Open)

