[WIP] Feature/patch/softmax cross entropy with logits #21

MFreidank · 2020-06-22T08:07:25Z

Addressing previously reported tensorflow issue #38185. Also tied to tfdeterminism issues #19, #14, #9.
Fused implementation of {sparse_,}softmax_cross_entropy_with_logits exhibits non-determinism in its backprop.
This PR will provide a patch that maintains the same function interface but computes the computational steps involved in a non-fused way that is fully deterministic.

Tied to open tensorflow issue #38185. Non-determinism in backprop of fused implementation of `{sparse_,}softmax_cross_entropy_with_logits` has been reported. This work will provide a patch by routing calls to a deterministic workaround first described in the tensorflow issue above.

duncanriach · 2020-06-22T22:02:57Z

tfdeterminism/patch.py

@@ -70,13 +70,20 @@ def _patch():
  if re.match("(1\.(14|15)|2\.0)", tf_version):
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    _patch_bias_add()
+    _patch_fused_softmax_cross_entropy()


I'll push changes that rough-out enable_determinism and then you can call this from the appropriate part of that.

duncanriach · 2020-06-22T22:04:00Z

tfdeterminism/patch.py

+    same as `logits` and its shape is the same as `labels` except that it does
+    not have the last dimension of `labels`.
+  """
+  raise NotImplementedError()


I like the baby steps. Good job!

duncanriach · 2020-06-22T22:12:03Z

Comments above. Heads-up, before you're done, please remember to implement test/test_fused_softmax_cross_entropy.py in the spirit of test/test_patch_bias_add.py. This will use the appropriate forward code from the upstream TensorFlow repo along with the deterministic gradient test code that I provided in TensorFlow Issue 38185 (also testing all the modified op bindings).

I'm going to work on adding enable_determinism, from which _patch_fused_softmax_cross_entropy can be called.

duncanriach · 2020-06-22T22:40:45Z

tfdeterminism/patch.py

+  tf.nn.softmax_cross_entropy_with_logits = _new_softmax_cross_entropy_with_logits_1_14  # access via public API
+  nn.softmax_cross_entropy_with_logits = _new_softmax_cross_entropy_with_logits_1_14  # called from tf.keras.layers.convolutional.Conv
+  nn_ops.softmax_cross_entropy_with_logits = _new_softmax_cross_entropy_with_logits_1_14  # called from tests


Looking at tensorflow/python/ops/nn_ops.py it seems like it might be necessary to remap the bindings for softmax_cross_entropy_with_logits_v2 as well. After a brief review, I'm not sure what the difference between this and the other versions is, though, apart from the deprecated dim parameter.

duncanriach · 2020-06-22T22:48:52Z

tfdeterminism/patch.py

+
+# The original, pre-patched method can be viewed at
+# https://github.com/tensorflow/tensorflow/blob/v1.14.0/tensorflow/python/ops/nn_ops.py#L3182
+def _new_softmax_cross_entropy_with_logits_1_14(labels, logits, axis=-1, name=None):


BTW, the reason that _new_bias_add_1_14 has the version number in the name is that before stock TensorFlow version 1.14, tf.nn.bias_add has a different API. I have a version of this patch for TensorFlow version 1.13 and earlier that I didn't release because it was not useful without TF_CUDNN_DETERMINISTIC, which was released in version 1.14 of stock TensorFlow. The deterministic bias_add was effectively released in the NVIDIA GPU Cloud (NGC) TensorFlow container version 19.06, which also implemented TF_DETERMINISTIC_OPS, and which enabled the functionality.

I don't think we need the _1_14 on the end of these method names. However, before enabling this patch to be applied to any given version of TensorFlow (as decided in enable_determinism) it should be confirmed that the patched version of the method has the same API as the original on in that version of TensorFlow.

duncanriach · 2020-06-22T23:23:03Z

tfdeterminism/patch.py

+
+  # Non-sparse
+  tf.nn.sparse_softmax_cross_entropy_with_logits = _new_sparse_softmax_cross_entropy_with_logits_1_14  # access via public API
+  nn.sparse_softmax_cross_entropy_with_logits = _new_sparse_softmax_cross_entropy_with_logits_1_14  # called from tf.keras.layers.convolutional.Conv


The comment on the end of this line is not relevant here (same for the non-sparse version). You should search for calls to sparse_softmax_cross_entropy_with_logits (and the non-sparse version) in tensorflow/python and do the re-binding as appropriate for the op (with similar, perhaps in some cases identical, comments about why you're doing it).

duncanriach · 2020-06-23T05:32:40Z

I just added enable_determinism with this commit. You should use enable_determinism to apply the patch in test/test_patch_fused_softmax_cross_entropy.py.

I'm going reorganize the testing quite a lot before release. For the testing part, I would like you to focus on getting the following complete and working:

$ cd test
$ echo "Test TF1 API:"
$ ./container.sh tensorflow/tensorflow:1.14.0-gpu python test_patch_fused_softmax_cross_entropy.py
$ echo "Test TF2 API:"
$ ./container.sh tensorflow/tensorflow:2.2.0-gpu python test_patch_fused_softmax_cross_entropy.py

…ps://github.com/MFreidank/tensorflow-determinism into feature/patch/softmax_cross_entropy_with_logits

duncanriach · 2020-06-24T03:30:14Z

tfdeterminism/enable_determinism.py

@@ -52,7 +52,7 @@ def _enable_determinism(seed=None):
    _patch_bias_add()
  if in_ngc_cont and ngc_vers.at_least('19.06') or tf_vers.at_least('2.1'):
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
+    # TODO: Add patch crossentropy here as well? Issue seems to still be present on tf 2.1, 2.2


No. This is the condition for setting TF_DETERMINISTIC_OPS=1: NGC containers with version >= 19.06 or stock TensorFlow with version >= 2.1.

The condition below will ensure that the fused softmax/cross-entropy patch is applied to NGC containers with version >= 19.06 or stock TensorFlow with version >= 1.14 (which includes versions 2.1 and 2.2).

duncanriach · 2020-06-24T03:33:13Z

tfdeterminism/patch.py

+    print("TensorFlow version %s has been patched "
+          "using tfdeterminism version %s" %
+          (tf_version, __version__), file=sys.stderr)
+  elif re.match("2\.1|2\.2", tf_version):


I want to deprecate patch, not make it work on newer versions of TensorFlow, and no longer advertise it in the documentation (will be updated before release). If folks want determinism in TensorFlow version 2.1 or 2.2 then they should be using enable_determinism. Not adding this does not break anything because there was no patch available for TensorFlow versions 2.1 and 2.2 before.

phqtuyen · 2020-08-05T07:47:51Z

Hi, my name is Thomas, I am working at the Oracle Digital Assistant team in which we made extensively use of Tensorflow and Tensorflow-determinism, we really appreciate this wonderful library. I just wish to ask is this issue going to be merged anytime soon?

duncanriach · 2020-08-07T01:48:11Z

Hi Thomas (@phqtuyen),

Thanks for letting us know. These ops seem to be generally high priority for determinism in TensorFlow.

@MFreidank: please can you give an update on this and/or move it forward. Most of the rest of the infrastructure is now in place for this.

MFreidank · 2020-08-11T08:14:26Z

Hi @phqtuyen, @duncanriach
Sorry for the wait, I had been pulled into other things.
I've picked up work on this now and will have an update to the PR before end of this week.
I'll base my changes on the new infrastructure recently put in place.

duncanriach · 2020-08-11T16:28:56Z

Awesome. Just FYI, I'll be offline next week (2020-08-17 through 2020-08-21) and, since I'll need to be involved in reviews and iterations, that might slow things down a bit.

duncanriach · 2020-09-02T01:23:25Z

Hey @MFreidank, are you still planning on completing this PR?

duncanriach · 2020-09-04T18:07:01Z

Hi @MFreidank, I'm about to go offline again until 2020-09-16, but I wanted to check-in with you again. I think that this op's nondeterminism is near the top of the priority/urgency list for TensorFlow, and I want to get it resolved ASAP. Can you commit to pushing this through from your end?

MFreidank · 2020-09-04T18:14:08Z

Hi Duncan, Sorry for the delay on this. Yes, I'll pick it up starting tomorrow my time. Duncan Riach <notifications@github.com> schrieb am Fr., 4. Sep. 2020, 20:07:

…

Hi @Freidank, I'm about to go offline again until 2020-09-16, but I wanted to check-in with you again. I think that this op's nondeterminism is near the top of the priority/urgency list for TensorFlow, and I want to get it resolved ASAP. Can you commit to pushing this through from your end? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#21 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABQSWKBQYI2MLT4QV42MP6DSEEUFJANCNFSM4OEKR34Q> .

duncanriach · 2020-09-04T18:15:43Z

Thank you, Moritz.

MFreidank · 2020-09-05T10:32:08Z

Hi again @duncanriach,
I've implemented a first version (on top of the most recent master branch with enable_determinism) that patches only tf.nn.sparse_softmax_cross_entropy_with_logits and tf.nn.softmax_cross_entropy_with_logits and integrated the unit test case you shared into test/test_softmax_cross_entropy_with_logits.py.
To test that this works as intended I'll need access to my GPU which will only be available to me in another 6 hours.
If the test cases pass, I'll update the PR with the above.
What remains to be determined is which other functions need to be patched after that (by scanning through tensorflow/python as outlined by you above).
I will first focus on getting the minimal patch for the functions in tf.nn to work including unit tests and will then broaden it to include the other functions.

duncanriach · 2020-09-05T18:41:42Z

Thanks, @MFreidank. Sounds good. Just to reiterate, I'll be offline until 2020-09-16 and I'll review this after that.

duncanriach · 2020-09-22T00:55:03Z

Hey @MFreidank, please will you submit those changes so that I can review them?

duncanriach · 2020-10-29T01:36:58Z

@MFreidank, we will soon release version 0.4.0. We intend to include the fused softmax/cross-entropy patch in version 0.5.0 and I want to get started on it ASAP. If I don't hear from you in the 24 hours, we will proceed to implement it without you.

MFreidank added 2 commits June 22, 2020 09:56

fix: line no for link to previous implementation

aeb70ac

MFreidank mentioned this pull request Jun 22, 2020

Add GPU-deterministic back-prop for fused softmax/cross-entropy ops tensorflow/tensorflow#38185

Closed

MFreidank added 2 commits June 22, 2020 10:09

make this patch apply to tf 2.1 and 2.2 as well

bcc8475

fix: ammend mistake in re.match(...) statement

f7d6b86

duncanriach reviewed Jun 22, 2020

View reviewed changes

MFreidank added 4 commits June 23, 2020 14:58

twd: softmax entropy patch in enable_determinism

75ce3db

make this patch apply to tf 2.1 and 2.2 as well

231c7c5

fix: ammend mistake in re.match(...) statement

f22bcc9

Merge branch 'feature/patch/softmax_cross_entropy_with_logits' of htt…

24b4f57

…ps://github.com/MFreidank/tensorflow-determinism into feature/patch/softmax_cross_entropy_with_logits

duncanriach reviewed Jun 24, 2020

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Feature/patch/softmax cross entropy with logits #21

[WIP] Feature/patch/softmax cross entropy with logits #21

MFreidank commented Jun 22, 2020

duncanriach Jun 22, 2020

duncanriach Jun 22, 2020

duncanriach commented Jun 22, 2020 •

edited

duncanriach Jun 22, 2020

duncanriach Jun 22, 2020 •

edited

duncanriach Jun 22, 2020

duncanriach commented Jun 23, 2020 •

edited

duncanriach Jun 24, 2020

duncanriach Jun 24, 2020 •

edited

phqtuyen commented Aug 5, 2020

duncanriach commented Aug 7, 2020

MFreidank commented Aug 11, 2020

duncanriach commented Aug 11, 2020

duncanriach commented Sep 2, 2020 •

edited

duncanriach commented Sep 4, 2020 •

edited

MFreidank commented Sep 4, 2020 via email

duncanriach commented Sep 4, 2020

MFreidank commented Sep 5, 2020 •

edited

duncanriach commented Sep 5, 2020

duncanriach commented Sep 22, 2020

duncanriach commented Oct 29, 2020

[WIP] Feature/patch/softmax cross entropy with logits #21

Are you sure you want to change the base?

[WIP] Feature/patch/softmax cross entropy with logits #21

Conversation

MFreidank commented Jun 22, 2020

duncanriach Jun 22, 2020

Choose a reason for hiding this comment

duncanriach Jun 22, 2020

Choose a reason for hiding this comment

duncanriach commented Jun 22, 2020 • edited

duncanriach Jun 22, 2020

Choose a reason for hiding this comment

duncanriach Jun 22, 2020 • edited

Choose a reason for hiding this comment

duncanriach Jun 22, 2020

Choose a reason for hiding this comment

duncanriach commented Jun 23, 2020 • edited

duncanriach Jun 24, 2020

Choose a reason for hiding this comment

duncanriach Jun 24, 2020 • edited

Choose a reason for hiding this comment

phqtuyen commented Aug 5, 2020

duncanriach commented Aug 7, 2020

MFreidank commented Aug 11, 2020

duncanriach commented Aug 11, 2020

duncanriach commented Sep 2, 2020 • edited

duncanriach commented Sep 4, 2020 • edited

MFreidank commented Sep 4, 2020 via email

duncanriach commented Sep 4, 2020

MFreidank commented Sep 5, 2020 • edited

duncanriach commented Sep 5, 2020

duncanriach commented Sep 22, 2020

duncanriach commented Oct 29, 2020

duncanriach commented Jun 22, 2020 •

edited

duncanriach Jun 22, 2020 •

edited

duncanriach commented Jun 23, 2020 •

edited

duncanriach Jun 24, 2020 •

edited

duncanriach commented Sep 2, 2020 •

edited

duncanriach commented Sep 4, 2020 •

edited

MFreidank commented Sep 5, 2020 •

edited