
[GPU] Add new flag xla_gpu_exclude_nondeterministic_ops. #12520

Closed
wants to merge 1 commit

Conversation

sergachev
Contributor

It's more granular than the existing --xla_gpu_deterministic_ops because it allows running an autotuning compilation with non-deterministic ops disabled.

--xla_gpu_deterministic_ops is a superset of --xla_gpu_exclude_nondeterministic_ops, so setting --xla_gpu_deterministic_ops=true also sets --xla_gpu_exclude_nondeterministic_ops=true.

@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label May 15, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label May 15, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request May 16, 2024
Imported from GitHub PR openxla/xla#12520

It's more granular than the existing --xla_gpu_deterministic_ops because it allows doing an autotuning compilation with non-deterministic ops disabled.

--xla_gpu_deterministic_ops is a superset of --xla_gpu_exclude_nondeterministic_ops, so --xla_gpu_deterministic_ops=true will be setting --xla_gpu_exclude_nondeterministic_ops=true too.
Copybara import of the project:

--
f47f18016777468fe274bea00945d5209a2cdb57 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Add new flag xla_gpu_exclude_nondeterministic_ops.

Merging this change closes #12520

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#12520 from openxla:new_determinism_flag f47f18016777468fe274bea00945d5209a2cdb57
PiperOrigin-RevId: 634336994
@dimitar-asenov
Member

This breaks one of TensorFlow's deterministic ops tests. I'm looking into it.

@dimitar-asenov
Member

@sergachev Could you please rebase this PR? It has merge conflicts at the moment.

@sergachev
Contributor Author

Done

akuegel (Member) left a comment

Approving the rebase

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request May 17, 2024
Imported from GitHub PR openxla/xla#12520 (commit 4e2837457dc426154bf80f321c001c916a7d3677 by Ilia Sergachev <isergachev@nvidia.com>); commit message as above.
PiperOrigin-RevId: 634336994
@dimitar-asenov
Member

dimitar-asenov commented May 17, 2024

This change fails this OSS test (running on Linux): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/nn_ops/cudnn_deterministic_ops_test.py

The failure is a numerical mismatch; I suspect something in the way determinism is being controlled doesn't work out.

Interestingly, running the same test in our internal build environment passes. I'll have a look at this, but if you have any ideas in the meantime, please let me know.

@sergachev
Contributor Author

Maybe I know what the problem is: after reading it again, I think this line

config.debug_options().xla_gpu_deterministic_ops();
should stay unchanged (it's expected that RequireDeterminism() here disables autotuning).

@sergachev
Contributor Author

Removed that change and updated the comments.

copybara-service bot pushed a commit that referenced this pull request May 17, 2024
Imported from GitHub PR #12520 (commit 4e28374 by Ilia Sergachev <isergachev@nvidia.com>); commit message as above.
PiperOrigin-RevId: 634721436
@dimitar-asenov
Member

Thanks. I'm not sure if that was the issue. In any case, there is another one, which I'm fixing internally:

  • Simply using the special setter setter_for_xla_gpu_deterministic_ops to set the new flag whenever --xla_gpu_deterministic_ops is set doesn't work. The reason is that it's possible to construct debug options manually without calling into this logic at all. So the correct implementation is to change the code everywhere needed to check both flags.

copybara-service bot pushed further commits to tensorflow/tensorflow that referenced this pull request May 17, 2024 (commits 4e2837457dc426154bf80f321c001c916a7d3677 and 68b555ff66f299423a8de6aef595cf38f621976f by Ilia Sergachev <isergachev@nvidia.com>; commit messages as above).
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request May 17, 2024
Imported from GitHub PR openxla/xla#12520 (commit 4e2837457dc426154bf80f321c001c916a7d3677 by Ilia Sergachev <isergachev@nvidia.com>); commit message as above.

Merging this change closes #12520

PiperOrigin-RevId: 634756524
@sergachev sergachev deleted the new_determinism_flag branch May 23, 2024 23:13