
[oneDNN] Fuse group of primitive ops for batch normalization to FusedBatchNorm #44950

Open
yimeisun123 wants to merge 16 commits into master
Conversation

yimeisun123 (Contributor) commented Nov 17, 2020

In legacy models, batch normalization is performed via a group of individual ops instead of a single FusedBatchNorm op. This PR adds fusion support to the optimize_for_inference tool to identify the batch-normalization pattern expressed as a group of ops in the graph and replace it with a single FusedBatchNorm op. In the inference use case, a FusedBatchNorm op can be further folded into a convolution op to reduce computation and thus increase performance.
An example of the graph transformation performed by the optimize_for_inference tool with this change is shown below. The original subgraph is part of DenseNet-169. After applying optimize_for_inference, performance is about 1.8x that of the original graph when measured on CPU for DenseNet-169.
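For reference, the decomposed pattern that the new pass matches computes the standard batch-normalization formula with individual arithmetic ops. A minimal sketch of such a legacy graph (node names, shapes, and constant values here are illustrative, not taken from the PR):

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

channels = 32
x = tf.placeholder(tf.float32, shape=[1, 8, 8, channels], name="input")
# Per-channel constants, as they appear frozen into a legacy graph.
mean = tf.constant(np.zeros(channels, dtype=np.float32), name="moving_mean")
variance = tf.constant(np.ones(channels, dtype=np.float32), name="moving_variance")
beta = tf.constant(np.zeros(channels, dtype=np.float32), name="beta")
gamma = tf.constant(np.ones(channels, dtype=np.float32), name="gamma")
epsilon = 1e-3

# y = gamma * (x - mean) / sqrt(variance + epsilon) + beta, emitted as
# separate Rsqrt/Mul/Sub/Add nodes instead of a single FusedBatchNorm.
scale = tf.math.rsqrt(variance + epsilon) * gamma
offset = beta - mean * scale
y = x * scale + offset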

Original subgraph with the group of ops that make up batch normalization:
[Image: original DenseNet-169 subgraph]

Subgraph after the optimize_for_inference tool, which fuses the individual batch-normalization ops into a FusedBatchNorm and then folds the FusedBatchNorm into the convolution ops:
[Image: optimized DenseNet-169 subgraph]
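The folding step works because, at inference time, the normalization is an affine transform with constant per-output-channel coefficients, so it can be absorbed into the convolution weights. A rough numpy sketch of the arithmetic (all names, shapes, and values are illustrative):

import numpy as np

# Per-output-channel batch-norm parameters (illustrative values).
gamma = np.ones(32, dtype=np.float32)
beta = np.zeros(32, dtype=np.float32)
mean = np.zeros(32, dtype=np.float32)
variance = np.ones(32, dtype=np.float32)
eps = 1e-3

weights = np.random.randn(3, 3, 16, 32).astype(np.float32)  # HWIO filter

# conv(x, W) * scale + offset == conv(x, W * scale) + offset, because the
# scale is constant per output channel.
scale = gamma / np.sqrt(variance + eps)
folded_weights = weights * scale   # broadcasts over the output-channel axis
folded_bias = beta - mean * scale  # becomes a BiasAdd after the conv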

Another example is from GoogleNet-v3. In this model the gamma factor is 1, so the multiplication with gamma is not present, unlike the batch-normalization ops in DenseNet-169. The performance gain after applying optimize_for_inference is about 1.6x over the original graph when measured on CPU for GoogleNet-v3.

Original subgraph with a group of primitive ops making up batch normalization when gamma is 1:
[Image: original GoogleNet-v3 subgraph]

Subgraph after applying the optimize_for_inference tool:
[Image: optimized GoogleNet-v3 subgraph]
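For context, these rewrites are applied by running the stock tool on a frozen graph. A sketch using the Python API (file and node names here are made up for illustration):

from tensorflow.core.framework import graph_pb2
from tensorflow.python.framework import dtypes
from tensorflow.python.tools import optimize_for_inference_lib

# Load a frozen GraphDef (hypothetical file name).
graph_def = graph_pb2.GraphDef()
with open("frozen_densenet169.pb", "rb") as f:
  graph_def.ParseFromString(f.read())

# Apply the inference optimizations, including the new batch-norm fusion.
optimized = optimize_for_inference_lib.optimize_for_inference(
    graph_def,
    input_node_names=["input"],          # illustrative node names
    output_node_names=["predictions"],
    placeholder_type_enum=dtypes.float32.as_datatype_enum)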

google-ml-butler bot added the size:M (CL Change Size: Medium) label Nov 17, 2020
google-cla bot added the cla: yes label Nov 17, 2020
gbaned self-assigned this Nov 18, 2020
gbaned added this to Assigned Reviewer in PR Queue via automation Nov 18, 2020
gbaned added the comp:mkl (MKL related issues) label Nov 18, 2020
gbaned requested a review from penpornk November 18, 2020 10:35
gbaned added the awaiting review (Pull request awaiting review) label Nov 25, 2020
penpornk (Member) left a comment

Thank you for the PR and I'm so sorry for the delay! I appreciate the detailed explanation.

Review comments on tensorflow/python/tools/optimize_for_inference.py, optimize_for_inference_lib.py, and optimize_for_inference_test.py (outdated, resolved)
PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Dec 23, 2020
penpornk removed the awaiting review label Dec 23, 2020
gbaned requested a review from penpornk January 13, 2021 15:14
yimeisun123 (Contributor, Author) commented:
@penpornk - thanks for your previous comments. I pushed a new commit to:

  1. address your comments;
  2. add a check on the rank of the constant value's shape in the pattern matching, allowing only 1-D constants as fusion candidates (see the sketch after this list);
  3. allow for commutativity in the pattern: since the decomposed batch-normalization pattern was generated by the framework, extra checks to handle the input-order-swapping case are not necessary. Hope this is ok with you.

Thanks for reviewing the changes.
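A loose sketch of the kind of rank guard described in item 2 (the helper name and exact attribute access are illustrative, not the PR's actual code):

from tensorflow.python.framework import tensor_util

def _is_fusion_candidate(const_node):
  """Only per-channel (rank-1) constants are eligible for the fusion."""
  value = tensor_util.MakeNdarray(const_node.attr["value"].tensor)
  return value.ndim == 1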

gbaned added the awaiting review label Jan 27, 2021
gbaned (Contributor) commented Jul 7, 2021: @penpornk Can you please review this PR? Thanks!

gbaned (Contributor) commented Jul 28, 2021: @penpornk Can you please review this PR? Thanks!

yimeisun123 (Contributor, Author) commented:

This PR has been brought up to the latest baseline, and the test case has also been updated for the math-op function usage.

gbaned requested a review from penpornk and removed the request for penpornk August 26, 2021 17:02
gbaned (Contributor) commented Sep 21, 2021: @penpornk Can you please review this PR? Thanks!

gbaned (Contributor) commented Oct 6, 2021: @penpornk Can you please review this PR? Thanks!

gbaned (Contributor) commented Oct 28, 2021: @penpornk Can you please review this PR? Thanks!

gbaned (Contributor) commented Dec 28, 2021: @penpornk Can you please review this PR? Thanks!

gbaned (Contributor) commented Feb 7, 2022: @penpornk Can you please review this PR? Thanks!

gbaned (Contributor) commented Mar 29, 2022: @penpornk Can you please review this PR? Thanks!

gbaned (Contributor) commented Apr 19, 2022: @penpornk Can you please review this PR? Thanks!

self.assertAllClose(original_result, optimized_result)

@test_util.run_deprecated_v1
def testFuseDecomposedBatchNorm_FormatUnsupportCase(self):
(Contributor) commented:

nit: FormatUnsupportCase -> FormatUnsupportedCase

This test is failing with:

tensorflow.python.framework.errors_impl.UnimplementedError: The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
	 [[{{node conv_op}}]]

yimeisun123 (Contributor, Author) replied:

The fusion is only implemented for the NHWC format, and I added the data-format check as a precaution.
Is it ok not to add the mismatch test case for the data format?
If it is still needed, do you know of a flag I can add to skip this test case on CPU devices?
(I am testing on CPU and couldn't reproduce this test failure.)

(Contributor) replied:

Maybe it's supported with oneDNN, but not without?

Maybe you can condition the test on oneDNN, or otherwise catch the error and skip if the format itself isn't supported.
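One possible shape for that, assuming test_util.IsMklEnabled() reflects whether oneDNN custom ops are active (a sketch of the idea, not the PR's actual code):

from tensorflow.python.framework import test_util
from tensorflow.python.platform import test

class OptimizeForInferenceTest(test.TestCase):

  def testFuseDecomposedBatchNorm_FormatUnsupportedCase(self):
    # NCHW Conv2D has no stock CPU kernel, so skip the execution path
    # when oneDNN custom ops are disabled.
    if not test_util.IsMklEnabled():
      self.skipTest("NCHW Conv2D on CPU requires oneDNN custom ops.")
    # ... build the NCHW graph and compare results as in the NHWC test ...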

@@ -751,7 +752,10 @@ def testFuseDecomposedBatchNorm_PatternMismatchCase(self):
     self.assertAllClose(original_result, optimized_result)

   @test_util.run_deprecated_v1
-  def testFuseDecomposedBatchNorm_FormatUnsupportCase(self):
+  def testFuseDecomposedBatchNorm_FormatUnsupportedCase(self):
+    if tf_config.list_physical_devices("CPU"):
(Contributor) commented:
This will always be true, since there's always at least one CPU device (the host).

Plus, don't you want this to run on CPU?

yimeisun123 (Contributor, Author) replied:
This test case was added to check that if the frozen graph has NCHW format, we do not fuse it. The data-format check just makes sure we don't handle the NCHW case. If current TensorFlow won't even allow creating a conv op with NCHW format on CPU, the code check is a double guarantee.
Can we not add this NCHW data-format test case?


yimeisun123 (Contributor, Author) commented:

#44950 (comment)
This PR supports fusion only for the NHWC format; it doesn't matter whether it is CPU or GPU (though GPU usually uses the NCHW format), nor is it related to oneDNN.

This data-format test case doesn't really add any value; again, can we not add it?

cantonios (Contributor) commented:

> #44950 (comment) This PR supports fusion only for the NHWC format; it doesn't matter whether it is CPU or GPU (though GPU usually uses the NCHW format), nor is it related to oneDNN.
>
> This data-format test case doesn't really add any value; again, can we not add it?

Okay

yimeisun123 (Contributor, Author) commented:

@cantonios, I updated the data-format test case to just verify that the fusion result is as expected. I think this should avoid the exception from executing the NCHW conv2d on CPU.
If this still doesn't work, I will remove the test case.
Thanks
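Something along these lines, where the helper name and optimized_graph_def are assumptions for illustration rather than the PR's exact code:

def assert_not_fused(test_case, optimized_graph_def):
  # The pass should leave the NCHW graph alone; asserting on the graph
  # structure avoids ever executing the unsupported Conv2D on CPU.
  node_ops = [node.op for node in optimized_graph_def.node]
  test_case.assertNotIn("FusedBatchNorm", node_ops)
  test_case.assertIn("Rsqrt", node_ops)  # decomposed pattern still present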

penpornk (Member) commented May 8, 2024:

A TF test failed when oneDNN custom ops aren't enabled:

[ RUN      ] OptimizeForInferenceTest.testFuseDecomposedBatchNorm_FormatUnsupportedCase
W tensorflow/core/common_runtime/graph_constructor.cc:1595] Importing a graph with a lower producer version 8 into an existing graph with producer version 1855. Shape inference will have run different parts of the graph with different producer versions.
I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: UNIMPLEMENTED: The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
         [[{{node conv_op}}]]
ERROR:tensorflow:Graph execution error:

Could you please help take a look? Command to reproduce:

bazel test -c opt //tensorflow/python/tools:optimize_for_inference_test --test_env=TF_ENABLE_ONEDNN_OPTS=0

Labels: cla: yes, comp:mkl (MKL related issues), ready to pull (PR ready for merge process), size:M (CL Change Size: Medium)
Projects: PR Queue (Reviewer Requested Changes)
Development: Successfully merging this pull request may close these issues: None yet
7 participants