[oneDNN] Fuse group of primitive ops for batch normalization to Fu… #44950
base: master
Conversation
…nd scale mul op skipped
Thank you for the PR and I'm so sorry for the delay! I appreciate the detailed explanation.
@penpornk - thanks for your previous comments. I pushed a new commit to:
@penpornk Can you please review this PR? Thanks!
1 similar comment
This PR is brought up to the latest baseline, and the test case is updated for the math op function usage.
@penpornk Can you please review this PR? Thanks!
5 similar comments
    self.assertAllClose(original_result, optimized_result)

  @test_util.run_deprecated_v1
  def testFuseDecomposedBatchNorm_FormatUnsupportCase(self):
nit: FormatUnsupportCase -> FormatUnsupportedCase
This test is failing with:
tensorflow.python.framework.errors_impl.UnimplementedError: The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
[[{{node conv_op}}]]
The fusion is only implemented for the NHWC format, and I added the data format check as a precaution.
Is it okay not to add the mismatch test case for the data format?
If it is still needed, do you know of any flag I can add to skip this test case on CPU devices?
(I am testing on CPU and couldn't reproduce this test failure.)
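As a minimal sketch of the precautionary check described above (plain Python over a toy node dict; the function and node representation are hypothetical, not the PR's actual code), the rewrite pass simply leaves any non-NHWC subgraph untouched:

```python
def maybe_fuse_batch_norm(node):
    """Return a fused replacement node, or None to skip fusion.

    Mirrors the precautionary data-format check: only NHWC subgraphs
    are rewritten; NCHW (and anything else) is left unfused.
    """
    if node.get("data_format", "NHWC") != "NHWC":
        return None  # leave the decomposed pattern as-is
    return {"op": "FusedBatchNorm", "data_format": "NHWC"}

# NCHW graphs are skipped; NHWC graphs get the fused op.
assert maybe_fuse_batch_norm({"data_format": "NCHW"}) is None
assert maybe_fuse_batch_norm({"data_format": "NHWC"})["op"] == "FusedBatchNorm"
```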
Maybe it's supported with oneDNN, but not without?
Maybe you can condition the test on oneDNN, or otherwise catch the error and skip the test if the format itself isn't supported.
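One way to realize the catch-and-skip suggestion is sketched below. This is a self-contained illustration with a stand-in exception and helper; in the real test the exception would be `tf.errors.UnimplementedError` raised by running the NCHW `Conv2D` graph:

```python
import unittest


class FakeUnimplementedError(RuntimeError):
    """Stand-in for tf.errors.UnimplementedError (hypothetical)."""


def run_conv_nchw():
    # Stand-in for building and running the NCHW Conv2D graph; on a
    # CPU-only build without oneDNN support this would raise
    # UnimplementedError, which we simulate here.
    raise FakeUnimplementedError(
        "Conv2D currently only supports NHWC on the CPU")


class FormatTest(unittest.TestCase):
    def test_fuse_decomposed_bn_format_unsupported(self):
        try:
            run_conv_nchw()
        except FakeUnimplementedError:
            # Format not supported on this device: skip, don't fail.
            self.skipTest("NCHW Conv2D is not supported on this device")


if __name__ == "__main__":
    unittest.main()
```

Skipping (rather than asserting on the error message) keeps the test green on builds where NCHW `Conv2D` is actually available.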
@@ -751,7 +752,10 @@ def testFuseDecomposedBatchNorm_PatternMismatchCase(self):
     self.assertAllClose(original_result, optimized_result)

   @test_util.run_deprecated_v1
-  def testFuseDecomposedBatchNorm_FormatUnsupportCase(self):
+  def testFuseDecomposedBatchNorm_FormatUnsupportedCase(self):
+    if tf_config.list_physical_devices("CPU"):
This will always be true, since there's always at least one CPU device (the host).
Plus, don't you want this to run on CPU?
This test case was added to check that if the frozen graph has NCHW format, we do not fuse it. The data format check just makes sure we don't handle the NCHW case. If current TensorFlow won't even allow creating a Conv2D op with NCHW format on CPU, the code check is a double guarantee.
Can we skip adding this NCHW data format test case?
#44950 (comment) This data format test case doesn't really add any value; again, can we leave it out?
Okay
@cantonios, I updated the data format test case to just verify that the fusion result is as expected. I think this should avoid the exception from executing the NCHW Conv2D on CPU.
A TF test failed when oneDNN custom ops aren't enabled. Could you please take a look? Command to reproduce:
In legacy models, batch normalization is performed by a group of individual ops instead of a single FusedBatchNorm op. This PR adds support in the optimize_for_inference tool to identify this decomposed batch normalization pattern in the graph and replace it with a single FusedBatchNorm op. In the inference use case, a FusedBatchNorm op can be further folded with the convolution op to reduce computation and thus increase performance.
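The rewrite is valid because the group of primitive ops and FusedBatchNorm compute the same affine transform. A sketch in NumPy (function and variable names are illustrative, not taken from the PR):

```python
import numpy as np


def decomposed_batch_norm(x, mean, var, gamma, beta, eps=1e-3):
    """Batch norm as the group of primitive ops found in legacy
    frozen graphs: Rsqrt -> Mul (gamma) -> Mul/Sub/Add."""
    scale = gamma * (1.0 / np.sqrt(var + eps))   # Rsqrt + Mul
    return x * scale + (beta - mean * scale)     # Mul + Sub + Add


def fused_batch_norm_inference(x, mean, var, gamma, beta, eps=1e-3):
    """What a single FusedBatchNorm computes in inference mode."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4, 3)).astype(np.float32)  # NHWC input
mean = rng.standard_normal(3).astype(np.float32)
var = rng.random(3).astype(np.float32)
gamma = rng.standard_normal(3).astype(np.float32)
beta = rng.standard_normal(3).astype(np.float32)

# Both forms produce the same values, so the graph rewrite preserves
# the model's output.
assert np.allclose(decomposed_batch_norm(x, mean, var, gamma, beta),
                   fused_batch_norm_inference(x, mean, var, gamma, beta),
                   atol=1e-5)
```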
An example of the graph transformation performed by the optimize_for_inference tool with this PR's change is shown below. The original subgraph is part of DenseNet-169. After applying optimize_for_inference, performance is about 1.8x that of the original graph when measured on CPU for DenseNet-169.
Original subgraph with the group of ops that make up batch normalization:
Subgraph after the optimize_for_inference tool, which fuses the individual batch normalization ops into a FusedBatchNorm op and then folds the FusedBatchNorm into the convolution ops:
Another example is from GoogleNet-v3. In this model the gamma factor is 1, so the multiplication by gamma is absent compared to the batch normalization ops in DenseNet-169. The performance gain after applying optimize_for_inference is about 1.6x over the original graph when measured on CPU for GoogleNet-v3.
Original subgraph with the group of primitive ops that make up batch normalization when gamma is 1:
Subgraph after applying the optimize_for_inference tool:
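The gamma-less GoogleNet-v3 pattern is still fusable because the rewriter can supply a constant gamma of ones for the fused op. A NumPy sketch of the equivalence (names are illustrative, not from the PR):

```python
import numpy as np


def decomposed_bn_no_gamma(x, mean, var, beta, eps=1e-3):
    # GoogleNet-v3 style pattern: gamma == 1, so the Mul by gamma
    # is simply absent from the graph.
    scale = 1.0 / np.sqrt(var + eps)
    return x * scale + (beta - mean * scale)


def fused_bn(x, mean, var, gamma, beta, eps=1e-3):
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(1)
x = rng.standard_normal((1, 2, 2, 4)).astype(np.float32)  # NHWC input
mean = rng.standard_normal(4).astype(np.float32)
var = rng.random(4).astype(np.float32)
beta = rng.standard_normal(4).astype(np.float32)
ones = np.ones(4, np.float32)  # gamma synthesized for the fused op

# Feeding a constant gamma of ones reproduces the gamma-less pattern.
assert np.allclose(decomposed_bn_no_gamma(x, mean, var, beta),
                   fused_bn(x, mean, var, ones, beta), atol=1e-5)
```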