TensorFlow binary seems to be compiled to use SIMD instructions like AVX2 and FMA, but actually isn't? #13500
Comments
@MartinZZZ This is slightly confusing. Note that the info log is emitted while the build executes a genrule, not by your compiled binary. The reason I know this is that the first part of the info log says:
The genrule above corresponds to a bazel genrule created by the tf_library build macro:
I suspect that you saw this info log in #13482 because of the way you were running things. That said, note that by default tfcompile doesn't assume any target-specific features (SIMD, etc.). If there are specific features you'd like to enable, you need to set them via the following tfcompile flags:

These may be specified in your tf_library build rule.
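For illustration only, a sketch of how that might look (the target name, graph, and config paths here are placeholders, and the exact flag string is an assumption based on tfcompile's `--target_features` option, which takes LLVM-style feature names):

```
# Hypothetical BUILD fragment: enable AVX2 and FMA for an AOT-compiled graph.
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

tf_library(
    name = "my_graph_with_simd",          # placeholder target name
    graph = "my_graph.pb",                # placeholder graph file
    config = "my_graph.config.pbtxt",     # placeholder feed/fetch config
    cpp_class = "mynamespace::MyGraph",   # placeholder generated class
    # LLVM-style feature string: '+' enables a CPU feature.
    tfcompile_flags = "--target_features=+avx2,+fma",
)
```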
@tatatodd Thanks for the information. I specified those flags in the build rule. Then, instead of running the original command, I rebuilt and ran it again, but the logging info still exists:
May I have your advice?
@MartinZZZ I see. I had misinterpreted your previous message; I thought you meant that you no longer saw the log info when using those flags. As I mentioned, the log message is harmless, and is a side effect of the way we perform the compilation. We're changing the implementation (for a different reason), at which point the info log will go away. In the meantime you should just ignore it.
@tatatodd Thanks for the explanation. I think I understand now. May I check two quick questions with you: (1) Does it mean that the created binary is actually able to use the SIMD instructions as specified by those flags? (2) Whether the
@MartinZZZ Answers to your questions:
@tatatodd Thanks for your answers :) And please correct me if I am mistaken. I wonder: (1) Will XLA-AOT have different runtime performance from XLA-JIT? (2) If so, is the code generation done by the same backend for both?
So am I.
@lijiansong, could you clarify your question?
@carlthome MartinZZZ and lijiansong are asking which one has better runtime performance: XLA-JIT or XLA-AOT? The documentation seems to imply that XLA-AOT is meant for space-constrained situations (e.g. mobile), but it does not say anything about the runtime performance of XLA-AOT vs. XLA-JIT. Any clarification on that point would be welcome.
AOT and JIT provide the same performance benefits (e.g. op fusion, constant folding, common subexpression elimination, and other HLO-level optimizations). The downside to AOT is that you have to specify static tensor shapes and know what hardware you're targeting, while JIT does that for you. The downside to JIT is that compilation happens at runtime (which takes extra time) and that you have to bundle the compiler with your program.
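The timing tradeoff can be sketched in plain Python (a toy analogy, not TensorFlow code; the builtin `compile`/`exec` stand in for XLA's code generation): AOT pays the compilation cost once, before any request arrives, while JIT pays it on the first call and then reuses a cached result.

```python
# Toy illustration of the AOT-vs-JIT cost model described above.
# This is plain Python, not TensorFlow: it only shows *when* the
# compilation cost is paid, not how XLA compiles anything.
SRC = "result = sum(i * i for i in range(100))"

# AOT style: compile once, ahead of time, before serving any request.
AOT_CODE = compile(SRC, "<aot>", "exec")

def run_aot():
    # Only pays execution cost at call time.
    ns = {}
    exec(AOT_CODE, ns)
    return ns["result"]

_jit_cache = {}

def run_jit():
    # JIT style: the first call pays the compilation cost;
    # later calls reuse the cached compiled code.
    if "code" not in _jit_cache:
        _jit_cache["code"] = compile(SRC, "<jit>", "exec")
    ns = {}
    exec(_jit_cache["code"], ns)
    return ns["result"]
```

Either way the computed result is identical; only the moment the compiler runs differs.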
I found similar issues such as #8037 and #7778, but the issue does not seem to be solved: the warnings did disappear after building with the necessary optimization options, but they appeared again when I followed this tutorial (https://www.tensorflow.org/performance/xla/tfcompile) to the last step. So, is the tensorflow binary compiled to use the SIMD instructions or not?
System information
Steps to reproduce:

1. Configure: only jemalloc and XLA JIT support are ticked. The default optimization flag is `-march=native`, so it was not specified.
2. Build the pip package.
3. Build the `tfcompile` binary.
4. Follow the tutorial under `//tensorflow/compiler/aot/tests`:
   - Step 1: the config file already exists as `test_graph_tfmatmul.config.pbtxt`;
   - Step 2.1: generate the graph file `test_graph_tfmatmul.pb`;
   - Step 2.2: compile the graph with `tfcompile`;
   - Step 3: write `my_code.cc`;
   - Step 4: update the `BUILD` file.

Finally, it will print:
INFO: From Executing genrule //tensorflow/compiler/aot/tests:gen_test_graph_tfmatmul: 2017-10-05 15:15:29.233159: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
(An error will also occur, but that is another issue #13482).
So, is the tensorflow binary compiled to use the SIMD instructions (SSE4.1 SSE4.2 AVX AVX2 FMA) or not? May I have your advice?
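For reference, a feed/fetch config of the kind used in Step 1 looks roughly like this (a sketch reconstructed from the tfcompile tutorial; the node names and shapes here are illustrative, not copied from the actual test file):

```
# Each feed pins a placeholder to a static shape; AOT compilation
# requires all shapes to be known ahead of time.
feed {
  id { node_name: "x_hold" }
  shape {
    dim { size: 2 }
    dim { size: 3 }
  }
}
feed {
  id { node_name: "y_hold" }
  shape {
    dim { size: 3 }
    dim { size: 2 }
  }
}
# The fetch names the output node of the subgraph to compile.
fetch {
  id { node_name: "x_y_prod" }
}
```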