
Java interface #5

Closed
ravwojdyla opened this issue Nov 9, 2015 · 112 comments
@ravwojdyla

Issue to trace effort of swig interface for java. Started implementation - will update with progress. If anyone has any comments/tips - please feel welcome to join the discussion!

@girving
Contributor

girving commented Nov 9, 2015

Nice!

@girving
Contributor

girving commented Nov 9, 2015

Moving this comment here from #3:


There's a test suite with fairly good coverage, but currently it's mostly Python with a few C++ tests. There's also a lot of graph-building functionality that's currently Python only, in particular the automatic differentiation support, though that doesn't matter for evaluating graphs from Java. There are plans to move this functionality into the underlying C++ in the future, at which point Java SWIG bindings would be more useful for creating graphs.

If someone takes up the Java SWIG challenge, we'd be happy to accept it upstream pending review, etc., at which point it would be part of our continuous testing. The details of accepting contributions are in flux at the moment, but that will stabilize.

@pslam

pslam commented Nov 18, 2015

Hello guys,
We are also interested in adapting TensorFlow for Java. @ravwojdyla Have you, by any chance, started working on the SWIG interface for Java? If you have, we could join our efforts and collaborate on that.

@kylevedder

Hello,
I am working on a SWIG wrap of the main C++ API. You can see my progress thus far on my fork, but what's up there is not finished; I'm currently hitting a problem in which #include "tensorflow/core/lib/core/error_codes.pb.h" cannot be resolved, and I cannot find the intended file anywhere in the project tree. Input of any kind would be greatly appreciated.

@bhack
Contributor

bhack commented Nov 22, 2015

There are JavaCPP presets available for libraries like Caffe and OpenCV. See also bytedeco/javacpp-presets#111. JavaCPP also enables iOS via RoboVM.

@bhack
Contributor

bhack commented Nov 22, 2015

@girving Initial commit at bytedeco/javacpp-presets@374e1d5

@bhack
Contributor

bhack commented Nov 22, 2015

/cc @saudet

@ravwojdyla
Author

@pslam - I was able to work just a little bit on this - could definitely use some help!

@saudet

saudet commented Nov 23, 2015

Hi guys, I believe I have pretty functional bindings for JavaCPP: https://github.com/bytedeco/javacpp-presets/tree/master/tensorflow. Let me know if you see anything that could be done with SWIG, but not JavaCPP. I could definitely use the feedback. (Thanks for the cc @bhack!)

@kylevedder

Very nicely done @saudet! I have almost finished a SWIG wrap, but it seems that your implementation works just as well. I do not see anything that my SWIG wrap can do that yours cannot do. JavaCPP seems very cool, I'll have to look into using it for future projects.

@tngan

tngan commented Nov 23, 2015

Hi @kylevedder, have you resolved the issue related to error_codes.pb.h?
[Edited]
All .pb.h files are generated from the corresponding .proto files.

@kylevedder

@tngan Yes, that is what I discovered as well. Additionally, the .proto files in this project require Protobuf 3. I'm using Ubuntu 14.04, where Protobuf 3 was not available in my package manager, so I compiled it from source from the 3.0.0 beta release.

The current roadblock I am trying to solve is how to get protoc to recurse over the entire file tree and compile the .proto files into .h and .cc files; compiling each folder piecemeal fails due to unsatisfied dependencies on other, not-yet-compiled .proto files.
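A single protoc invocation rooted at the top of the tree sidesteps the per-directory dependency failures. A minimal sketch, using dummy .proto files (the paths below are illustrative, not TensorFlow's actual tree, and protoc is assumed to be installed for the commented command):

```shell
# Build a tiny stand-in tree of .proto files.
mkdir -p tree/tensorflow/core/framework tree/tensorflow/core/lib/core
printf 'syntax = "proto3";\npackage tensorflow;\n' \
    > tree/tensorflow/core/framework/types.proto
printf 'syntax = "proto3";\npackage tensorflow.error;\n' \
    > tree/tensorflow/core/lib/core/error_codes.proto

# Collect every .proto in one pass...
find tree -name '*.proto' | sort

# ...and hand them all to protoc in a single invocation with a common -I
# root, so cross-file imports resolve (assumes protoc 3 is on the PATH):
# protoc -Itree --cpp_out=gen $(find tree -name '*.proto')
```

Because every file shares one invocation and one import root, protoc can resolve imports between them instead of failing on not-yet-compiled neighbors.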

@davidzchen
Contributor

@kylevedder Are your SWIG wrappers in a separate repository, or are you working in the tensorflow repository? protoc works similarly to other compilers. If you are working in the tensorflow repository or are using Bazel, then you would need to set up the protobuf build targets and the dependencies among them.

If you are working in a separate repository and using a different build system, then you would need to use the protobuf plugin for that build system.

I'd be happy to help you set up the build if you would like.

@kylevedder

@davidzchen Thank you for the offer, any and all help is greatly appreciated.

What I have thus far:

I have already set up Bazel and gotten it to compile a .whl file, which I then handed over to pip and confirmed that I can run the first TensorFlow program.

I have generated SWIG wrapper files in my forked repository. They are in a folder under core/javaWrapper. [link]

What I am trying to do:

Ultimately, my goal is to generate a .so file that can then be loaded as a native library from Java. Currently, I'm attempting to use g++ to compile the entire system into a .so file; however, the .proto files need to first be expanded into .h and .cc files prior to this compilation, and that is what I am trying to do with protoc.
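As a rough sketch of that pipeline (the SWIG and g++ command lines below are illustrative assumptions, so they are commented out; only the library-naming step is live):

```shell
# Hypothetical two-step build: SWIG emits the JNI glue, g++ folds it into a .so.
# swig -c++ -java -outdir java/ tensor_c_api.i   # emits tensor_c_api_wrap.cxx
# g++ -fPIC -shared -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" \
#     tensor_c_api_wrap.cxx -o libtensorflow_jni.so

# Java's System.loadLibrary() expects the lib<name>.so convention on Linux,
# so the artifact above would be loaded by its bare name:
lib="libtensorflow_jni.so"
name="${lib#lib}"; name="${name%.so}"
echo "System.loadLibrary(\"$name\")"   # prints System.loadLibrary("tensorflow_jni")
```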

You can see my attempt at a wrap script here to get a better idea of what I am aiming at, although thus far all of my attempts at using protoc have been directory by directory and, consequently, are not in the script.

Finally, any feedback on areas of improvement would be greatly appreciated. Thanks!

@saudet

saudet commented Nov 24, 2015

@kylevedder I already have a .so built as part of the JavaCPP Presets: https://github.com/bytedeco/javacpp-presets/tree/master/tensorflow. Thanks to Bazel, it's really simple. Just apply a patch like this:

diff -ruN tensorflow/tensorflow/cc/BUILD tensorflow-patch/tensorflow/cc/BUILD
--- tensorflow/tensorflow/cc/BUILD  2015-11-22 00:00:02.441829192 +0900
+++ tensorflow-patch/tensorflow/cc/BUILD    2015-11-14 11:15:12.689330351 +0900
@@ -75,6 +75,17 @@
     ],
 )

+cc_binary(
+    name = "libtensorflow.so",
+    copts = tf_copts(),
+    linkshared = 1,
+    deps = [
+        ":cc_ops",
+        "//tensorflow/core:kernels",
+        "//tensorflow/core:tensorflow",
+    ],
+)
+
 filegroup(
     name = "all_files",
     srcs = glob(

And run Bazel like this, for example:

bazel build -c opt //tensorflow/cc:libtensorflow.so

AFAIK, this should gobble up pretty much anything of interest for the C++ API.

@davidzchen
Contributor

@saudet Is there a reason why you are using a cc_binary rule to build the shared library rather than cc_library? You can just have a cc_library rule with the name tensorflow and the build target will build a shared library called libtensorflow.so.

@kylevedder If your goal is to generate an .so file, then something similar to what @saudet suggested would work.

If you need to use the TensorFlow protos in Java code, then you would need to add dependencies from your java_* Bazel build targets to the proto_library targets that generate the Java classes from the .proto files.

We still have a bit of work to do before we open-source the native proto_library rules (see bazelbuild/bazel#52), but in the meantime, TensorFlow uses the cc_proto_library and py_proto_library rules provided by protobuf, and for Java, you should be able to use the Java genproto rule that is included with Bazel. I will check with the team to find out what the timeline for proto_library is and whether it would be worthwhile to unify the rules provided by Protobuf with genproto.
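As a rough sketch of that wiring (target and file names here are illustrative assumptions, not actual TensorFlow targets), the wrapper's C++ sources would depend on the output of a protobuf-provided rule such as cc_proto_library:

```python
# Hypothetical BUILD fragment: generate C++ classes from a .proto and let
# the SWIG wrapper code depend on them.
cc_proto_library(
    name = "error_codes_cc_proto",
    srcs = ["lib/core/error_codes.proto"],
)

cc_library(
    name = "java_wrapper",
    srcs = glob(["*.cc", "*.cxx", "*.h"]),
    deps = [":error_codes_cc_proto"],
)
```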

A few other bits of feedback:

  • I think it would be better to keep the directory names consistent and use java_wrapper rather than javaWrapper
  • Perhaps a better place for the Java wrapper would be //tensorflow/java/wrapper rather than //tensorflow/core/java_wrapper?
  • Internally, we have some build rules that take .swig files and generate the sources. This is preferable because it avoids checking in generated files. I can take a look to see how difficult it would be to add some SWIG build rules to Bazel to make this sort of thing easier.

@saudet

saudet commented Nov 24, 2015

@davidzchen No reason in particular. I'm new to Bazel, and using linkshared=1, as I'd seen mentioned on the mailing list, just worked. Thanks for the tip! I'll update that.

@davidzchen
Contributor

@saudet Thanks! I was just checking to make sure that it wasn't an issue with Bazel. :) Feel free to let me know or open a bug if you run into any issues.

@kylevedder

@saudet Thanks for the info on using Bazel. I too am new to it and did not realize it was capable of generating a .so in that manner.

@davidzchen Thanks for the addendum about using a cc_library; I modified the example from @saudet accordingly when I implemented my Bazel wrapper build. Also, thank you for the input regarding the directory structure; I have updated my folder structure to align with your suggestions.

Additionally, I was not very clear in my previous comment about generating .so files; while my objective is to generate a .so file from the original source, I also want to include the .cxx file that SWIG generates inside the .so in order to facilitate the JNI calls. Currently, I'm running into an issue in which I cannot get the SWIG-generated .cxx file to compile; it's trying to reference jni.h, a header located in $JAVA_HOME/include/, but I cannot seem to get Bazel to understand the external include path.

@saudet

saudet commented Nov 24, 2015

@davidzchen Hum, nope, cc_library doesn't work. I don't see any other way to make Bazel pass the -shared option to the compiler: http://bazel.io/docs/be/c-cpp.html.

@davidzchen
Contributor

@saudet I don't think you need to pass -shared yourself. cc_library should be building a .so by default. Does that work for you?

@davidzchen
Contributor

@kylevedder You won't be able to add the JNI headers that way since they're outside the workspace. However, Bazel includes the local JDK as a local repository and provides a number of built-in targets (see jdk.WORKSPACE and the corresponding jdk.BUILD) that you can use to depend on the local JDK. These are included in each Bazel workspace by default.

Bazel itself uses JNI and interfaces with the local JDK this way (see src/main/native/BUILD). In this BUILD file, there are two genrules to copy the JNI headers, a cc_library target for the JNI-using library being built that depends on those headers, and an includes = ["."] entry so that the C++ code can include the JNI header with #include <jni.h>. This is currently not documented because we are working on a number of improvements to the external repository mechanism, and the @local-jdk name might change, but we can use it for TensorFlow and any other Bazel project that uses JNI in the meantime.

Here is a patch for your BUILD file that adds the genrule targets for copying the JNI headers you need and some changes to the cc_library target to set up the right dependencies, namely:

  1. Add jni.h and jni_md.h, which are copied to the current package by the genrules to srcs
  2. Add a dependency on //tensorflow/core so that you can include the headers under tensorflow/core/public. Note that headers or any source files in a separate directory are in a separate package from Bazel's point of view, and you will need to add a dependency on the build target that contains those files.
diff --git a/tensorflow/core/java/wrapper/BUILD b/tensorflow/core/java/wrapper/BUILD
index 72b4076..04a3394 100644
--- a/tensorflow/core/java/wrapper/BUILD
+++ b/tensorflow/core/java/wrapper/BUILD
@@ -7,10 +7,30 @@ exports_files(["LICENSE"])
 load("/tensorflow/tensorflow", "tf_copts")
 load("/tensorflow/tensorflow", "tf_gen_op_wrappers_cc")

+genrule(
+    name = "copy_link_jni_md_header",
+    srcs = ["//external:jni_md_header-linux"],
+    outs = ["jni_md.h"],
+    cmd = "cp -f $< $@",
+)
+
+genrule(
+    name = "copy_link_jni_header",
+    srcs = ["//external:jni_header"],
+    outs = ["jni.h"],
+    cmd = "cp -f $< $@",
+)
+
 cc_library(
     name = "java_wrapper",
-    srcs = glob(["*.cc","*.cxx","*.h"]),
-    copts = ["-I$$JAVA_HOME/include/", "-I$$JAVA_HOME/include/linux/"],
+    srcs = glob(["*.cc", "*.cxx", "*.h"]) + [
+        ":jni.h",
+        ":jni_md.h",
+    ],
+    includes = ["."],
+    deps = [
+        "//tensorflow/core",
+    ],
     visibility = ["//visibility:public"],
 )

Note that in general, compile actions in Bazel are run from the root of the source tree, and you would need to change the includes in your SWIG file as follows and then re-generate the C++ files so that they will have the correct includes as well:

diff --git a/tensorflow/core/java/wrapper/tensor_c_api.i b/tensorflow/core/java/wrapper/tensor_c_api.i
index d08b571..9ab1fa1 100644
--- a/tensorflow/core/java/wrapper/tensor_c_api.i
+++ b/tensorflow/core/java/wrapper/tensor_c_api.i
@@ -1,8 +1,8 @@
 %module tensor_c_api_module
 %{
-#include "../../public/tensor_c_api.h"
+#include "tensorflow/core/public/tensor_c_api.h"
 %}
-%include "../../public/tensor_c_api.h"
+%include "tensorflow/core/public/tensor_c_api.h"
 %include "stddef.h"

Once this works, you would have the JNI build set up for Linux since the copy_link_jni_md_header genrule only copies the Linux-specific header. To have it copy the correct platform-specific JNI header, we would need to do the following:

  1. Set up cpu config_settings for other platforms. Currently, tensorflow has a config_setting for --cpu=darwin in tensorflow/python/BUILD. We should probably move that a more appropriate package such as //tensorflow/core. Basically, we would want the same set of config_settings as Bazel (see src/BUILD).
  2. Have copy_link_jni_md_header copy the right JNI header based on which config setting is set using select(), similar to the one in Bazel. Our genrule would look something like the following:
genrule(
    name = "copy_link_jni_md_header",
    srcs = select({
        "//tensorflow/core:darwin": ["//external:jni_md_header-darwin"],
        "//tensorflow/core:darwin_x86_64": ["//external:jni_md_header-darwin"],
        "//tensorflow/core:freebsd": ["//external:jni_md_header-freebsd"],
        "//conditions:default": ["//external:jni_md_header-linux"],
    }),
    outs = ["jni_md.h"],
    cmd = "cp -f $< $@",
)

I'd be happy to help you with this if you run into any issues. Let me know if this works for you.

@saudet

saudet commented Nov 25, 2015

@davidzchen cc_library generates a bunch of .a files, but no .so file. I'm using 0.1.0 as was previously recommended for TensorFlow... Maybe it's fixed in 0.1.1? I'll have to try again.

@kylevedder

@davidzchen Thank you very much for your help. I have followed your instructions and updated both the Java wrapper BUILD file as well as the SWIG .i file as you suggested. Additionally, I moved the wrap script from core/java/wrapper to the root directory and updated the links accordingly.

For now, I have skipped generalizing the genrule for the jni_md.h file, focusing instead on trying to get libtensorflow.so built. Unfortunately, it appears that libtensorflow.so is not being generated; I ended up searching my entire file system for anything named some variant of "libtensorflow" and nothing relevant appeared. It may be named differently, or this may be a simple case of user error. Additionally, it may be related to the issue that @saudet is experiencing with the cc_library rule for .so generation.

Once again, thank you for all of your help, I really appreciate it.

@davidzchen
Contributor

Sorry, it turns out I was wrong. In order to build a .so that includes the transitive dependencies, what @saudet did using cc_binary with linkshared = 1 and name = "libtensorflow.so" was correct. From the cc_binary.linkshared documentation:

Create a shared library. To enable this attribute, include linkshared=1 in your rule. By default this option is off. If you enable it, you must name your binary libfoo.so (or whatever is the naming convention of libraries on the target platform) for some sensible value of foo.

The main difference between the .so's built by cc_library targets and the .so built with cc_binary using the method described above is that the cc_library artifacts only contain the code in srcs. This is why building a cc_library target with no srcs and only deps, such as //tensorflow/core, does not produce any artifacts. On the other hand, cc_binary targets will link in all the transitive dependencies.

I apologize for the confusion. Perhaps we should improve our documentation and add an example on building .so files.
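The contrast can be made concrete in a small BUILD sketch (target names are illustrative):

```python
# cc_library: its artifacts contain only this target's own srcs (here, none),
# so no .so carrying the transitive code is produced.
cc_library(
    name = "tensorflow_thin",
    deps = ["//tensorflow/core"],
)

# cc_binary with linkshared = 1: links the transitive closure of deps into a
# single shared library, named per the lib<name>.so convention.
cc_binary(
    name = "libtensorflow.so",
    linkshared = 1,
    deps = ["//tensorflow/core"],
)
```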

@ivanseidel

I guess you should follow these steps to build TensorFlow and all its dependencies. We are working on porting TensorFlow to Node.js, and I've implemented a shell script to compile and gather only the essential sources from the whole repo:
https://github.com/node-tensorflow/node-tensorflow/blob/1.0.0/tools/install.sh#L233-L282

copybara-service bot pushed a commit that referenced this issue May 13, 2021
copybara-service bot pushed a commit that referenced this issue Aug 10, 2021
It makes analyzer output more useful.

Example:

Your TFLite model has ‘1’ subgraph(s). In the subgraph description below,
T# represents the Tensor numbers. For example, in Subgraph#0, the RESHAPE op takes
tensor #0 and tensor #1 as input and produces tensor #5 as output.

Subgraph#0 main(T#0) -> [T#9]
  Op#0 RESHAPE(T#0, T#1) -> [T#5]
  Op#1 STRIDED_SLICE(T#5, T#2, T#2, T#3) -> [T#6]
  Op#2 RESIZE_BILINEAR(T#6, T#4) -> [T#7]
  Op#3 RESIZE_BILINEAR(T#6, T#4) -> [T#8]
  Op#4 ADD(T#7, T#8) -> [T#9]

Tensors of Subgraph#0
  T#0(image) shape:[5, 5], type:FLOAT32
  T#1(strided_slice) shape:[4], type:INT32
  T#2(strided_slice1) shape:[4], type:INT32
  T#3(strided_slice2) shape:[4], type:INT32
  T#4(ResizeBilinear/size) shape:[2], type:INT32
  T#5(strided_slice3) shape:[1, 5, 1, 5], type:FLOAT32
  T#6(strided_slice4) shape:[1, 5, 5, 1], type:FLOAT32
  T#7(ResizeBilinear) shape:[1, 2, 2, 1], type:FLOAT32
  T#8(ResizeBilinear_1) shape:[1, 2, 2, 1], type:FLOAT32
  T#9(Identity) shape:[1, 2, 2, 1], type:FLOAT32

PiperOrigin-RevId: 389795468
Change-Id: I0fda5bb74568c68459359a8a39f1627b459b7a4b
copybara-service bot pushed a commit that referenced this issue Dec 23, 2021
On some CI nodes (typically those with higher CPU core counts, 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermittent basis.

When it does fail, the failure manifests as a segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points the finger at a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN-based Eigen contraction kernels (for ROCm) seems to make the crash go away.

related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684

A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That commit cannot be reverted, because this commit disables MKLDNN-based Eigen contraction kernels *only* for the ROCm build.

```
Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault.
[Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))]
0x00007fff54530000 in ?? ()
(gdb) where
#0  0x00007fff54530000 in ?? ()
#1  0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#2  0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#3  0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#4  0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#5  0x00007fffd52fe983 in dnnl_sgemm ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#6  0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) ()
#7  0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) ()
#8  0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) ()
#9  0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) ()
#10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```
anandj91 pushed a commit to anandj91/tensorflow that referenced this issue Jul 27, 2022
* avoid autotune for cus type

* support cus matmul with cutlass using float32 as computation type

* add cus support for more ops

* add stream and scratchmemory to cutlass conv function interface

* avoid recompilation when changing env vars

* temporarily disabled GpuConvAlgorithmPicker

* add hlo compare for cus

* emulating fp16, add forceinline
mihaimaruseac pushed a commit that referenced this issue Aug 13, 2022
467408627  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update sqlite version in TF

--
467380418  by A. Unique TensorFlower<gardener@tensorflow.org>:

    compat: Update forward compatibility horizon to 2022-08-13

--
467378663  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update GraphDef version to 1222.

--
467363891  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/b750bc2999cf02abca6ad9eeff0a04ec7bf3b683.

--
467363622  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Move constraints documentation from jitrt to xla/runtime/constraints

--
467362586  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Extract JitCompilationContext library from jitrt and move it to xla/runtime

--
467361314  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/0a042cbb5275e6ff9a3a7c2748c74df6dcede09e.

--
467360160  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Extract calling_convention library from jitrt and move it to xla/runtime

--
467341954  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Op documentation update.
    	update of g3doc/_includes/tf_passes.md

--
467341426  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Refactor SELECT_V2 in preparation for porting to TFLM.

--
467340678  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Create some global stat tracking for CompilationEnvironments. This tracking can be used to help debug cases in which multiple CompilationEnvironments are used to compile a single HloModule (which should not happen).

--
467339870  by A. Unique TensorFlower<gardener@tensorflow.org>:
    Automated rollback of changelist 467224197.

467339756  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/b20ec05d272477fa6223213687bb22145df92674.

--
467339529  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [XLA] Bugfix for gather index parallel partitioning where the sharded non-parallel dims in indices are not handled.

--
467337900  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [XLA] Minor renamings, refactorings, checks.

--
467337622  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Remove unneeded dependency.

--
467337170  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Integrate LLVM at llvm/llvm-project@2c3ca3b684bb

    Updates LLVM usage to match
    [2c3ca3b684bb](llvm/llvm-project@2c3ca3b684bb)

--
467335264  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [SavedModel Fingerprinting] Add hash #5, which represents the checkpoint.

    The `checkpoint_hash` is a hash of the serialized .index file, which is the metadata file of the TensorBundle containing a string-string table
    of the name of a tensor to its serialized BundleEntryProto. The BundleEntryProto contains a crc32 hash of the tensor contents, but not the contents of the tensor itself.

    RFC: tensorflow/community#415

--
467334010  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/76b3fea4cc9d5e7cb8a85798e41a61a55c301578.

--
467332094  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Extract executable library from jitrt and move it to xla/runtime

--
467324078  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'

--
467322782  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #57137: [oneDNN] Skip appending kernel registration to log message for MKL ops

    Imported from GitHub PR #57137

    This PR skips printing kernel registrations for MKL ops since it leads to performance drop for some eager models caused by this commit c04f65d This is a temporary fix and the condition will be removed when support for block format is removed as a more permanent fix.
    Copybara import of the project:

    --
    89c4c20 by Kanvi Khanna <kanvi.khanna@intel.com>:

    [oneDNN] Skip appending kernel registration to log message for MKL ops

    Merging this change closes #57137

--
467322425  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <memory> for 'std::unique_ptr'

--
467321561  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    'int64' is deprecated: Use int64_t instead.

--
467321058  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #57089: [TF-TRT] Adjusting Conv2D Test Tolerance

    Imported from GitHub PR #57089

    This PR adjusts the unit-test tolerance for `Conv2DStridedNCHWTest` in INT8 mode.

    Copybara import of the project:

    --
    13e4bff by DEKHTIARJonathan <contact@jonathandekhtiar.eu>:

    [TF-TRT] Adjusting Conv2D Test Tolerance

    Merging this change closes #57089

--
467320826  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #55804: [TF-TRT] Various Cleanups & Python Debugging Assertion Improvements

    Imported from GitHub PR #55804

    This PR cleans up a few spots in the code base, improves the debuggability of assertion messages in unittests, and replaces `distutils.version.LooseVersion` (deprecated) with `packaging.version.Version` (the new recommended API).
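
    The version-API migration mentioned above can be sketched as follows (the version strings are illustrative, not values from the PR):

    ```python
    from packaging.version import Version

    # distutils.version.LooseVersion is deprecated; packaging.version.Version
    # is the recommended replacement for comparing version strings.
    trt_version = Version("8.2.1")
    assert trt_version >= Version("8.0")
    # Comparison is by release segments, not lexicographic:
    assert Version("7.2") < Version("7.10")
    ```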
    Copybara import of the project:

    --
    a4d15ef by DEKHTIARJonathan <contact@jonathandekhtiar.eu>:

    [TF-TRT] Various Cleanups & Python Debugging Assertion Improvements

    Merging this change closes #55804

--
467320083  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <ostream> for 'std::ostream'

--
467319094  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    'int64' is deprecated: Use int64_t instead.

--
467318151  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <iterator> for 'std::back_inserter'

--
467316931  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    using decl 'IsSubsetOf' is unused

--
467316097  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Move passes under tensorflow/compiler/mlir/tensorflow/.

--
467315812  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <memory> for 'std::unique_ptr'

--
467314236  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <memory> for 'std::unique_ptr'

--
467313254  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'
    missing #include <memory> for 'std::make_unique'

--
467312293  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    using decl 'RangeSquareDataset' is unused

--
467311309  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'

--
467310637  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #57013: [TF-TRT] Add LogSoftmax Support for TF-TRT

    Imported from GitHub PR #57013

    This PR adds TF-TRT support for the `tf.nn.log_softmax` operation. It is implemented using the formula `logsoftmax = logits - log(reduce_sum(exp(logits), axis=-1))`. The implemented TRT layers are fused into a single op.
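
    The decomposition above can be checked numerically with a small sketch (NumPy here, outside TF-TRT; the max-shift is the standard numerical-stability trick and not necessarily part of the PR's exact layer sequence):

    ```python
    import numpy as np

    def log_softmax(logits, axis=-1):
        # logsoftmax = logits - log(reduce_sum(exp(logits), axis)); shifting by
        # the max first keeps exp() from overflowing without changing the result.
        m = np.max(logits, axis=axis, keepdims=True)
        shifted = logits - m
        return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

    x = np.array([[1.0, 2.0, 3.0]])
    reference = x - np.log(np.sum(np.exp(x), axis=-1, keepdims=True))
    assert np.allclose(log_softmax(x), reference)
    ```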

    @DEKHTIARJonathan @tfeher : Please review the changes.
    Copybara import of the project:

    --
    1a8eb9a by Pavani Majety <pmajety@nvidia.com>:

    Add LogSoftmax conversion

    Fix Softmax comments

    [TF-TRT] Move LogSoftmax to use OpConverterBase

    Fix compiler errors

    clang-format

    Undo changes to convert_nodes.cc

    Fix comments

    Merging this change closes #57013

--
467310335  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <array> for 'std::array'

--
467310313  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update test config in cross device ops

--
467309032  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/eba528ef667653c3554984e5c05573b152c9893b.

--
467308765  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'

--
467307702  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'

--
467306473  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    'int64' is deprecated: Use int64_t instead.

--
467306092  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #56771: Add return_index_map argument in ssim()

    Imported from GitHub PR #56771

    Closes #53115
    Copybara import of the project:

    --
    8f5a1b1 by CohenAriel <ariel17112005@gmail.com>:

    Add return_index_map argument in ssim()

    Merging this change closes #56771

--
467305190  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <unordered_map> for 'std::unordered_map'
    missing #include <vector> for 'std::vector'
    missing #include <memory> for 'std::shared_ptr'

--
467304747  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [tfrt:jitrt] NFC: Remove Executable::KernelContext

    It was added before runtime::KernelContext and is not used anywhere. Remove it to avoid confusion. In the future we should reuse runtime::KernelContext as an extension point for user-defined memory allocation etc.

--
467303335  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    'int64' is deprecated: Use int64_t instead.

--
467301808  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Changes all local `State` or `TaskState` enum in coordination service into `CoordinatedTaskState` enum in proto.

--
467300580  by A. Unique TensorFlower<gardener@tensorflow.org>:

    lite: enable variable freezing in tf_tfl_translate tester

--
467298890  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/9bb23f7d1ee0e9a55d26c7168790667e5266a74c.

--
467292686  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Move execution_engine library from jitrt to xla/runtime

--
467280901  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [GML] Add tests for concat in the GML tiling and fusion pipeline

--
467276349  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [GML] Implement dim-based shape reification for concat

--
467273958  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Change mutexes under stream_executor/gpu to use absl::Mutex and absl::MutexLock instead of tensorflow::mutex and tensorflow::mutex_lock. Change instances of absl::make_unique to std::make_unique.

--
467272897  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [tf.data] Prepend `/bufferedio/` for all paths passed to LoadDataset op.

--

PiperOrigin-RevId: 467408627
Rjasuja added a commit to anikulk/tensorflow that referenced this issue Sep 22, 2023
Signed-off-by: Ritul Jasuja <ritul.jasuja@intel.com>
Rjasuja added a commit to anikulk/tensorflow that referenced this issue Sep 25, 2023
Signed-off-by: Ritul Jasuja <ritul.jasuja@intel.com>
fsx950223 pushed a commit to fsx950223/tensorflow that referenced this issue Nov 28, 2023
On some CI nodes (typically those with higher CPU core counts, 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermittent basis.

When it does fail, the failure manifests as a segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points to a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN-based Eigen contraction kernels (for ROCm) makes the crash go away.

related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684

A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That commit cannot be reverted, because this commit disables the MKLDNN-based Eigen contraction kernels *only* for the ROCm build.

```
Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault.
[Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))]
0x00007fff54530000 in ?? ()
(gdb) where
#0  0x00007fff54530000 in ?? ()
#1  0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#2  0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#3  0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#4  0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#5  0x00007fffd52fe983 in dnnl_sgemm ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#6  0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) ()
#7  0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) ()
#8  0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) ()
#9  0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) ()
#10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```
ayounes-synaptics pushed a commit to syna-synap/tensorflow that referenced this issue Feb 29, 2024
- add --allow_fp16 option for xnnpack to enable relaxing fp32 to fp16 on vs640;
the Armv8.2 CPU has the fp16 extension, which gives a big improvement when
inferencing in fp16
- enable xnnpack quant optimization by default; xnnpack is mainly used for fp
models and inference performance is sometimes not good, so it can be disabled
with the benchmark option --use_xnnpack=false
- enable xnnpack transient indirection buffers by default to reduce the
memory footprint of indirection buffers
- enable xnnpack dynamic fully connected
copybara-service bot pushed a commit that referenced this issue May 2, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=#62750 from mattbahr:implement-sampled-addmm-v2 c295a0e
PiperOrigin-RevId: 630081768
copybara-service bot pushed a commit that referenced this issue May 2, 2024