
Java interface #5

Closed
ravwojdyla opened this issue Nov 9, 2015 · 112 comments
@ravwojdyla

Issue to trace effort of swig interface for java. Started implementation - will update with progress. If anyone has any comments/tips - please feel welcome to join the discussion!

@girving
Contributor

girving commented Nov 9, 2015

Nice!

@girving
Contributor

girving commented Nov 9, 2015

Moving this comment here from #3:


There's a test suite with fairly good coverage, but currently it's mostly Python with a few C++ tests. There's also a lot of graph-building functionality that's currently Python only, in particular the automatic differentiation support, though that doesn't matter for evaluating graphs from Java. There are plans to move this functionality into the underlying C++ in the future, at which point Java SWIG bindings would be more useful for creating graphs.

If someone takes up the Java SWIG challenge, we'd be happy to accept it upstream pending review, etc., at which point it would be part of our continuous testing. The details of accepting contributions are in flux at the moment, but that will stabilize.

@pslam

pslam commented Nov 18, 2015

Hello guys,
We are also interested in adapting TensorFlow for Java. @ravwojdyla Have you, by any chance, started working on the SWIG interface for Java? If you have, we could join our efforts and collaborate on that.

@kylevedder

Hello,
I am working on a SWIG wrap of the main C++ API. You can see my progress thus far on my fork, but what's up there is not finished; I'm currently hitting a problem in which #include "tensorflow/core/lib/core/error_codes.pb.h" cannot be resolved, and I cannot find the intended file anywhere in the project tree. Input of any kind would be greatly appreciated.

@bhack
Contributor

bhack commented Nov 22, 2015

There are JavaCPP presets available for libraries like Caffe and OpenCV. See also bytedeco/javacpp-presets#111. JavaCPP also enables iOS via RoboVM.

@bhack
Contributor

bhack commented Nov 22, 2015

@girving Initial commit at bytedeco/javacpp-presets@374e1d5

@bhack
Contributor

bhack commented Nov 22, 2015

/cc @saudet

@ravwojdyla
Author

@pslam - I was able to work just a little bit on this - could definitely use some help!

@saudet

saudet commented Nov 23, 2015

Hi guys, I believe I have pretty functional bindings for JavaCPP: https://github.com/bytedeco/javacpp-presets/tree/master/tensorflow. Let me know if you see anything that could be done with SWIG, but not JavaCPP. I could definitely use the feedback. (Thanks for the cc @bhack!)

@kylevedder

Very nicely done @saudet! I have almost finished a SWIG wrap, but it seems that your implementation works just as well. I do not see anything that my SWIG wrap can do that yours cannot do. JavaCPP seems very cool, I'll have to look into using it for future projects.

@tngan

tngan commented Nov 23, 2015

Hi @kylevedder, have you resolved the issue related to error_codes.pb.h?
[Edited]
All .pb.h files are generated from the corresponding .proto files.

@kylevedder

@tngan Yes, that is what I discovered as well. Additionally, the .proto files in this project require Protobuf 3. I'm using Ubuntu 14.04, where Protobuf 3 was not available in my package manager, so I compiled it from source from the 3.0.0 beta release.

The current roadblock I am trying to solve is how to get protoc to recurse over the entire file tree and compile the .proto files into .h and .cc files; compiling each folder piecemeal fails due to unsatisfied dependencies on other, not-yet-compiled .proto files.
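A single protoc invocation rooted at the top of the tree sidesteps the per-directory dependency failures. A minimal sketch, using dummy .proto files (the paths below are illustrative, not TensorFlow's actual tree, and protoc is assumed to be installed for the commented command):

```shell
# Build a tiny stand-in tree of .proto files.
mkdir -p tree/tensorflow/core/framework tree/tensorflow/core/lib/core
printf 'syntax = "proto3";\npackage tensorflow;\n' \
    > tree/tensorflow/core/framework/types.proto
printf 'syntax = "proto3";\npackage tensorflow.error;\n' \
    > tree/tensorflow/core/lib/core/error_codes.proto

# Collect every .proto in one pass...
find tree -name '*.proto' | sort

# ...and hand them all to protoc in a single invocation with a common -I
# root, so cross-file imports resolve (assumes protoc 3 is on the PATH):
# protoc -Itree --cpp_out=gen $(find tree -name '*.proto')
```

Because every file shares one invocation and one import root, protoc can resolve imports between them instead of failing on not-yet-compiled neighbors.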

@davidzchen
Contributor

@kylevedder Are your SWIG wrappers in a separate repository, or are you working in the tensorflow repository? protoc works similarly to other compilers. If you are working in the tensorflow repository or are using Bazel, then you would need to set up the protobuf build targets and the dependencies among them.

If you are working in a separate repository and using a different build system, then you would need to use the protobuf plugin for that build system.

I'd be happy to help you set up the build if you would like.

@kylevedder

@davidzchen Thank you for the offer, any and all help is greatly appreciated.

What I have thus far:

I have already set up Bazel and gotten it to compile a .whl file, which I then handed over to pip and confirmed that I can run the first TensorFlow program.

I have generated SWIG wrapper files in my forked repository. They are in a folder under core/javaWrapper. [link]

What I am trying to do:

Ultimately, my goal is to generate a .so file that can then be loaded as a native library from Java. Currently, I'm attempting to use g++ to compile the entire system into a .so file; however, the .proto files need to first be expanded into .h and .cc files prior to this compilation, and that is what I am trying to do with protoc.
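As a rough sketch of that pipeline (the SWIG and g++ command lines below are illustrative assumptions, so they are commented out; only the library-naming step is live):

```shell
# Hypothetical two-step build: SWIG emits the JNI glue, g++ folds it into a .so.
# swig -c++ -java -outdir java/ tensor_c_api.i   # emits tensor_c_api_wrap.cxx
# g++ -fPIC -shared -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" \
#     tensor_c_api_wrap.cxx -o libtensorflow_jni.so

# Java's System.loadLibrary() expects the lib<name>.so convention on Linux,
# so the artifact above would be loaded by its bare name:
lib="libtensorflow_jni.so"
name="${lib#lib}"; name="${name%.so}"
echo "System.loadLibrary(\"$name\")"   # prints System.loadLibrary("tensorflow_jni")
```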

You can see my attempt at a wrap script here to get a better idea of what I am aiming at, although thus far all of my attempts at using protoc have been directory by directory and, consequently, are not in the script.

Finally, any feedback on areas of improvement would be greatly appreciated. Thanks!

@saudet

saudet commented Nov 24, 2015

@kylevedder I already have a .so built as part of the JavaCPP Presets: https://github.com/bytedeco/javacpp-presets/tree/master/tensorflow. Thanks to Bazel, it's really simple. Just apply a patch like this:

diff -ruN tensorflow/tensorflow/cc/BUILD tensorflow-patch/tensorflow/cc/BUILD
--- tensorflow/tensorflow/cc/BUILD  2015-11-22 00:00:02.441829192 +0900
+++ tensorflow-patch/tensorflow/cc/BUILD    2015-11-14 11:15:12.689330351 +0900
@@ -75,6 +75,17 @@
     ],
 )

+cc_binary(
+    name = "libtensorflow.so",
+    copts = tf_copts(),
+    linkshared = 1,
+    deps = [
+        ":cc_ops",
+        "//tensorflow/core:kernels",
+        "//tensorflow/core:tensorflow",
+    ],
+)
+
 filegroup(
     name = "all_files",
     srcs = glob(

And run Bazel like this, for example:

bazel build -c opt //tensorflow/cc:libtensorflow.so

AFAIK, this should gobble up pretty much anything of interest for the C++ API.

@davidzchen
Contributor

@saudet Is there a reason why you are using a cc_binary rule to build the shared library rather than cc_library? You can just have a cc_library rule with the name tensorflow and the build target will build a shared library called libtensorflow.so.

@kylevedder If your goal is to generate an .so file, then something similar to what @saudet suggested would work.

If you need to use the TensorFlow protos in Java code, then you would need to add dependencies from your java_* Bazel build targets to the proto_library targets that generate the Java classes from the .proto files.

We still have a bit of work to do before we open-source the native proto_library rules (see bazelbuild/bazel#52), but in the meantime, TensorFlow uses the cc_proto_library and py_proto_library rules provided by protobuf, and for Java, you should be able to use the Java genproto rule that is included with Bazel. I will check with the team to find out what the timeline for proto_library is and whether it would be worthwhile to unify the rules provided by Protobuf with genproto.
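As a rough sketch of that wiring (target and file names here are illustrative assumptions, not actual TensorFlow targets), the wrapper's C++ sources would depend on the output of a protobuf-provided rule such as cc_proto_library:

```python
# Hypothetical BUILD fragment: generate C++ classes from a .proto and let
# the SWIG wrapper code depend on them.
cc_proto_library(
    name = "error_codes_cc_proto",
    srcs = ["lib/core/error_codes.proto"],
)

cc_library(
    name = "java_wrapper",
    srcs = glob(["*.cc", "*.cxx", "*.h"]),
    deps = [":error_codes_cc_proto"],
)
```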

A few other bits of feedback:

  • I think it would be better to keep the directory names consistent and use java_wrapper rather than javaWrapper
  • Perhaps a better place for the Java wrapper would be //tensorflow/java/wrapper rather than //tensorflow/core/java_wrapper?
  • Internally, we have some build rules that take .swig files and generate the sources. This is preferable because it avoids checking in generated files. I can take a look to see how difficult it would be to add some SWIG build rules to Bazel to make this sort of thing easier.

@saudet

saudet commented Nov 24, 2015

@davidzchen No reason in particular. I'm new to Bazel, and using linkshared=1, as I'd seen mentioned on the mailing list, just worked. Thanks for the tip! I'll update that.

@davidzchen
Contributor

@saudet Thanks! I was just checking to make sure that it wasn't an issue with Bazel. :) Feel free to let me know or open a bug if you run into any issues.

@kylevedder

@saudet Thanks for the info on using Bazel. I too am new to it and did not realize it was capable of generating a .so in that manner.

@davidzchen Thanks for the addendum about using a cc_library; I modified the example from @saudet accordingly when I implemented my Bazel wrapper build. Also, thank you for the input regarding the directory structure; I have updated my folder structure to align with your suggestions.

Additionally, I was not very clear in my previous comment about generating .so files; while my objective is to generate a .so file from the original source, I also want to include the .cxx file that SWIG generates inside the .so in order to facilitate the JNI calls. Currently, I'm running into an issue in which I cannot get the SWIG-generated .cxx file to compile; it's trying to reference jni.h, a header located in $JAVA_HOME/include/, but I cannot seem to get Bazel to understand the external include path.

@saudet

saudet commented Nov 24, 2015

@davidzchen Hum, nope, cc_library doesn't work. I don't see any other way to make Bazel pass the -shared option to the compiler: http://bazel.io/docs/be/c-cpp.html.

@davidzchen
Contributor

@saudet I don't think you need to pass -shared yourself. cc_library should be building a .so by default. Does that work for you?

@davidzchen
Contributor

@kylevedder You won't be able to add the JNI headers that way since they're outside the workspace. However, Bazel includes the local JDK as a local repository and provides a number of built-in targets (see jdk.WORKSPACE and the corresponding jdk.BUILD) that you can use to depend on the local JDK. These are included in each Bazel workspace by default.

Bazel itself uses JNI and interfaces with the local JDK this way (see src/main/native/BUILD). In this BUILD file, there are two genrules to copy the JNI headers, a cc_library target for the JNI-using library being built that depends on those headers, and an includes = ["."] entry so that the C++ code can include the JNI header with #include <jni.h>. This is currently not documented because we are working on a number of improvements to the external repository mechanism, and the @local-jdk name might change, but we can use it for TensorFlow and any other Bazel project that uses JNI in the meantime.

Here is a patch for your BUILD file that adds the genrule targets for copying the JNI headers you need and some changes to the cc_library target to set up the right dependencies, namely:

  1. Add jni.h and jni_md.h, which are copied to the current package by the genrules to srcs
  2. Add a dependency on //tensorflow/core so that you can include the headers under tensorflow/core/public. Note that headers or any source files in a separate directory are in a separate package from Bazel's point of view, and you will need to add a dependency on the build target that contains those files.
diff --git a/tensorflow/core/java/wrapper/BUILD b/tensorflow/core/java/wrapper/BUILD
index 72b4076..04a3394 100644
--- a/tensorflow/core/java/wrapper/BUILD
+++ b/tensorflow/core/java/wrapper/BUILD
@@ -7,10 +7,30 @@ exports_files(["LICENSE"])
 load("/tensorflow/tensorflow", "tf_copts")
 load("/tensorflow/tensorflow", "tf_gen_op_wrappers_cc")

+genrule(
+    name = "copy_link_jni_md_header",
+    srcs = ["//external:jni_md_header-linux"],
+    outs = ["jni_md.h"],
+    cmd = "cp -f $< $@",
+)
+
+genrule(
+    name = "copy_link_jni_header",
+    srcs = ["//external:jni_header"],
+    outs = ["jni.h"],
+    cmd = "cp -f $< $@",
+)
+
 cc_library(
     name = "java_wrapper",
-    srcs = glob(["*.cc","*.cxx","*.h"]),
-    copts = ["-I$$JAVA_HOME/include/", "-I$$JAVA_HOME/include/linux/"],
+    srcs = glob(["*.cc", "*.cxx", "*.h"]) + [
+        ":jni.h",
+        ":jni_md.h",
+    ],
+    includes = ["."],
+    deps = [
+        "//tensorflow/core",
+    ],
     visibility = ["//visibility:public"],
 )

Note that in general, compile actions in Bazel are run from the root of the source tree, and you would need to change the includes in your SWIG file as follows and then re-generate the C++ files so that they will have the correct includes as well:

diff --git a/tensorflow/core/java/wrapper/tensor_c_api.i b/tensorflow/core/java/wrapper/tensor_c_api.i
index d08b571..9ab1fa1 100644
--- a/tensorflow/core/java/wrapper/tensor_c_api.i
+++ b/tensorflow/core/java/wrapper/tensor_c_api.i
@@ -1,8 +1,8 @@
 %module tensor_c_api_module
 %{
-#include "../../public/tensor_c_api.h"
+#include "tensorflow/core/public/tensor_c_api.h"
 %}
-%include "../../public/tensor_c_api.h"
+%include "tensorflow/core/public/tensor_c_api.h"
 %include "stddef.h"

Once this works, you would have the JNI build set up for Linux since the copy_link_jni_md_header genrule only copies the Linux-specific header. To have it copy the correct platform-specific JNI header, we would need to do the following:

  1. Set up cpu config_settings for other platforms. Currently, tensorflow has a config_setting for --cpu=darwin in tensorflow/python/BUILD. We should probably move that a more appropriate package such as //tensorflow/core. Basically, we would want the same set of config_settings as Bazel (see src/BUILD).
  2. Have copy_link_jni_md_header copy the right JNI header based on which config setting is set using select(), similar to the one in Bazel. Our genrule would look something like the following:
genrule(
    name = "copy_link_jni_md_header",
    srcs = select({
        "//tensorflow/core:darwin": ["//external:jni_md_header-darwin"],
        "//tensorflow/core:darwin_x86_64": ["//external:jni_md_header-darwin"],
        "//tensorflow/core:freebsd": ["//external:jni_md_header-freebsd"],
        "//conditions:default": ["//external:jni_md_header-linux"],
    }),
    outs = ["jni_md.h"],
    cmd = "cp -f $< $@",
)

I'd be happy to help you with this if you run into any issues. Let me know if this works for you.

@saudet

saudet commented Nov 25, 2015

@davidzchen cc_library generates a bunch of .a files, but no .so file. I'm using 0.1.0 as was previously recommended for TensorFlow... Maybe it's fixed in 0.1.1? I'll have to try again.

@kylevedder

@davidzchen Thank you very much for your help. I have followed your instructions and updated both the Java wrapper BUILD file as well as the SWIG .i file as you suggested. Additionally, I moved the wrap script from core/java/wrapper to the root directory and updated the links accordingly.

For now, I have skipped generalizing the genrule for the jni_md.h file, focusing instead on trying to get libtensorflow.so built. Unfortunately, it appears that libtensorflow.so is not being generated; I ended up searching my entire file system for anything named some variant of "libtensorflow" and nothing relevant appeared. It may be named differently, or this may be a simple case of user error. Additionally, it may be related to the issue that @saudet is experiencing with the cc_library rule for .so generation.

Once again, thank you for all of your help, I really appreciate it.

@davidzchen
Contributor

Sorry, it turns out I was wrong. In order to build a .so that includes the transitive dependencies, what @saudet did using cc_binary with linkshared = 1 and name = "libtensorflow.so" was correct. From the cc_binary.linkshared documentation:

Create a shared library. To enable this attribute, include linkshared=1 in your rule. By default this option is off. If you enable it, you must name your binary libfoo.so (or whatever is the naming convention of libraries on the target platform) for some sensible value of foo.

The main difference between the .so's built by cc_library targets and the .so built with cc_binary using the method described above is that the cc_library artifacts only contain the code in srcs. This is why building a cc_library target with no srcs and only deps, such as //tensorflow/core, does not produce any artifacts. On the other hand, cc_binary targets will link in all the transitive dependencies.

I apologize for the confusion. Perhaps we should improve our documentation and add an example on building .so files.
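The contrast can be made concrete in a small BUILD sketch (target names are illustrative):

```python
# cc_library: its artifacts contain only this target's own srcs (here, none),
# so no .so carrying the transitive code is produced.
cc_library(
    name = "tensorflow_thin",
    deps = ["//tensorflow/core"],
)

# cc_binary with linkshared = 1: links the transitive closure of deps into a
# single shared library, named per the lib<name>.so convention.
cc_binary(
    name = "libtensorflow.so",
    linkshared = 1,
    deps = ["//tensorflow/core"],
)
```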

@ivanseidel

I guess you should follow these steps to build TensorFlow and all its dependencies. We are working on porting TensorFlow to Node.js, and I've implemented a shell script to compile and gather only the essential sources from the whole repo:
https://github.com/node-tensorflow/node-tensorflow/blob/1.0.0/tools/install.sh#L233-L282

copybara-service bot pushed a commit that referenced this issue May 13, 2021
copybara-service bot pushed a commit that referenced this issue Aug 10, 2021
It makes analyzer output more useful.

Example:

Your TFLite model has ‘1’ subgraph(s). In the subgraph description below,
T# represents the Tensor numbers. For example, in Subgraph#0, the RESHAPE op takes
tensor #0 and tensor #1 as input and produces tensor #5 as output.

Subgraph#0 main(T#0) -> [T#9]
  Op#0 RESHAPE(T#0, T#1) -> [T#5]
  Op#1 STRIDED_SLICE(T#5, T#2, T#2, T#3) -> [T#6]
  Op#2 RESIZE_BILINEAR(T#6, T#4) -> [T#7]
  Op#3 RESIZE_BILINEAR(T#6, T#4) -> [T#8]
  Op#4 ADD(T#7, T#8) -> [T#9]

Tensors of Subgraph#0
  T#0(image) shape:[5, 5], type:FLOAT32
  T#1(strided_slice) shape:[4], type:INT32
  T#2(strided_slice1) shape:[4], type:INT32
  T#3(strided_slice2) shape:[4], type:INT32
  T#4(ResizeBilinear/size) shape:[2], type:INT32
  T#5(strided_slice3) shape:[1, 5, 1, 5], type:FLOAT32
  T#6(strided_slice4) shape:[1, 5, 5, 1], type:FLOAT32
  T#7(ResizeBilinear) shape:[1, 2, 2, 1], type:FLOAT32
  T#8(ResizeBilinear_1) shape:[1, 2, 2, 1], type:FLOAT32
  T#9(Identity) shape:[1, 2, 2, 1], type:FLOAT32

PiperOrigin-RevId: 389795468
Change-Id: I0fda5bb74568c68459359a8a39f1627b459b7a4b
copybara-service bot pushed a commit that referenced this issue Dec 23, 2021
On some CI nodes (typically those with higher CPU core counts, 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermittent basis.

When it does fail, the failure manifests as a segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points the finger at a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN-based Eigen contraction kernels (for ROCm) seems to make the crash go away.

related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684

A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That commit cannot be reverted, because this commit disables MKLDNN-based Eigen contraction kernels *only* for the ROCm build.

```
Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault.
[Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))]
0x00007fff54530000 in ?? ()
(gdb) where
#0  0x00007fff54530000 in ?? ()
#1  0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#2  0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#3  0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#4  0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#5  0x00007fffd52fe983 in dnnl_sgemm ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#6  0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) ()
#7  0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) ()
#8  0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) ()
#9  0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) ()
#10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```
anandj91 pushed a commit to anandj91/tensorflow that referenced this issue Jul 27, 2022
* avoid autotune for cus type

* support cus matmul with cutlass using float32 as computation type

* add cus support for more ops

* add stream and scratchmemory to cutlass conv function interface

* avoid recompilation when changing env vars

* temporarily disabled GpuConvAlgorithmPicker

* add hlo compare for cus

* emulating fp16, add forceinline
mihaimaruseac pushed a commit that referenced this issue Aug 13, 2022
467408627  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update sqlite version in TF

--
467380418  by A. Unique TensorFlower<gardener@tensorflow.org>:

    compat: Update forward compatibility horizon to 2022-08-13

--
467378663  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update GraphDef version to 1222.

--
467363891  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/b750bc2999cf02abca6ad9eeff0a04ec7bf3b683.

--
467363622  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Move constraints documentation from jitrt to xla/runtime/constraints

--
467362586  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Extract JitCompilationContext library from jitrt and move it to xla/runtime

--
467361314  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/0a042cbb5275e6ff9a3a7c2748c74df6dcede09e.

--
467360160  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Extract calling_convention library from jitrt and move it to xla/runtime

--
467341954  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Op documentation update.
    	update of g3doc/_includes/tf_passes.md

--
467341426  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Refactor SELECT_V2 in preparation for porting to TFLM.

--
467340678  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Create some global stat tracking for CompilationEnvironments. This tracking can be used to help debug cases in which multiple CompilationEnvironments are used to compile a single HloModule (which should not happen).

--
467339870  by A. Unique TensorFlower<gardener@tensorflow.org>:
    Automated rollback of changelist 467224197.

467339756  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/b20ec05d272477fa6223213687bb22145df92674.

--
467339529  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [XLA] Bugfix for gather index parallel partitioning where the sharded non-parallel dims in indices are not handled.

--
467337900  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [XLA] Minor renamings, refactorings, checks.

--
467337622  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Remove unneeded dependency.

--
467337170  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Integrate LLVM at llvm/llvm-project@2c3ca3b684bb

    Updates LLVM usage to match
    [2c3ca3b684bb](llvm/llvm-project@2c3ca3b684bb)

--
467335264  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [SavedModel Fingerprinting] Add hash #5, which represents the checkpoint.

    The `checkpoint_hash` is a hash of the serialized .index file, which is the metadata file of the TensorBundle containing a string-string table
    of the name of a tensor to its serialized BundleEntryProto. The BundleEntryProto contains a crc32 hash of the tensor contents, but not the contents of the tensor itself.

    RFC: tensorflow/community#415

--
467334010  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/76b3fea4cc9d5e7cb8a85798e41a61a55c301578.

--
467332094  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Extract executable library from jitrt and move it to xla/runtime

--
467324078  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'

--
467322782  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #57137: [oneDNN] Skip appending kernel registration to log message for MKL ops

    Imported from GitHub PR #57137

    This PR skips printing kernel registrations for MKL ops since it leads to performance drop for some eager models caused by this commit c04f65d This is a temporary fix and the condition will be removed when support for block format is removed as a more permanent fix.
    Copybara import of the project:

    --
    89c4c20 by Kanvi Khanna <kanvi.khanna@intel.com>:

    [oneDNN] Skip appending kernel registration to log message for MKL ops

    Merging this change closes #57137

--
467322425  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <memory> for 'std::unique_ptr'

--
467321561  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    'int64' is deprecated: Use int64_t instead.

--
467321058  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #57089: [TF-TRT] Adjusting Conv2D Test Tolerance

    Imported from GitHub PR #57089

    This PR adjusts the unit-test tolerance for `Conv2DStridedNCHWTest` in INT8 mode.

    Copybara import of the project:

    --
    13e4bff by DEKHTIARJonathan <contact@jonathandekhtiar.eu>:

    [TF-TRT] Adjusting Conv2D Test Tolerance

    Merging this change closes #57089

--
467320826  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #55804: [TF-TRT] Various Cleanups & Python Debugging Assertion Improvements

    Imported from GitHub PR #55804

    This PR cleans up a few spots in the code base, improves the debuggability of assertion messages in unittests, and replaces `distutils.version.LooseVersion` (deprecated) with `packaging.version.Version` (the new recommended API).
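
    The version-API migration mentioned above can be sketched as follows (the version strings are illustrative, not values from the PR):

    ```python
    from packaging.version import Version

    # distutils.version.LooseVersion is deprecated; packaging.version.Version
    # is the recommended replacement for comparing version strings.
    trt_version = Version("8.2.1")
    assert trt_version >= Version("8.0")
    # Comparison is by release segments, not lexicographic:
    assert Version("7.2") < Version("7.10")
    ```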
    Copybara import of the project:

    --
    a4d15ef by DEKHTIARJonathan <contact@jonathandekhtiar.eu>:

    [TF-TRT] Various Cleanups & Python Debugging Assertion Improvements

    Merging this change closes #55804

--
467320083  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <ostream> for 'std::ostream'

--
467319094  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    'int64' is deprecated: Use int64_t instead.

--
467318151  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <iterator> for 'std::back_inserter'

--
467316931  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    using decl 'IsSubsetOf' is unused

--
467316097  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Move passes under tensorflow/compiler/mlir/tensorflow/.

--
467315812  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <memory> for 'std::unique_ptr'

--
467314236  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <memory> for 'std::unique_ptr'

--
467313254  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'
    missing #include <memory> for 'std::make_unique'

--
467312293  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    using decl 'RangeSquareDataset' is unused

--
467311309  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'

--
467310637  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #57013: [TF-TRT] Add LogSoftmax Support for TF-TRT

    Imported from GitHub PR #57013

    This PR adds TF-TRT support for the `tf.nn.log_softmax` operation. It is implemented using the formula `logsoftmax = logits - log(reduce_sum(exp(logits), axis=-1))`. The implemented TRT layers are fused into a single op.
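
    The decomposition above can be checked numerically with a small sketch (NumPy here, outside TF-TRT; the max-shift is the standard numerical-stability trick and not necessarily part of the PR's exact layer sequence):

    ```python
    import numpy as np

    def log_softmax(logits, axis=-1):
        # logsoftmax = logits - log(reduce_sum(exp(logits), axis)); shifting by
        # the max first keeps exp() from overflowing without changing the result.
        m = np.max(logits, axis=axis, keepdims=True)
        shifted = logits - m
        return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

    x = np.array([[1.0, 2.0, 3.0]])
    reference = x - np.log(np.sum(np.exp(x), axis=-1, keepdims=True))
    assert np.allclose(log_softmax(x), reference)
    ```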

    @DEKHTIARJonathan @tfeher : Please review the changes.
    Copybara import of the project:

    --
    1a8eb9a by Pavani Majety <pmajety@nvidia.com>:

    Add LogSoftmax conversion

    Fix Softmax comments

    [TF-TRT] Move LogSoftmax to use OpConverterBase

    Fix compiler errors

    clang-format

    Undo changes to convert_nodes.cc

    Fix comments

    Merging this change closes #57013

--
467310335  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <array> for 'std::array'

--
467310313  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update test config in cross device ops

--
467309032  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/eba528ef667653c3554984e5c05573b152c9893b.

--
467308765  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'

--
467307702  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <vector> for 'std::vector'

--
467306473  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    'int64' is deprecated: Use int64_t instead.

--
467306092  by A. Unique TensorFlower<gardener@tensorflow.org>:

    PR #56771: Add return_index_map argument in ssim()

    Imported from GitHub PR #56771

    Closes #53115
    Copybara import of the project:

    --
    8f5a1b1 by CohenAriel <ariel17112005@gmail.com>:

    Add return_index_map argument in ssim()

    Merging this change closes #56771

--
467305190  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    missing #include <unordered_map> for 'std::unordered_map'
    missing #include <vector> for 'std::vector'
    missing #include <memory> for 'std::shared_ptr'

--
467304747  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [tfrt:jitrt] NFC: Remove Executable::KernelContext

    It was added before runtime::KernelContext and is not used anywhere. Remove it to avoid confusion. In the future we should reuse runtime::KernelContext as an extension point for user-defined memory allocation etc.

--
467303335  by A. Unique TensorFlower<gardener@tensorflow.org>:

    #tf-data-service #codehealth Clean up clang-tidy report.

    'int64' is deprecated: Use int64_t instead.

--
467301808  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Changes all local `State` or `TaskState` enum in coordination service into `CoordinatedTaskState` enum in proto.

--
467300580  by A. Unique TensorFlower<gardener@tensorflow.org>:

    lite: enable variable freezing in tf_tfl_translate tester

--
467298890  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Update TFRT dependency to use revision
    http://github.com/tensorflow/runtime/commit/9bb23f7d1ee0e9a55d26c7168790667e5266a74c.

--
467292686  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [xla:runtime] NFC: Move execution_engine library from jitrt to xla/runtime

--
467280901  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [GML] Add tests for concat in the GML tiling and fusion pipeline

--
467276349  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [GML] Implement dim-based shape reification for concat

--
467273958  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Change mutexes under stream_executor/gpu to use absl::Mutex and absl::MutexLock instead of tensorflow::mutex and tensorflow::mutex_lock. Change instances of absl::make_unique to std::make_unique.

--
467272897  by A. Unique TensorFlower<gardener@tensorflow.org>:

    [tf.data] Prepend `/bufferedio/` for all paths passed to LoadDataset op.

--

PiperOrigin-RevId: 467408627
Rjasuja added a commit to anikulk/tensorflow that referenced this issue Sep 22, 2023
Signed-off-by: Ritul Jasuja <ritul.jasuja@intel.com>
Rjasuja added a commit to anikulk/tensorflow that referenced this issue Sep 25, 2023
Signed-off-by: Ritul Jasuja <ritul.jasuja@intel.com>
fsx950223 pushed a commit to fsx950223/tensorflow that referenced this issue Nov 28, 2023
On some CI nodes (typically those with higher CPU core counts, 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermittent basis.

When it does fail, the failure manifests as a segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points to a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN-based Eigen contraction kernels (for ROCm) makes the crash go away.

related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684

A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That commit cannot be reverted, because this commit disables the MKLDNN-based Eigen contraction kernels *only* for the ROCm build.

```
Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault.
[Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))]
0x00007fff54530000 in ?? ()
(gdb) where
#0  0x00007fff54530000 in ?? ()
#1  0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#2  0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#3  0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#4  0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#5  0x00007fffd52fe983 in dnnl_sgemm ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#6  0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) ()
#7  0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) ()
#8  0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) ()
#9  0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) ()
#10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```
ayounes-synaptics pushed a commit to syna-synap/tensorflow that referenced this issue Feb 29, 2024
- add --allow_fp16 option for xnnpack to enable relaxing fp32 to fp16 on vs640;
the Armv8.2 CPU has the fp16 extension, which gives a big improvement when
inferencing in fp16
- enable xnnpack quant optimization by default; xnnpack is mainly used for fp
models and inference performance is sometimes not good, so it can be disabled
with the benchmark option --use_xnnpack=false
- enable xnnpack transient indirection buffers by default to reduce the
memory footprint of indirection buffers
- enable xnnpack dynamic fully connected
copybara-service bot pushed a commit that referenced this issue May 2, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=#62750 from mattbahr:implement-sampled-addmm-v2 c295a0e
PiperOrigin-RevId: 630081768
copybara-service bot pushed a commit that referenced this issue May 2, 2024