Java interface #5
Comments
Nice!
Moving this comment here from #3: There's a test suite with fairly good coverage, but currently it's mostly Python with a few C++ tests. There's also a lot of functionality for building graphs that's currently Python only, in particular the automatic differentiation functionality, though that doesn't matter for evaluation of graphs in Java. There are plans to move this functionality into the underlying C++ in the future, at which point Java SWIG bindings would be more useful for creating graphs. If someone takes up the Java SWIG challenge, we'd be happy to accept it upstream pending review, etc., at which point it would be part of our continuous testing. The details of accepting contributions are in flux at the moment, but that will stabilize.
Hello guys
Hello,
There are JavaCPP presets available for libraries like Caffe and OpenCV. See also bytedeco/javacpp-presets#111. JavaCPP also enables iOS via RoboVM.
@girving Initial commit at bytedeco/javacpp-presets@374e1d5 |
/cc @saudet |
@pslam - I was able to work just a little bit on this - could definitely use some help!
Hi guys, I believe I have pretty functional bindings for JavaCPP: https://github.com/bytedeco/javacpp-presets/tree/master/tensorflow. Let me know if you see anything that could be done with SWIG, but not JavaCPP. I could definitely use the feedback. (Thanks for the cc @bhack!)
Very nicely done @saudet! I have almost finished a SWIG wrap, but it seems that your implementation works just as well. I do not see anything that my SWIG wrap can do that yours cannot. JavaCPP seems very cool; I'll have to look into using it for future projects.
Hi @kylevedder, have you resolved the issue related to
@tngan Yes, that is what I discovered as well. Additionally, the

The current roadblock that I am trying to solve is how to get protobuf to recurse over the entire file tree and compile the
@kylevedder Are your SWIG wrappers in a separate repository or are you working in the tensorflow repository? If you are working in a separate repository and using a different build system, then you would need to use the protobuf plugin for that build system. I'd be happy to help you set up the build if you would like.
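One way to attack the roadblock mentioned above, compiling every .proto file in the tree, is a simple find/xargs pipeline. This is only a sketch, assuming a standalone `protoc` on the PATH and paths relative to the repo root; inside the Bazel build itself, the protobuf rules mentioned above are the cleaner route:

```shell
# Illustrative only: generate C++ protobuf sources for every .proto
# found anywhere under tensorflow/, writing output into generated/.
mkdir -p generated
find tensorflow -name '*.proto' -print0 \
  | xargs -0 protoc --proto_path=. --cpp_out=generated
```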
@davidzchen Thank you for the offer, any and all help is greatly appreciated.

What I have thus far: I have already set up Bazel and gotten it to compile into a

I have generated SWIG wrapper files in my forked repository. They are in a folder under

What I am trying to do: Ultimately, my goal is to generate a

You can see my attempt at a wrap script here to potentially get a better idea of what it is I am getting at, although thus far all of my attempts at using

Finally, any feedback on areas of improvement would be greatly appreciated. Thanks!
@kylevedder I already have an

```diff
diff -ruN tensorflow/tensorflow/cc/BUILD tensorflow-patch/tensorflow/cc/BUILD
--- tensorflow/tensorflow/cc/BUILD	2015-11-22 00:00:02.441829192 +0900
+++ tensorflow-patch/tensorflow/cc/BUILD	2015-11-14 11:15:12.689330351 +0900
@@ -75,6 +75,17 @@
     ],
 )
+cc_binary(
+    name = "libtensorflow.so",
+    copts = tf_copts(),
+    linkshared = 1,
+    deps = [
+        ":cc_ops",
+        "//tensorflow/core:kernels",
+        "//tensorflow/core:tensorflow",
+    ],
+)
+
 filegroup(
     name = "all_files",
     srcs = glob(
```

And run Bazel like this, for example:

AFAIK, this should gobble up pretty much anything of interest for the C++ API.
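The Bazel invocation itself was lost from the comment above; the following is only a plausible reconstruction, assuming the `libtensorflow.so` target added by the patch (the `-c opt` flag is illustrative):

```shell
# Build the shared library defined by the cc_binary target in the patch above.
bazel build -c opt //tensorflow/cc:libtensorflow.so
```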
@saudet Is there a reason why you are using a

@kylevedder If your goal is to generate an

If you need to use the TensorFlow protos in Java code, then you would need to add dependencies from your

We still have a bit of work to do before we open-source the native

A few other bits of feedback:
@davidzchen No reason in particular. I'm new to Bazel and just using |
@saudet Thanks! I was just checking to make sure that it wasn't an issue with Bazel. :) Feel free to let me know or open a bug if you run into any issues. |
@saudet Thanks for the info on using Bazel. I too am new to it and did not realize it was capable of generating a @davidzchen Thanks for the addendum about using a Additionally, I was not very clear in my previous comment about generating |
@davidzchen Hum, nope, |
@saudet I don't think you need to pass |
@kylevedder You won't be able to add the JNI headers that way since they're outside the workspace. However, Bazel includes the local JDK as a local repository and provides a number of built-in targets (see

Bazel itself uses JNI and interfaces with the local JDK this way (see

Here is a patch for your BUILD file that adds the
```diff
diff --git a/tensorflow/core/java/wrapper/BUILD b/tensorflow/core/java/wrapper/BUILD
index 72b4076..04a3394 100644
--- a/tensorflow/core/java/wrapper/BUILD
+++ b/tensorflow/core/java/wrapper/BUILD
@@ -7,10 +7,30 @@ exports_files(["LICENSE"])
 load("/tensorflow/tensorflow", "tf_copts")
 load("/tensorflow/tensorflow", "tf_gen_op_wrappers_cc")
+genrule(
+    name = "copy_link_jni_md_header",
+    srcs = ["//external:jni_md_header-linux"],
+    outs = ["jni_md.h"],
+    cmd = "cp -f $< $@",
+)
+
+genrule(
+    name = "copy_link_jni_header",
+    srcs = ["//external:jni_header"],
+    outs = ["jni.h"],
+    cmd = "cp -f $< $@",
+)
+
 cc_library(
     name = "java_wrapper",
-    srcs = glob(["*.cc","*.cxx","*.h"]),
-    copts = ["-I$$JAVA_HOME/include/", "-I$$JAVA_HOME/include/linux/"],
+    srcs = glob(["*.cc", "*.cxx", "*.h"]) + [
+        ":jni.h",
+        ":jni_md.h",
+    ],
+    includes = ["."],
+    deps = [
+        "//tensorflow/core",
+    ],
     visibility = ["//visibility:public"],
 )
```

Note that in general, compile actions in Bazel are run from the root of the source tree, and you would need to change the includes in your SWIG file as follows and then re-generate the C++ files so that they will have the correct includes as well:

```diff
diff --git a/tensorflow/core/java/wrapper/tensor_c_api.i b/tensorflow/core/java/wrapper/tensor_c_api.i
index d08b571..9ab1fa1 100644
--- a/tensorflow/core/java/wrapper/tensor_c_api.i
+++ b/tensorflow/core/java/wrapper/tensor_c_api.i
@@ -1,8 +1,8 @@
 %module tensor_c_api_module
 %{
-#include "../../public/tensor_c_api.h"
+#include "tensorflow/core/public/tensor_c_api.h"
 %}
-%include "../../public/tensor_c_api.h"
+%include "tensorflow/core/public/tensor_c_api.h"
 %include "stddef.h"
```

Once this works, you would have the JNI build set up for Linux since the

```python
genrule(
    name = "copy_link_jni_md_header",
    srcs = select({
        "//tensorflow/core:darwin": ["//external:jni_md_header-darwin"],
        "//tensorflow/core:darwin_x86_64": ["//external:jni_md_header-darwin"],
        "//tensorflow/core:freebsd": ["//external:jni_md_header-freebsd"],
        "//conditions:default": ["//external:jni_md_header-linux"],
    }),
    outs = ["jni_md.h"],
    cmd = "cp -f $< $@",
)
```

I'd be happy to help you with this if you run into any issues. Let me know if this works for you.
@davidzchen `cc_library` generates a bunch of .a files, but no .so file. I'm using 0.1.0, as was previously recommended for TensorFlow... Maybe it's fixed in 0.1.1? I'll have to try again.
@davidzchen Thank you very much for your help. I have followed your instructions and updated both the Java wrapper

For now, I have skipped the generalization of the

Once again, thank you for all of your help, I really appreciate it.
Sorry, it turns out I was wrong. In order to build a

The main difference between the

I apologize for the confusion. Perhaps we should improve our documentation and add an example on building
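The contrast being drawn above is presumably between `cc_library`, which only produces static archives (.a) for other Bazel targets to link against, and `cc_binary` with `linkshared = 1`, which emits a standalone shared library. A minimal BUILD sketch of the distinction; the target and dependency names here are hypothetical, not from the thread:

```python
# cc_library alone yields .a files; to get a .so, wrap it in a
# cc_binary with linkshared = 1 and a library-style name.
cc_binary(
    name = "libjava_wrapper.so",  # illustrative name
    linkshared = 1,               # emit a shared library, not an executable
    deps = [":java_wrapper"],     # the cc_library holding the wrapper sources
)
```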
I guess you should follow those steps to build TensorFlow and all its dependencies. We are working on porting TensorFlow to Node.js, and I've implemented a shell script to compile and gather only the essential sources from the whole repo:
It makes analyzer output more useful. Example:

```
Your TFLite model has '1' subgraph(s). In the subgraph description below,
T# represents the Tensor numbers. For example, in Subgraph#0, the RESHAPE op
takes tensor #0 and tensor #1 as input and produces tensor #5 as output.

Subgraph#0 main(T#0) -> [T#9]
  Op#0 RESHAPE(T#0, T#1) -> [T#5]
  Op#1 STRIDED_SLICE(T#5, T#2, T#2, T#3) -> [T#6]
  Op#2 RESIZE_BILINEAR(T#6, T#4) -> [T#7]
  Op#3 RESIZE_BILINEAR(T#6, T#4) -> [T#8]
  Op#4 ADD(T#7, T#8) -> [T#9]

Tensors of Subgraph#0
  T#0(image) shape:[5, 5], type:FLOAT32
  T#1(strided_slice) shape:[4], type:INT32
  T#2(strided_slice1) shape:[4], type:INT32
  T#3(strided_slice2) shape:[4], type:INT32
  T#4(ResizeBilinear/size) shape:[2], type:INT32
  T#5(strided_slice3) shape:[1, 5, 1, 5], type:FLOAT32
  T#6(strided_slice4) shape:[1, 5, 5, 1], type:FLOAT32
  T#7(ResizeBilinear) shape:[1, 2, 2, 1], type:FLOAT32
  T#8(ResizeBilinear_1) shape:[1, 2, 2, 1], type:FLOAT32
  T#9(Identity) shape:[1, 2, 2, 1], type:FLOAT32
```

PiperOrigin-RevId: 389795468
Change-Id: I0fda5bb74568c68459359a8a39f1627b459b7a4b
On some CI nodes (typically those with higher CPU core counts, 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermittent basis. When it does fail, the failure manifests as a segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points the finger at a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN based Eigen contraction kernels (for ROCm) seems to make the crash go away. Related JIRA ticket: https://ontrack-internal.amd.com/browse/SWDEV-313684 A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That commit cannot be reverted, because this commit disables MKLDNN based Eigen contraction kernels *only* for the ROCm build. ``` Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault. [Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))] 0x00007fff54530000 in ?? () (gdb) where #0 0x00007fff54530000 in ?? 
() #1 0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #2 0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #3 0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #4 0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #5 0x00007fffd52fe983 in dnnl_sgemm () from 
/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #6 0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) () #7 0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) () #8 0x000055555718f327 in 
Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) () #9 0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) () #10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, 
Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) () #11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 #12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 #13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 #14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6 ```
* avoid autotune for cus type
* support cus matmul with cutlass using float32 as computation type
* add cus support for more ops
* add stream and scratch memory to cutlass conv function interface
* avoid recompilation when changing env vars
* temporarily disabled GpuConvAlgorithmPicker
* add hlo compare for cus
* emulating fp16, add forceinline
467408627 by A. Unique TensorFlower<gardener@tensorflow.org>: Update sqlite version in TF
467380418 by A. Unique TensorFlower<gardener@tensorflow.org>: compat: Update forward compatibility horizon to 2022-08-13
467378663 by A. Unique TensorFlower<gardener@tensorflow.org>: Update GraphDef version to 1222.
467363891 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/b750bc2999cf02abca6ad9eeff0a04ec7bf3b683.
467363622 by A. Unique TensorFlower<gardener@tensorflow.org>: [xla:runtime] NFC: Move constraints documentation from jitrt to xla/runtime/constraints
467362586 by A. Unique TensorFlower<gardener@tensorflow.org>: [xla:runtime] NFC: Extract JitCompilationContext library from jitrt and move it to xla/runtime
467361314 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/0a042cbb5275e6ff9a3a7c2748c74df6dcede09e.
467360160 by A. Unique TensorFlower<gardener@tensorflow.org>: [xla:runtime] NFC: Extract calling_convention library from jitrt and move it to xla/runtime
467341954 by A. Unique TensorFlower<gardener@tensorflow.org>: Op documentation update. update of g3doc/_includes/tf_passes.md
467341426 by A. Unique TensorFlower<gardener@tensorflow.org>: Refactor SELECT_V2 in preparation for porting to TFLM.
467340678 by A. Unique TensorFlower<gardener@tensorflow.org>: Create some global stat tracking for CompilationEnvironments. This tracking can be used to help debug cases in which multiple CompilationEnvironments are used to compile a single HloModule (which should not happen).
467339870 by A. Unique TensorFlower<gardener@tensorflow.org>: Automated rollback of changelist 467224197.
467339756 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/b20ec05d272477fa6223213687bb22145df92674.
467339529 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA] Bugfix for gather index parallel partitioning where the sharded non-parallel dims in indices are not handled.
467337900 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA] Minor renamings, refactorings, checks.
467337622 by A. Unique TensorFlower<gardener@tensorflow.org>: Remove unneeded dependency.
467337170 by A. Unique TensorFlower<gardener@tensorflow.org>: Integrate LLVM at llvm/llvm-project@2c3ca3b684bb Updates LLVM usage to match [2c3ca3b684bb](llvm/llvm-project@2c3ca3b684bb)
467335264 by A. Unique TensorFlower<gardener@tensorflow.org>: [SavedModel Fingerprinting] Add hash #5, which represents the checkpoint. The `checkpoint_hash` is a hash of the serialized .index file, which is the metadata file of the TensorBundle containing a string-string table of the name of a tensor to its serialized BundleEntryProto. The BundleEntryProto contains a crc32 hash of the tensor contents, but not the contents of the tensor itself. RFC: tensorflow/community#415
467334010 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/76b3fea4cc9d5e7cb8a85798e41a61a55c301578.
467332094 by A. Unique TensorFlower<gardener@tensorflow.org>: [xla:runtime] NFC: Extract executable library from jitrt and move it to xla/runtime
467324078 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <vector> for 'std::vector'
467322782 by A. Unique TensorFlower<gardener@tensorflow.org>: PR #57137: [oneDNN] Skip appending kernel registration to log message for MKL ops Imported from GitHub PR #57137 This PR skips printing kernel registrations for MKL ops since it leads to performance drop for some eager models caused by this commit c04f65d This is a temporary fix and the condition will be removed when support for block format is removed as a more permanent fix. Copybara import of the project: -- 89c4c20 by Kanvi Khanna <kanvi.khanna@intel.com>: [oneDNN] Skip appending kernel registration to log message for MKL ops Merging this change closes #57137
467322425 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <memory> for 'std::unique_ptr'
467321561 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. 'int64' is deprecated: Use int64_t instead.
467321058 by A. Unique TensorFlower<gardener@tensorflow.org>: PR #57089: [TF-TRT] Adjusting Conv2D Test Tolerance Imported from GitHub PR #57089 This PR adjusts & fixes the unittest tolerance for the test `Conv2DStridedNCHWTest` in INT8 mode. Copybara import of the project: -- 13e4bff by DEKHTIARJonathan <contact@jonathandekhtiar.eu>: [TF-TRT] Adjusting Conv2D Test Tolerance Merging this change closes #57089
467320826 by A. Unique TensorFlower<gardener@tensorflow.org>: PR #55804: [TF-TRT] Various Cleanups & Python Debugging Assertion Improvements Imported from GitHub PR #55804 This PR cleans a few spots in the code base, improves the debuggability of assertion messages in unittests. And replace `distutils.version.LooseVersion` (deprecated) with `packaging.version.Version` (new recommended API). Copybara import of the project: -- a4d15ef by DEKHTIARJonathan <contact@jonathandekhtiar.eu>: [TF-TRT] Various Cleanups & Python Debugging Assertion Improvements Merging this change closes #55804
467320083 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <ostream> for 'std::ostream'
467319094 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. 'int64' is deprecated: Use int64_t instead.
467318151 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <iterator> for 'std::back_inserter'
467316931 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. using decl 'IsSubsetOf' is unused
467316097 by A. Unique TensorFlower<gardener@tensorflow.org>: Move passes under tensorflow/compiler/mlir/tensorflow/.
467315812 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <memory> for 'std::unique_ptr'
467314236 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <memory> for 'std::unique_ptr'
467313254 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <vector> for 'std::vector' missing #include <memory> for 'std::make_unique'
467312293 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. using decl 'RangeSquareDataset' is unused
467311309 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <vector> for 'std::vector'
467310637 by A. Unique TensorFlower<gardener@tensorflow.org>: PR #57013: [TF-TRT] Add LogSoftmax Support for TF-TRT Imported from GitHub PR #57013 This PR adds TF-TRT support to the `tf.nn.log_softmax` operation. This is performed using the formula `logsoftmax = logits - log(reduce_sum(exp(logits), axis=-1))`. The implemented TRT layers are fused into a single op. @DEKHTIARJonathan @tfeher: Please review the changes. Copybara import of the project: -- 1a8eb9a by Pavani Majety <pmajety@nvidia.com>: Add LogSoftmax conversion Fix Softmax comments [TF-TRT] Move LogSoftmax to use OpConverterBase Fix compiler errors clang-format Undo changes to convert_nodes.cc Fix comments Merging this change closes #57013
467310335 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <array> for 'std::array'
467310313 by A. Unique TensorFlower<gardener@tensorflow.org>: Update test config in cross device ops
467309032 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/eba528ef667653c3554984e5c05573b152c9893b.
467308765 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <vector> for 'std::vector'
467307702 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <vector> for 'std::vector'
467306473 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. 'int64' is deprecated: Use int64_t instead.
467306092 by A. Unique TensorFlower<gardener@tensorflow.org>: PR #56771: Add return_index_map argument in ssim() Imported from GitHub PR #56771 Closes #53115 Copybara import of the project: -- 8f5a1b1 by CohenAriel <ariel17112005@gmail.com>: Add return_index_map argument in ssim() Merging this change closes #56771
467305190 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. missing #include <unordered_map> for 'std::unordered_map' missing #include <vector> for 'std::vector' missing #include <memory> for 'std::shared_ptr'
467304747 by A. Unique TensorFlower<gardener@tensorflow.org>: [tfrt:jitrt] NFC: Remove Executable::KernelContext It was added before runtime::KernelContext and is not used anywhere. Remove it to avoid confusion. In the future we should reuse runtime::KernelContext as an extension point for user-defined memory allocation etc.
467303335 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service #codehealth Clean up clang-tidy report. 'int64' is deprecated: Use int64_t instead.
467301808 by A. Unique TensorFlower<gardener@tensorflow.org>: Changes all local `State` or `TaskState` enum in coordination service into `CoordinatedTaskState` enum in proto.
467300580 by A. Unique TensorFlower<gardener@tensorflow.org>: lite: enable variable freezing in tf_tfl_translate tester
467298890 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/9bb23f7d1ee0e9a55d26c7168790667e5266a74c.
467292686 by A. Unique TensorFlower<gardener@tensorflow.org>: [xla:runtime] NFC: Move execution_engine library from jitrt to xla/runtime
467280901 by A. Unique TensorFlower<gardener@tensorflow.org>: [GML] Add tests for concat in the GML tiling and fusion pipeline
467276349 by A. Unique TensorFlower<gardener@tensorflow.org>: [GML] Implement dim-based shape reification for concat
467273958 by A. Unique TensorFlower<gardener@tensorflow.org>: Change mutexes under stream_executor/gpu to use absl::Mutex and absl::MutexLock instead of tensorflow::mutex and tensorflow::mutex_lock. Change instance of absl::make_unique to std::make_unique
467272897 by A. Unique TensorFlower<gardener@tensorflow.org>: [tf.data] Prepend `/bufferedio/` for all paths passed to LoadDataset op.
PiperOrigin-RevId: 467408627
Signed-off-by: Ritul Jasuja <ritul.jasuja@intel.com>
Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) () tensorflow#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 tensorflow#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 tensorflow#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 tensorflow#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 tensorflow#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6 ```
Issue to track the effort of creating a SWIG interface for Java. I've started on an implementation and will update this issue with progress. If anyone has comments or tips, please feel welcome to join the discussion!
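For readers unfamiliar with SWIG, the work described here amounts to writing an interface (`.i`) file that tells SWIG which C++ declarations to wrap, then letting it generate the JNI glue and Java proxy classes. The following is a minimal, hypothetical sketch only; the module name, header path, and the tiny slice of API shown are illustrative assumptions, not TensorFlow's actual wrapping configuration:

```
// tensorflow.i -- hypothetical, minimal SWIG interface sketch.
// All names below are illustrative assumptions, not the real binding.
%module tensorflow

%{
// Header pulled into the generated C++ wrapper code (assumed path).
#include "tensorflow/core/public/session.h"
%}

// Standard typemaps for std::string and std::vector arguments.
%include "std_string.i"
%include "std_vector.i"

// Declare only the surface we want exposed to Java; SWIG generates
// a Java proxy class plus JNI glue for each declaration it sees here.
namespace tensorflow {
  class Session {
   public:
    virtual ~Session();
  };
}
```

Running something like `swig -c++ -java -package org.tensorflow tensorflow.i` would then emit the `tensorflow_wrap.cxx` JNI source and the corresponding Java classes, which get compiled against the TensorFlow C++ library. The real interface file would need many more declarations and typemaps (tensors, status handling, graph construction) than this sketch shows.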