
TensorRT Segmentation Fault During Conversion For Debug Mode #49139

Open
wang5566 opened this issue May 12, 2021 · 0 comments
Assignees
Labels
comp:gpu:tensorrt Issues specific to TensorRT stat:awaiting tensorflower Status - Awaiting response from tensorflower TF 2.4 for issues related to TF 2.4 type:bug Bug

Comments

@wang5566

Please make sure that this is a bug. As per our
GitHub Policy,
we only address code/doc bugs, performance issues, feature requests and
build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 20.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 2.4.1
  • Python version: 3.8
  • Bazel version (if compiling from source): bazelisk
  • GCC/Compiler version (if compiling from source): gcc5
  • CUDA/cuDNN version: 11.0
  • GPU model and memory: Tesla 4

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior
I run the converter with my custom optimize_pass.

Describe the expected behavior

In release mode I get the correct result, but in debug mode SegmentGraph (in segment.cc) receives a nullptr for an output_edge and crashes.

Other info / logs
[Thread 0x7fef967fc700 (LWP 51404) exited]
[Thread 0x7ff01e7fc700 (LWP 51369) exited]
[Thread 0x7ff01d7fa700 (LWP 51371) exited]
[Thread 0x7fef1ffff700 (LWP 51422) exited]
[Thread 0x7fef957fa700 (LWP 51406) exited]
[Thread 0x7fef96ffd700 (LWP 51403) exited]
[Thread 0x7ff01ffff700 (LWP 51366) exited]
[Thread 0x7ff016ffd700 (LWP 51375) exited]
[Thread 0x7ff0867fc700 (LWP 51362) exited]
[Thread 0x7fef1d7fa700 (LWP 51427) exited]
[Thread 0x7fef5dffb700 (LWP 51412) exited]
[Thread 0x7fef5e7fc700 (LWP 51411) exited]
[Thread 0x7fef5d7fa700 (LWP 51413) exited]
[Thread 0x7fef9ffff700 (LWP 51394) exited]
[Thread 0x7ff015ffb700 (LWP 51377) exited]
[Thread 0x7fef57fff700 (LWP 51415) exited]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ff3cbebf9d8 in tensorflow::Edge::dst (this=0x0) at external/org_tensorflow/tensorflow/core/graph/graph.h:384
384 external/org_tensorflow/tensorflow/core/graph/graph.h: No such file or directory.
(gdb) bt
#0 0x00007ff3cbebf9d8 in tensorflow::Edge::dst (this=0x0) at external/org_tensorflow/tensorflow/core/graph/graph.h:384
#1 0x00007ff3cc908839 in tensorflow::(anonymous namespace)::DFSFromHelper<tensorflow::Node*>(const tensorflow::Graph &, tensorflow::gtl::ArraySlice<tensorflow::Node*>, const std::function<void(tensorflow::Node*)> &, const std::function<void(tensorflow::Node*)> &, const tensorflow::NodeComparator &, const tensorflow::EdgeFilter &) (g=..., start=..., enter=..., leave=..., stable_comparator=..., edge_filter=...)
at external/org_tensorflow/tensorflow/core/graph/algorithm.cc:81
#2 0x00007ff3cc90779a in tensorflow::DFS(tensorflow::Graph const&, std::function<void (tensorflow::Node*)> const&, std::function<void (tensorflow::Node*)> const&, std::function<bool (tensorflow::Node const*, tensorflow::Node const*)> const&, std::function<bool (tensorflow::Edge const&)> const&) (g=..., enter=..., leave=..., stable_comparator=..., edge_filter=...) at external/org_tensorflow/tensorflow/core/graph/algorithm.cc:93
#3 0x00007ff3cc907a53 in tensorflow::GetPostOrder(tensorflow::Graph const&, std::vector<tensorflow::Node*, std::allocator<tensorflow::Node*> >*, std::function<bool (tensorflow::Node const*, tensorflow::Node const*)> const&, std::function<bool (tensorflow::Edge const&)> const&) (g=...,
order=0x7ffcdfbe9850, stable_comparator=..., edge_filter=...) at external/org_tensorflow/tensorflow/core/graph/algorithm.cc:207
#4 0x00007ff3cbebc567 in tensorflow::tensorturbo::convert::ConvertAfterShapes (params=...) at convert/tt_convert_graph.cc:578
#5 0x00007ff3cbed5d50 in tensorflow::tensorturbo::convert::TTOptimizationPass::Optimize (this=0x7ffcdfbea910, cluster=0x0, item=...,
optimized_graph=0x7ffcdfbea6f0) at convert/tt_optimization_pass.cc:97
#6 0x00007ff3cbe6ca82 in <lambda(const pybind11::bytes&, bool, const string&)>::operator()(const pybind11::bytes &, bool, const std::__cxx11::string &) const (__closure=0x55986103aff8, serialized_metagraph=..., verbose=false, graph_id="graph_to_optimize")
at convert/tt_optimizer_wrapper.cc:63
#7 0x00007ff3cbe6d7a2 in pybind11::detail::argument_loader<pybind11::bytes const&, bool, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&>::call_impl<pybind11::bytes, pybind11_init__pywrap_tt_optimizer(pybind11::module&)::<lambda(const pybind11::bytes&, bool, const string&)>&, 0, 1, 2, pybind11::detail::void_type>(<lambda(const pybind11::bytes&, bool, const string&)> &, std::index_sequence, pybind11::detail::void_type &&) (this=0x7ffcdfbeab50, f=...)
at bazel-out/k8-dbg/bin/external/pybind11/_virtual_includes/pybind11/pybind11/cast.h:1935
#8 0x00007ff3cbe6d1b6 in pybind11::detail::argument_loader<pybind11::bytes const&, bool, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&>::call<pybind11::bytes, pybind11::detail::void_type, pybind11_init__pywrap_tt_optimizer(pybind11::module&)::<lambda(const pybind11::bytes&, bool, const string&)>&>(<lambda(const pybind11::bytes&, bool, const string&)> &) (this=0x7ffcdfbeab50, f=...)
at bazel-out/k8-dbg/bin/external/pybind11/_virtual_includes/pybind11/pybind11/cast.h:1912
#9 0x00007ff3cbe6cf01 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::operator()(pybind11::detail::function_call &) const (__closure=0x0, call=...) at bazel-out/k8-dbg/bin/external/pybind11/_virtual_includes/pybind11/pybind11/pybind11.h:159
#10 0x00007ff3cbe6cfae in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::_FUN(pybind11::detail::function_call &) ()
at bazel-out/k8-dbg/bin/external/pybind11/_virtual_includes/pybind11/pybind11/pybind11.h:137
#11 0x00007ff3cbe7724a in pybind11::cpp_function::dispatcher (self=0x7ff3cd9e6f30, args_in=0x7ff33425d640, kwargs_in=0x0)
at bazel-out/k8-dbg/bin/external/pybind11/_virtual_includes/pybind11/pybind11/pybind11.h:624
#12 0x000055985cdbff76 in cfunction_call_varargs (kwargs=<optimized out>, args=<optimized out>, func=0x7ff3cdcbfbd0)
at /tmp/build/80754af9/python_1599203911753/work/Objects/call.c:742


In release mode this pointer is not nullptr, but in debug mode the nullptr causes a core dump.
Why?

@wang5566 wang5566 added the type:bug Bug label May 12, 2021
@tilakrayal tilakrayal added TF 2.4 for issues related to TF 2.4 comp:lite-xnnpack TensorFlow Lite XNNPack related issues comp:gpu:tensorrt Issues specific to TensorRT and removed comp:lite-xnnpack TensorFlow Lite XNNPack related issues labels May 12, 2021
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 13, 2021