Tensorflow is taking over my openssl and causing segfaults #417

Open
msdrigg opened this issue Sep 27, 2023 · 3 comments

@msdrigg

msdrigg commented Sep 27, 2023

I recently added tensorflow to a Rust project that already depended on an external OpenSSL (through reqwest and paho-mqtt), and I immediately started seeing segfaults. The strange part is that the segfaults come from crypto functions inside libtensorflow_framework.so.2 that are being called by paho-mqtt (see SSLSocket_initialize in the backtrace below). If I remove paho-mqtt's dependency on SSL, I see similar crashes coming from reqwest.

Relevant Logs

This backtrace occurs reliably every time I run my program.

(gdb) bt
#0  __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x0)
    at ./nptl/pthread_rwlock_common.c:603
#1  ___pthread_rwlock_wrlock (rwlock=0x0) at ./nptl/pthread_rwlock_wrlock.c:26
#2  0x00007f8ec0e6db69 in CRYPTO_STATIC_MUTEX_lock_write ()
   from /home/myuser/workspace/target/debug/build/tensorflow-sys-b3a831e1f8b18f5e/out/libtensorflow_framework.so.2
#3  0x00007f8ec0df6263 in CRYPTO_get_ex_new_index ()
   from /home/myuser/workspace/target/debug/build/tensorflow-sys-b3a831e1f8b18f5e/out/libtensorflow_framework.so.2
#4  0x0000564ee8a50b43 in SSLSocket_initialize ()
    at /home/myuser/.cargo/registry/src/index.crates.io-6f17d22bba15001f/paho-mqtt-sys-0.8.1/paho.mqtt.c/src/SSLSocket.c:492
#5  0x0000564ee8a440ff in MQTTAsync_createWithOptions (handle=0x7f8ea4bdfe00, 
    serverURI=0x7f8df4004fc0 "tcp://localhost:1883", 
    clientId=0x7f8df4004fe0 "program", persistence_type=1, 
    persistence_context=0x0, options=0x7f8ea4bdfcc8)
    at /home/myuser/.cargo/registry/src/index.crates.io-6f17d22bba15001f/paho-mqtt-sys-0.8.1/paho.mqtt.c/src/MQTTAsync.c:372
#6  0x0000564ee8a22c37 in paho_mqtt::async_client::AsyncClient::new<paho_mqtt::create_options::CreateOptions> (opts=...) at src/async_client.rs:201
#7  0x0000564ee8a2127a in paho_mqtt::create_options::CreateOptionsBuilder::create_client (self=...)
    at src/create_options.rs:444
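
Frames #2 and #3 suggest that the call to CRYPTO_get_ex_new_index made by SSLSocket_initialize is being bound to a definition inside libtensorflow_framework.so.2 rather than the one in libcrypto. One way to check this from inside the affected program is to ask the dynamic loader which shared object provides a given symbol under the default lookup order. The following is a minimal sketch, assuming Linux with glibc (where RTLD_DEFAULT is a null handle) and a toolchain recent enough to accept `unsafe extern` blocks. The helper name `provider_of` is hypothetical and is not part of any crate mentioned in this thread.

```rust
use std::ffi::{c_char, c_int, c_void, CStr, CString};

/// Mirror of glibc's `Dl_info`, which `dladdr` fills in.
#[allow(dead_code)]
#[repr(C)]
struct DlInfo {
    dli_fname: *const c_char, // path of the object that contains the address
    dli_fbase: *mut c_void,   // load address of that object
    dli_sname: *const c_char, // nearest symbol name
    dli_saddr: *mut c_void,   // address of that symbol
}

unsafe extern "C" {
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dladdr(addr: *const c_void, info: *mut DlInfo) -> c_int;
}

/// Hypothetical helper: returns the path of the object that provides `symbol`
/// in the default (global) lookup order, or `None` if nothing loaded defines it.
fn provider_of(symbol: &str) -> Option<String> {
    let name = CString::new(symbol).ok()?;
    // Assumption: a null handle means RTLD_DEFAULT, which holds on glibc.
    let addr = unsafe { dlsym(std::ptr::null_mut(), name.as_ptr()) };
    if addr.is_null() {
        return None;
    }
    let mut info = DlInfo {
        dli_fname: std::ptr::null(),
        dli_fbase: std::ptr::null_mut(),
        dli_sname: std::ptr::null(),
        dli_saddr: std::ptr::null_mut(),
    };
    let found = unsafe { dladdr(addr, &mut info) };
    if found == 0 || info.dli_fname.is_null() {
        return None;
    }
    let path = unsafe { CStr::from_ptr(info.dli_fname) };
    Some(path.to_string_lossy().into_owned())
}

fn main() {
    // In a standalone binary the OpenSSL symbol is usually not loaded at all;
    // the check is only meaningful when run inside the affected program.
    for sym in ["CRYPTO_get_ex_new_index", "malloc"] {
        match provider_of(sym) {
            Some(path) => println!("{sym} is provided by {path}"),
            None => println!("{sym} is not defined by any loaded object"),
        }
    }
}
```

If this reports libtensorflow_framework.so.2 for CRYPTO_get_ex_new_index when called early in the affected program, that would be consistent with the backtrace. It only diagnoses the binding and does not change it.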

Interestingly, here's what I see from ldd. Note that libssl.so.3 and libcrypto.so.3 do point to the system OpenSSL, so I don't know why these calls end up in libtensorflow_framework.so.2 at runtime.

$ ldd target/debug/program
        linux-vdso.so.1 (0x00007ffc46ffe000)
        libtensorflow_framework.so.2 => /usr/local/lib/libtensorflow_framework.so.2 (0x00007fb1b0000000)
        libtensorflow.so.2 => /usr/local/lib/libtensorflow.so.2 (0x00007fb19f000000)
        libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007fb1b767e000)
        libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007fb19ea00000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb1b765e000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb19ef19000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb19e600000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb1b773c000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb19e200000)
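
ldd shows which files are loaded, but not which file wins when two of them export the same symbol name. In the listing above libtensorflow_framework.so.2 comes before libcrypto.so.3, so a symbol exported by both could plausibly be resolved to the TensorFlow copy; the thread does not confirm this. A rough way to look for such overlaps is to compare the output of `nm -D --defined-only` for the two libraries. The sketch below assumes the usual three-column nm layout (address, type, name) and strips any `@VERSION` suffix. The function names and the sample input in `main` are made up for illustration and are not real output from either library.

```rust
use std::collections::BTreeSet;

/// Collects symbol names from `nm -D --defined-only` style text.
/// Lines that do not have at least three whitespace-separated fields are skipped.
fn exported_symbols(nm_output: &str) -> BTreeSet<String> {
    nm_output
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let _address = fields.next()?;
            let _kind = fields.next()?;
            let name = fields.next()?;
            // Drop a symbol-version suffix such as "@@OPENSSL_3.0.0".
            let base = name.split('@').next().unwrap_or(name);
            Some(base.to_string())
        })
        .collect()
}

/// Returns the symbol names that both listings export, in sorted order.
fn shared_exports(nm_a: &str, nm_b: &str) -> Vec<String> {
    let a = exported_symbols(nm_a);
    let b = exported_symbols(nm_b);
    a.intersection(&b).cloned().collect()
}

fn main() {
    // Hypothetical sample input, shaped like nm output but not taken from real libraries.
    let tensorflow_like = "\
0000000000001000 T CRYPTO_get_ex_new_index
0000000000002000 T some_tensorflow_symbol
";
    let openssl_like = "\
0000000000003000 T CRYPTO_get_ex_new_index@@OPENSSL_3.0.0
0000000000004000 T some_openssl_symbol@@OPENSSL_3.0.0
";
    for name in shared_exports(tensorflow_like, openssl_like) {
        println!("exported by both: {name}");
    }
}
```

In practice the two inputs would come from running nm on /usr/local/lib/libtensorflow_framework.so.2 and /lib/x86_64-linux-gnu/libcrypto.so.3, the paths shown in the ldd output.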

Note: I am using the latest Rust version and the latest versions of all packages mentioned here. Here is my uname -a output:

Linux pop-os 6.4.6-76060406-generic #202307241739~1690928105~22.04~d567a38 SMP PREEMPT_DYNAMIC Tue A x86_64 x86_64 x86_64 GNU/Linux

Prior Art

The only other mention of this issue I could find is tensorflow/tensorflow#34742, and I am currently trying to resolve my problem using the steps outlined there.

Goals

A perfect fix would let me use tensorflow and openssl together in a project without any tweaks, but I would consider this issue closed if we could find a workaround (environment variables, a build script, or something similar) that lets my project run without segfaulting.

@msdrigg
Author

msdrigg commented Sep 27, 2023

I tried every solution mentioned in tensorflow/tensorflow#34742, and none of them worked. My final attempt was to build with bazel build --compilation_mode=opt --jobs=25 --config=noaws --config=nogcp --config=nohdfs --config=nonccl --config=monolithic tensorflow, and it still did not solve the problem.

@adamcrume
Contributor

Are you pointing Rust to the TensorFlow library you built? There are instructions on how to do that at https://github.com/tensorflow/rust/blob/master/tensorflow-sys/README.md#manual-tensorflow-compilation.

@msdrigg
Author

msdrigg commented Oct 5, 2023

Yes, I moved the compiled objects into /usr/local/lib and ran ldconfig on the directory.
