Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] A thread is stuck waiting to close an SRT socket due to an internal SRT mutex lock. #2944

Open
tvaknin opened this issue Apr 25, 2024 · 13 comments
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Milestone

Comments

@tvaknin
Copy link

tvaknin commented Apr 25, 2024

Hi everyone,
Potential issue: A thread is stuck waiting to close an SRT socket due to an internal SRT mutex lock.

The Backtrace of the thread being blocked:

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103 #1 0x00007f6ad3751411 in __GI___pthread_mutex_lock (mutex=0x7f6ad4c1e480 <srt::CUDT::uglobal()::instance+96>) at ../nptl/pthread_mutex_lock.c:80 #2 0x00007f6ad4b53b33 in srt::CUDTUnited::locateSocket(int, srt::CUDTUnited::ErrorHandling) () from ../libs/libsrt.so.1.5 #3 0x00007f6ad4b5bc58 in srt::CUDTUnited::close(int) () from ../libs/libsrt.so.1.5 #4 0x00007f6ad4b5bcec in srt::CUDT::close(int) () from ../libs/libsrt.so.1.5 #5 0x00000000004c1e78 in SrtConnectionManager::CheckForConnection (this=0x10ecea0 <SrtConnectionManager::GetInstance()::inst>) at muxer/SrtConnectionManager.cpp:244

I have a few more threads ~6 that are also stuck on this Mutex. e.g.

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103 #1 0x00007f954a5e0411 in __GI___pthread_mutex_lock (mutex=0x7f954baad480 <srt::CUDT::uglobal()::instance+96>) at ../nptl/pthread_mutex_lock.c:80 #2 0x00007f954b9e2b33 in srt::CUDTUnited::locateSocket(int, srt::CUDTUnited::ErrorHandling) () from../libs/libsrt.so.1.5 #3 0x00007f954b9e623c in srt::CUDT::sendmsg2(int, char const*, int, SRT_MsgCtrl_&) () from ../libs/libsrt.so.1.5 #4 0x00007f954b9e629f in srt::CUDT::send(int, char const*, int, int) () from ../libs/libsrt.so.1.5 #5 0x00000000004dad3d in OutSrtSocket::Send (successCnt=0x2f8d484, this=0x7f950194eeb0) at Revioly/OutSrtSocket.cpp:32

I also found that the SRT thread "SRT:GC" holds it too:
41 Thread 0x7f94feffd700 (LWP 32188) "SRT:GC" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103 #1 0x00007f954a5e0411 in __GI___pthread_mutex_lock (mutex=0x7f954baad480 <srt::CUDT::uglobal()::instance+96>) at ../nptl/pthread_mutex_lock.c:80 #2 0x00007f954b9ed00e in srt::CUDTUnited::checkBrokenSockets() () from ../libs/libsrt.so.1.5 #3 0x00007f954b9ed6a8 in srt::CUDTUnited::garbageCollect(void*) () from ../libs/libsrt.so.1.5 #4 0x00007f954a5ddefc in start_thread (arg=<optimized out>) at pthread_create.c:479 #5 0x00007f9549ec122f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

SRT internal sockets thread example:

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103 #1 0x00007f954a5e0411 in __GI___pthread_mutex_lock (mutex=0x7f954baad480 <srt::CUDT::uglobal()::instance+96>) at ../nptl/pthread_mutex_lock.c:80 #2 0x00007f954b9ebfb8 in srt::CUDTUnited::newConnection(int, srt::sockaddr_any const&, srt::CPacket const&, srt::CHandShake&, int&, srt::CUDT*&) () from ../libs/libsrt.so.1.5 #3 0x00007f954ba1948d in srt::CUDT::processConnectRequest(srt::sockaddr_any const&, srt::CPacket&) () from ../libs/libsrt.so.1.5 #4 0x00007f954ba60156 in srt::CRcvQueue::worker_ProcessConnectionRequest(srt::CUnit*, srt::sockaddr_any const&) () from ../libs/libsrt.so.1.5 #5 0x00007f954ba60ec4 in srt::CRcvQueue::worker(void*) () from ../libs/libsrt.so.1.5 #6 0x00007f954a5ddefc in start_thread (arg=<optimized out>) at pthread_create.c:479 #7 0x00007f9549ec122f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

To Reproduce
Steps to reproduce the behavior:

  1. No problem in SRT versions 1.5.1 or 1.5.2 (I didn't check 1.5.2) But, I found the issue in version 1.5.3.

  2. My setup has 70 SRT sockets (Senders - listeners), setting the SRT latency to 6000ms.
    The trouble starts in the Sender-Listener device right after the downstream device (Receiver-Caller) restarts or loses connection.

Desktop

  • OS: Linux
  • SRT Version / 1.5.3:
@tvaknin tvaknin added the Type: Bug Indicates an unexpected problem or unintended behavior label Apr 25, 2024
@tvaknin tvaknin changed the title [BUG] [BUG] A thread is stuck waiting to close an SRT socket due to an internal SRT mutex lock. Apr 25, 2024
@maxsharabayko maxsharabayko added this to the v1.5.4 milestone Apr 25, 2024
@maxsharabayko maxsharabayko added the [core] Area: Changes in SRT library core label Apr 25, 2024
@maxsharabayko
Copy link
Collaborator

From the provided stack traces it is not clear who holds the m_GlobControlLock. All the threads are trying to lock it. But there should be a thread that is holding it, and is probably trying to lock some other socket, that has been already locked. My gut feeling is that the other lock is m_LSLock. Maybe there is one more thread trying to close a socket, locking this mutex.
CUDTUnited::checkBrokenSockets() locks m_GlobControlLock and eventually calls CUDTUnited::removeSocket(..) -> CUDT::closeInternal() -> m_pRcvQueue->removeListener(this); -> lock m_LSLock. For example here the order would be the opposite

#2  0x00007f954b9ebfb8 in srt::CUDTUnited::newConnection(..)  -> lock m_GlobControlLock
#3  0x00007f954ba1948d in srt::CUDT::processConnectRequest(srt::sockaddr_any const&, srt::CPacket&)
#4  0x00007f954ba60156 in srt::CRcvQueue::worker_ProcessConnectionRequest(..) -> lock m_LSLock
#5  0x00007f954ba60ec4 in srt::CRcvQueue::worker(void*) () from ../libs/libsrt.so.1.5 

But from the state of the GC thread that above described case is not the one happening here.

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f6ad3751411 in __GI___pthread_mutex_lock (mutex=0x7f6ad4c1e480 <srt::CUDT::uglobal()::instance+96>)
#2  0x00007f6ad4b53b33 in srt::CUDTUnited::locateSocket(int, srt::CUDTUnited::ErrorHandling) -> lock m_GlobControlLock
#3  0x00007f6ad4b5bc58 in srt::CUDTUnited::close(int) () from ../libs/libsrt.so.1.5
#4  0x00007f6ad4b5bcec in srt::CUDT::close(int) () from ../libs/libsrt.so.1.5
#5  0x00000000004c1e78 in SrtConnectionManager::CheckForConnection (this=0x10ecea0 <SrtConnectionManager::GetInstance()::inst>)
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f954a5e0411 in __GI___pthread_mutex_lock (mutex=0x7f954baad480 <srt::CUDT::uglobal()::instance+96>)
#2  0x00007f954b9e2b33 in srt::CUDTUnited::locateSocket(int, srt::CUDTUnited::ErrorHandling) -> lock m_GlobControlLock
#3  0x00007f954b9e623c in srt::CUDT::sendmsg2(int, char const*, int, SRT_MsgCtrl_&) ()
#4  0x00007f954b9e629f in srt::CUDT::send(int, char const*, int, int)
#5  0x00000000004dad3d in OutSrtSocket::Send (successCnt=0x2f8d484, this=0x7f950194eeb0)
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103 
#1  0x00007f954a5e0411 in __GI___pthread_mutex_lock (mutex=0x7f954baad480 <srt::CUDT::uglobal()::instance+96>)
#2  0x00007f954b9ed00e in srt::CUDTUnited::checkBrokenSockets() () from ../libs/libsrt.so.1.5  -> lock m_GlobControlLock
#3  0x00007f954b9ed6a8 in srt::CUDTUnited::garbageCollect(void*) () -> lock m_GCStopLock
#4  0x00007f954a5ddefc in start_thread (arg=<optimized out>) at pthread_create.c:479 
#5  0x00007f9549ec122f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103 
#1  0x00007f954a5e0411 in __GI___pthread_mutex_lock (mutex=0x7f954baad480 <srt::CUDT::uglobal()::instance+96>)
#2  0x00007f954b9ebfb8 in srt::CUDTUnited::newConnection(..)  -> lock m_GlobControlLock
#3  0x00007f954ba1948d in srt::CUDT::processConnectRequest(srt::sockaddr_any const&, srt::CPacket&)
#4  0x00007f954ba60156 in srt::CRcvQueue::worker_ProcessConnectionRequest(..) -> lock m_LSLock
#5  0x00007f954ba60ec4 in srt::CRcvQueue::worker(void*) () from ../libs/libsrt.so.1.5 
#6  0x00007f954a5ddefc in start_thread (arg=<optimized out>) at pthread_create.c:479 
#7  0x00007f9549ec122f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@ethouris
Copy link
Collaborator

ethouris commented May 7, 2024

Could this problem be reproduced with having #2893 added?

@maxsharabayko
Copy link
Collaborator

Thread sanitizer shows a potential deadlock around those functions:

  1. GC thread violates the mutex lock order by locking m_GlobControlLock (checkBrokenSockets()) first, and then m_ConnectionLock (CUDT::closeInternal()).
  2. The CRcvQueue::worker thread locks m_LSLock first, then m_GlobControlLock. The order also does not look correct.
  3. The CUDTUnited::listen function locks m_ConnectionLock, then m_LSLock.

Full Threat Report

Click to expand
==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=2531)
  Cycle in lock order graph: M0 (0x7ba800005160) => M1 (0x7b4800000048) => M2 (0x5594096b2408) => M0
  M0 m_ConnectionLock
  M1 m_LSLock
  M2 m_GlobControlLock

  Mutex M1 acquired here while holding mutex M0 in main thread: m_ConnectionLock -> m_LSLock
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4481 (libtsan.so.2+0x4ea9f) (BuildId: 5e151fac359cd43a07192270fb85f74e380f2fc8)
    #1 __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:749 (srt-xtransmit+0xded4f) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #2 std::mutex::lock() /usr/include/c++/12/bits/std_mutex.h:100 (srt-xtransmit+0xdedd8) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/include/c++/12/bits/std_mutex.h:229 (srt-xtransmit+0xdfcfa) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #4 srt::CRcvQueue::setListener(srt::CUDT*) /srtcore/queue.cpp:1693 (srt-xtransmit+0x3277ad) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #5 srt::CUDT::setListenState() /srtcore/core.cpp:1018 (srt-xtransmit+0x29a94c) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #6 srt::CUDTUnited::listen(int, int) /srtcore/api.cpp:1012 (srt-xtransmit+0x25254f) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #7 srt::CUDT::listen(int, int) /srtcore/api.cpp:3571 (srt-xtransmit+0x25aea4) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #8 srt_listen /srtcore/srt_c_api.cpp:114 (srt-xtransmit+0x33f0bd) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #9 xtransmit::socket::srt::listen() /mnt/d/Projects/srt/srt-xtransmit/xtransmit/srt_socket.cpp:99 (srt-xtransmit+0x10b9a1) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #10 xtransmit::create_connection(std::vector<UriParser, std::allocator<UriParser> > const&, std::shared_ptr<xtransmit::socket::isocket>&) /mnt/d/Projects/srt/srt-xtransmit/xtransmit/misc.cpp:63 (srt-xtransmit+0xd5698) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #11 xtransmit::common_run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, xtransmit::stats_config const&, bool, bool, std::atomic<bool> const&, std::function<void (std::shared_ptr<xtransmit::socket::isocket>, std::atomic<bool> const&)>&) /mnt/d/Projects/srt/srt-xtransmit/xtransmit/misc.cpp:132 (srt-xtransmit+0xd5c98) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #12 xtransmit::receive::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, xtransmit::receive::config const&, std::atomic<bool> const&) /mnt/d/Projects/srt/srt-xtransmit/xtransmit/receive.cpp:179 (srt-xtransmit+0xdd471) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #13 main /mnt/d/Projects/srt/srt-xtransmit/xtransmit/xtransmit-app.cpp:231 (srt-xtransmit+0x142732) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)

    Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message

  Mutex M2 acquired here while holding mutex M1 in thread T3: m_LSLock -> m_GlobControlLock
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4481 (libtsan.so.2+0x4ea9f) (BuildId: 5e151fac359cd43a07192270fb85f74e380f2fc8)
    #1 __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:749 (srt-xtransmit+0xded4f) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #2 std::mutex::lock() /usr/include/c++/12/bits/std_mutex.h:100 (srt-xtransmit+0xdedd8) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/include/c++/12/bits/std_mutex.h:229 (srt-xtransmit+0xdfcfa) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #4 srt::CUDTUnited::locateSocket(int, srt::CUDTUnited::ErrorHandling) /srtcore/api.cpp:2488 (srt-xtransmit+0x24e1d6) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #5 srt::CUDTUnited::newConnection(int, srt::sockaddr_any const&, srt::CPacket const&, srt::CHandShake&, int&, srt::CUDT*&) /srtcore/api.cpp:486 (srt-xtransmit+0x24b8c4) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #6 srt::CUDT::processConnectRequest(srt::sockaddr_any const&, srt::CPacket&) /srtcore/core.cpp:11101 (srt-xtransmit+0x2cfe39) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #7 srt::CRcvQueue::worker_ProcessConnectionRequest(srt::CUnit*, srt::sockaddr_any const&) /srtcore/queue.cpp:1411 (srt-xtransmit+0x325f6e) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #8 srt::CRcvQueue::worker(void*) /srtcore/queue.cpp:1247 (srt-xtransmit+0x324fa5) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #9 void* std::__invoke_impl<void*, void* (*)(void*), void*>(std::__invoke_other, void* (*&&)(void*), void*&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61 (srt-xtransmit+0x3444aa) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #10 std::__invoke_result<void* (*)(void*), void*>::type std::__invoke<void* (*)(void*), void*>(void* (*&&)(void*), void*&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:96 (srt-xtransmit+0x34435a) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #11 void* std::thread::_Invoker<std::tuple<void* (*)(void*), void*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:279 (srt-xtransmit+0x3442e2) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #12 std::thread::_Invoker<std::tuple<void* (*)(void*), void*> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:286 (srt-xtransmit+0x344265) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #13 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void* (*)(void*), void*> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:231 (srt-xtransmit+0x343fa9) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #14 <null> <null> (libstdc++.so.6+0xe62b2) (BuildId: 2db998bd67acbfb235c464c0275d4070061695fb)

  Mutex M0 acquired here while holding mutex M2 in thread T1: m_GlobControlLock -> m_ConnectionLock
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4481 (libtsan.so.2+0x4ea9f) (BuildId: 5e151fac359cd43a07192270fb85f74e380f2fc8)
    #1 __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:749 (srt-xtransmit+0xded4f) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #2 std::mutex::lock() /usr/include/c++/12/bits/std_mutex.h:100 (srt-xtransmit+0xdedd8) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/include/c++/12/bits/std_mutex.h:229 (srt-xtransmit+0xdfcfa) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #4 srt::CUDT::closeInternal() srtcore/core.cpp:6255 locks m_ConnectionLock
    #5 srt::CUDTUnited::removeSocket(int) /srtcore/api.cpp:2789 (srt-xtransmit+0x259389) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #6 srt::CUDTUnited::checkBrokenSockets() srtcore/api.cpp:2705 locks m_GlobControlLock
    #7 srt::CUDTUnited::garbageCollect(void*) /srtcore/api.cpp:3339 (srt-xtransmit+0x24abb2) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #8 void* std::__invoke_impl<void*, void* (*)(void*), void*>(std::__invoke_other, void* (*&&)(void*), void*&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61 (srt-xtransmit+0x3444aa) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #9 std::__invoke_result<void* (*)(void*), void*>::type std::__invoke<void* (*)(void*), void*>(void* (*&&)(void*), void*&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:96 (srt-xtransmit+0x34435a) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #10 void* std::thread::_Invoker<std::tuple<void* (*)(void*), void*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:279 (srt-xtransmit+0x3442e2) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #11 std::thread::_Invoker<std::tuple<void* (*)(void*), void*> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:286 (srt-xtransmit+0x344265) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #12 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void* (*)(void*), void*> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:231 (srt-xtransmit+0x343fa9) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #13 <null> <null> (libstdc++.so.6+0xe62b2) (BuildId: 2db998bd67acbfb235c464c0275d4070061695fb)

  Thread T3 'SRT:RcvQ:w1' (tid=2535, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1036 (libtsan.so.2+0x3d179) (BuildId: 5e151fac359cd43a07192270fb85f74e380f2fc8)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xe6388) (BuildId: 2db998bd67acbfb235c464c0275d4070061695fb)
    #2 srt::sync::StartThread(std::thread&, void* (&)(void*), void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /srtcore/sync.cpp:91 (srt-xtransmit+0x342387) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #3 srt::CRcvQueue::init(int, unsigned long, int, int, srt::CChannel*, srt::sync::CTimer*) /srtcore/queue.cpp:1199 (srt-xtransmit+0x324c8c) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #4 srt::CUDTUnited::updateMux(srt::CUDTSocket*, srt::sockaddr_any const&, int const*) /srtcore/api.cpp:3163 (srt-xtransmit+0x251b7c) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #5 srt::CUDTUnited::bind(srt::CUDTSocket*, srt::sockaddr_any const&) /srtcore/api.cpp:935 (srt-xtransmit+0x24f393) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #6 srt::CUDT::bind(int, sockaddr const*, int) /srtcore/api.cpp:3525 (srt-xtransmit+0x25a777) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #7 srt_bind /srtcore/srt_c_api.cpp:112 (srt-xtransmit+0x33effd) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #8 operator() /mnt/d/Projects/srt/srt-xtransmit/xtransmit/srt_socket.cpp:433 (srt-xtransmit+0x10e8d4) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #9 xtransmit::socket::srt::handle_hosts() /mnt/d/Projects/srt/srt-xtransmit/xtransmit/srt_socket.cpp:493 (srt-xtransmit+0x10efe8) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #10 xtransmit::socket::srt::srt(UriParser const&) /mnt/d/Projects/srt/srt-xtransmit/xtransmit/srt_socket.cpp:67 (srt-xtransmit+0x10b254) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #11 void std::_Construct<xtransmit::socket::srt, UriParser const&>(xtransmit::socket::srt*, UriParser const&) <null> (srt-xtransmit+0xdb0f1) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #12 void std::allocator_traits<std::allocator<void> >::construct<xtransmit::socket::srt, UriParser const&>(std::allocator<void>&, xtransmit::socket::srt*, UriParser const&) <null> (srt-xtransmit+0xdaafb) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #13 std::_Sp_counted_ptr_inplace<xtransmit::socket::srt, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<UriParser const&>(std::allocator<void>, UriParser const&) <null> (srt-xtransmit+0xda2bd) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<xtransmit::socket::srt, std::allocator<void>, UriParser const&>(xtransmit::socket::srt*&, std::_Sp_alloc_shared_tag<std::allocator<void> >, UriParser const&) /usr/include/c++/12/bits/shared_ptr_base.h:971 (srt-xtransmit+0xd9832) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #15 std::__shared_ptr<xtransmit::socket::srt, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, UriParser const&>(std::_Sp_alloc_shared_tag<std::allocator<void> >, UriParser const&) /usr/include/c++/12/bits/shared_ptr_base.h:1712 (srt-xtransmit+0xd8be9) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #16 std::shared_ptr<xtransmit::socket::srt>::shared_ptr<std::allocator<void>, UriParser const&>(std::_Sp_alloc_shared_tag<std::allocator<void> >, UriParser const&) /usr/include/c++/12/bits/shared_ptr.h:464 (srt-xtransmit+0xd7e38) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #17 std::shared_ptr<std::enable_if<!std::is_array<xtransmit::socket::srt>::value, xtransmit::socket::srt>::type> std::make_shared<xtransmit::socket::srt, UriParser const&>(UriParser const&) <null> (srt-xtransmit+0xd71db) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #18 xtransmit::create_connection(std::vector<UriParser, std::allocator<UriParser> > const&, std::shared_ptr<xtransmit::socket::isocket>&) /mnt/d/Projects/srt/srt-xtransmit/xtransmit/misc.cpp:59 (srt-xtransmit+0xd55f9) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #19 xtransmit::common_run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, xtransmit::stats_config const&, bool, bool, std::atomic<bool> const&, std::function<void (std::shared_ptr<xtransmit::socket::isocket>, std::atomic<bool> const&)>&) /mnt/d/Projects/srt/srt-xtransmit/xtransmit/misc.cpp:132 (srt-xtransmit+0xd5c98) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #20 xtransmit::receive::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, xtransmit::receive::config const&, std::atomic<bool> const&) /mnt/d/Projects/srt/srt-xtransmit/xtransmit/receive.cpp:179 (srt-xtransmit+0xdd471) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #21 main /mnt/d/Projects/srt/srt-xtransmit/xtransmit/xtransmit-app.cpp:231 (srt-xtransmit+0x142732) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)

  Thread T1 (tid=2533, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1036 (libtsan.so.2+0x3d179) (BuildId: 5e151fac359cd43a07192270fb85f74e380f2fc8)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xe6388) (BuildId: 2db998bd67acbfb235c464c0275d4070061695fb)
    #2 srt::sync::StartThread(std::thread&, void* (&)(void*), void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /srtcore/sync.cpp:91 (srt-xtransmit+0x342387) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #3 srt::CUDTUnited::startup() /srtcore/api.cpp:266 (srt-xtransmit+0x24a654) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #4 srt::CUDT::startup() /srtcore/api.cpp:3360 (srt-xtransmit+0x259c27) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #5 srt_startup /srtcore/srt_c_api.cpp:32 (srt-xtransmit+0x33eb55) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #6 NetworkInit::NetworkInit() /mnt/d/Projects/srt/srt-xtransmit/xtransmit/xtransmit-app.cpp:48 (srt-xtransmit+0x15340c) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)
    #7 main /mnt/d/Projects/srt/srt-xtransmit/xtransmit/xtransmit-app.cpp:220 (srt-xtransmit+0x1426c2) (BuildId: 61e8e3fcf6378d7f53d4d27765628463dfc8bed6)

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) /usr/include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:749 in __gthread_mutex_lock
==================

@ethouris
Copy link
Collaborator

ethouris commented May 7, 2024

There was some PR where m_ConnectionLock was taken out for the call to closeInternal(), but I don't remember the details.

The inversion between m_LSLock and m_ConnectionLock is known, but it was later proven that these two things can never happen simultaneously - it's just detected by the thread sanitizer that these two cases may happen in theory. It would be nice to dissolve them, but no sensible way to fix it was found.

@tvaknin
Copy link
Author

tvaknin commented May 8, 2024

Could this problem be reproduced with having #2893 added?

I can try this next week

@tvaknin
Copy link
Author

tvaknin commented May 15, 2024

I tried the fix using the branch "dev-add-socket-busy-counter". I can reproduce the same problem. let me know if you need more info.

@maxsharabayko
Copy link
Collaborator

We need somehow to identify who is holding the mutex (see my comment above). Can you check other threads? There should be some trying to lock another mutex while m_GlobControlLock is locked.

@ethouris
Copy link
Collaborator

You may try the fix in #2032. I can't find any PR with that fix which does unlocking for the time of closeInternal().

@tvaknin
Copy link
Author

tvaknin commented May 16, 2024

We need somehow to identify who is holding the mutex (see my comment above). Can you check other threads? There should be some trying to lock another mutex while m_GlobControlLock is locked.

Attached full BT of all threads.

np_backtrace.log

@maxsharabayko
Copy link
Collaborator

maxsharabayko commented May 17, 2024

I see 7 threads calling srt_sendmsg and trying to lock m_GlobControlLock to lockate the socket, one GC thread, and one thread calling getsockstate. One srt_sendmsg thread (Thread 26) is releasing the mutex, so it does not look like a deadlock, just too many threads waiting for m_GlobControlLock.
It feels like the m_GlobControlLock should at least become a shared mutex to simplify access without modification.
See also #2393.

This issue is likely similar to #2252.

@ethouris
Copy link
Collaborator

Just took a quick look - m_GlobControlLock occurs in this log 10 times, there are many threads that are trying to lock this mutex and one - OutWorker_4 - is at this moment unlocking the mutex. Definitely not this mutex is a reason of a deadlock. Must be something else, likely one of the CVs.

@ethouris
Copy link
Collaborator

And the only other thing I can see where any of the SRT threads is standing is CSndUList::waitNonEmpty() call, which in general is waiting until there's anything scheduled for sending over the channel, for which this worker is running. Should that be a point of deadlock by some reason, we might add a possibility of waiting for a limited time and on timeout we can check if the condition is somehow satisfied to detect waiting failures (that waiting part is likely the unchanged legacy code and I have doubts as to whether it should be done this way because the first condition check should be done AFTER LOCKING, and only then if not satisfied the waiting function should be executed).

@maxsharabayko
Copy link
Collaborator

CSndUList::waitNonEmpty() gets interrupted either on a new socket added in the heap, or if the CSndQueue is being destroyed.
@ethouris you suggest it may be causing a deadlock in the situation when m_bClosing is being set to true right after the check

if (!self->m_bClosing)
{
    self->m_pSndUList->waitNonEmpty();
    IF_DEBUG_HIGHRATE(self->m_WorkerStats.lCondWait++);
}

while this notification is also being missed

srt::CSndQueue::~CSndQueue()
{
    m_bClosing = true;

    if (m_pTimer != NULL)
    {
        m_pTimer->interrupt();
    }

    // Unblock CSndQueue worker thread if it is waiting.
    m_pSndUList->signalInterrupt();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants