[BUG] A thread is stuck waiting to close an SRT socket due to an internal SRT mutex lock. #2944
Comments
From the provided stack traces it is not clear who holds the
But judging from the state of the GC thread, the case described above is not the one happening here.
Can this problem be reproduced with #2893 applied?
Thread sanitizer shows a potential deadlock around those functions:
There was a PR in which m_ConnectionLock was released for the duration of the call to closeInternal(), but I don't remember the details. The inversion between m_LSLock and m_ConnectionLock is known, but it was later proven that the two code paths can never run simultaneously; the thread sanitizer merely reports that they could in theory. It would be nice to eliminate the inversion, but no sensible fix has been found.
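For readers unfamiliar with lock-order inversion, here is a minimal sketch of the general pattern and one standard way around it. It is not SRT code: the `Locks` struct, `path_a`, `path_b`, and `run_both_paths` are hypothetical names, and the two mutexes only stand in for m_ConnectionLock and m_LSLock. It assumes both locks can be taken at the same point, which may not match how SRT's call paths are structured.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks discussed above; not SRT's members.
struct Locks {
    std::mutex connection_lock;  // plays the role of m_ConnectionLock
    std::mutex ls_lock;          // plays the role of m_LSLock
    int shared_state = 0;
};

// Path A names the locks in one order...
void path_a(Locks& l) {
    // std::scoped_lock (C++17) acquires all mutexes with a deadlock-avoidance
    // algorithm, so the textual order of the arguments does not matter.
    std::scoped_lock guard(l.connection_lock, l.ls_lock);
    ++l.shared_state;
}

// ...and path B names them in the opposite order. With two nested
// std::lock_guard objects this would be a classic inversion.
void path_b(Locks& l) {
    std::scoped_lock guard(l.ls_lock, l.connection_lock);
    ++l.shared_state;
}

// Runs both paths concurrently and returns the final counter value.
int run_both_paths(int iterations) {
    Locks l;
    std::thread a([&] { for (int i = 0; i < iterations; ++i) path_a(l); });
    std::thread b([&] { for (int i = 0; i < iterations; ++i) path_b(l); });
    a.join();
    b.join();
    return l.shared_state;
}
```

Calling `run_both_paths(1000)` completes without deadlock even though the two paths list the mutexes in opposite orders. When the locks are taken at different call depths, this technique does not apply directly and a fixed global lock order is the usual alternative.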
I can try this next week.
I tried the fix using the branch "dev-add-socket-busy-counter". I can reproduce the same problem. Let me know if you need more info.
We need to somehow identify who is holding the mutex (see my comment above). Can you check other threads? There should be some trying to lock another mutex while
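One generic way to answer "who holds the mutex" in a debug build is to wrap the mutex so that it records its owner. The sketch below is only an illustration of that idea: `OwnerTrackingMutex` is a hypothetical helper, not part of SRT, and it assumes the code under test can be rebuilt with the wrapper in place of the plain mutex.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical debug helper: a mutex that remembers which thread holds it.
// It satisfies the Lockable requirements, so std::lock_guard works with it.
class OwnerTrackingMutex {
public:
    void lock() {
        m_.lock();
        owner_.store(std::this_thread::get_id());
    }

    bool try_lock() {
        if (!m_.try_lock())
            return false;
        owner_.store(std::this_thread::get_id());
        return true;
    }

    void unlock() {
        // Clear the owner before releasing so a new owner is never overwritten.
        owner_.store(std::thread::id());
        m_.unlock();
    }

    // Returns the id of the holding thread, or a default-constructed id if free.
    std::thread::id owner() const { return owner_.load(); }

private:
    std::mutex m_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};
```

When a process hangs, the recorded owner can be printed from a debugger or a watchdog thread and matched against the thread list, which narrows the search to the one backtrace that matters.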
You may try the fix in #2032. I can't find any PR with a fix that unlocks for the duration of closeInternal().
Attached full BT of all threads.
I see 7 threads calling
This issue is likely similar to #2252.
Just took a quick look -
And the only other thing I can see where any of the SRT threads is standing is |
if (!self->m_bClosing)
{
self->m_pSndUList->waitNonEmpty();
IF_DEBUG_HIGHRATE(self->m_WorkerStats.lCondWait++);
}
while this notification is also being missed
srt::CSndQueue::~CSndQueue()
{
m_bClosing = true;
if (m_pTimer != NULL)
{
m_pTimer->interrupt();
}
// Unblock CSndQueue worker thread if it is waiting.
    m_pSndUList->signalInterrupt();
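To illustrate how a wake-up like this can be missed, and one common way to make it robust, here is a minimal sketch. It is not SRT's implementation: `WaitList`, `wait_non_empty`, `push`, `signal_interrupt`, and `worker_exits_after_interrupt` are hypothetical names that only mirror the roles of waitNonEmpty() and signalInterrupt(). The key assumption is that the closing flag is written under the same mutex the waiter uses and is re-checked inside the wait predicate.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical simplified list; not SRT's CSndUList.
class WaitList {
public:
    // Blocks until there is pending work or an interrupt was signalled.
    // Returns true if work is pending, false if woken only by the interrupt.
    bool wait_non_empty() {
        std::unique_lock<std::mutex> lk(m_);
        // The predicate is evaluated under the lock before sleeping, so an
        // interrupt that was signalled earlier is still observed.
        cv_.wait(lk, [this] { return pending_ > 0 || interrupted_; });
        return pending_ > 0;
    }

    void push() {
        {
            std::lock_guard<std::mutex> lk(m_);
            ++pending_;
        }
        cv_.notify_one();
    }

    void signal_interrupt() {
        {
            // Setting the flag under the mutex closes the window between the
            // waiter's check and its sleep.
            std::lock_guard<std::mutex> lk(m_);
            interrupted_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int pending_ = 0;
    bool interrupted_ = false;
};

// Starts a worker that waits on an empty list, interrupts it, and reports
// whether the worker returned without having seen any work.
bool worker_exits_after_interrupt() {
    WaitList list;
    bool got_work = true;
    std::thread worker([&] { got_work = list.wait_non_empty(); });
    list.signal_interrupt();
    worker.join();
    return !got_work;
}
```

In this sketch the worker terminates whether the interrupt arrives before or after it starts waiting. If the flag were checked outside the lock, as a separate step before the wait, the interrupt could land between the check and the sleep and the worker would block indefinitely.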
Hi everyone,
Potential issue: A thread is stuck waiting to close an SRT socket due to an internal SRT mutex lock.
The Backtrace of the thread being blocked:
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007f6ad3751411 in __GI___pthread_mutex_lock (mutex=0x7f6ad4c1e480 <srt::CUDT::uglobal()::instance+96>) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007f6ad4b53b33 in srt::CUDTUnited::locateSocket(int, srt::CUDTUnited::ErrorHandling) () from ../libs/libsrt.so.1.5
#3 0x00007f6ad4b5bc58 in srt::CUDTUnited::close(int) () from ../libs/libsrt.so.1.5
#4 0x00007f6ad4b5bcec in srt::CUDT::close(int) () from ../libs/libsrt.so.1.5
#5 0x00000000004c1e78 in SrtConnectionManager::CheckForConnection (this=0x10ecea0 <SrtConnectionManager::GetInstance()::inst>) at muxer/SrtConnectionManager.cpp:244
About six more threads are also stuck on this mutex, e.g.:
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007f954a5e0411 in __GI___pthread_mutex_lock (mutex=0x7f954baad480 <srt::CUDT::uglobal()::instance+96>) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007f954b9e2b33 in srt::CUDTUnited::locateSocket(int, srt::CUDTUnited::ErrorHandling) () from ../libs/libsrt.so.1.5
#3 0x00007f954b9e623c in srt::CUDT::sendmsg2(int, char const*, int, SRT_MsgCtrl_&) () from ../libs/libsrt.so.1.5
#4 0x00007f954b9e629f in srt::CUDT::send(int, char const*, int, int) () from ../libs/libsrt.so.1.5
#5 0x00000000004dad3d in OutSrtSocket::Send (successCnt=0x2f8d484, this=0x7f950194eeb0) at Revioly/OutSrtSocket.cpp:32
I also found that the SRT thread "SRT:GC" is blocked on it too:
41 Thread 0x7f94feffd700 (LWP 32188) "SRT:GC" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007f954a5e0411 in __GI___pthread_mutex_lock (mutex=0x7f954baad480 <srt::CUDT::uglobal()::instance+96>) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007f954b9ed00e in srt::CUDTUnited::checkBrokenSockets() () from ../libs/libsrt.so.1.5
#3 0x00007f954b9ed6a8 in srt::CUDTUnited::garbageCollect(void*) () from ../libs/libsrt.so.1.5
#4 0x00007f954a5ddefc in start_thread (arg=<optimized out>) at pthread_create.c:479
#5 0x00007f9549ec122f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
SRT internal sockets thread example:
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007f954a5e0411 in __GI___pthread_mutex_lock (mutex=0x7f954baad480 <srt::CUDT::uglobal()::instance+96>) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007f954b9ebfb8 in srt::CUDTUnited::newConnection(int, srt::sockaddr_any const&, srt::CPacket const&, srt::CHandShake&, int&, srt::CUDT*&) () from ../libs/libsrt.so.1.5
#3 0x00007f954ba1948d in srt::CUDT::processConnectRequest(srt::sockaddr_any const&, srt::CPacket&) () from ../libs/libsrt.so.1.5
#4 0x00007f954ba60156 in srt::CRcvQueue::worker_ProcessConnectionRequest(srt::CUnit*, srt::sockaddr_any const&) () from ../libs/libsrt.so.1.5
#5 0x00007f954ba60ec4 in srt::CRcvQueue::worker(void*) () from ../libs/libsrt.so.1.5
#6 0x00007f954a5ddefc in start_thread (arg=<optimized out>) at pthread_create.c:479
#7 0x00007f9549ec122f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
To Reproduce
Steps to reproduce the behavior:
No problem in SRT versions 1.5.1 or 1.5.2 (I didn't check 1.5.2), but I found the issue in version 1.5.3.
My setup has 70 SRT sockets (sender-listeners), with the SRT latency set to 6000 ms.
The trouble starts in the Sender-Listener device right after the downstream device (Receiver-Caller) restarts or loses connection.
Desktop