SoapySDR python stuck in loop after stopping sdrplay service #364

Open
Ifilehk opened this issue Jun 24, 2022 · 8 comments
Ifilehk commented Jun 24, 2022

The following situation causes a hang with Python running at 100% CPU, so most probably SoapySDR with SoapySDRPlay3 is stuck in a loop. Thank you for checking and resolving.
Python script:

import time
import SoapySDR

while True:
    devices = SoapySDR.Device_enumerate()
    time.sleep(1)

In a console:
service sdrplay start --> reaction OK. SDRplay device found.
service sdrplay stop --> reaction BAD. The script hangs.

gdb trace:
__futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=12262, futex_word=0x7ffff63ff910) at ./nptl/futex-internal.c:57
57 ./nptl/futex-internal.c: No such file or directory.
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=12262, futex_word=0x7ffff63ff910) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=12262, futex_word=0x7ffff63ff910) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffff63ff910, expected=12262, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128)
at ./nptl/futex-internal.c:139
#3 0x00007ffff7ccc6a4 in __pthread_clockjoin_ex (threadid=140737324774976, thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at ./nptl/pthread_join_common.c:105
#4 0x00007ffff68bd337 in std::thread::join() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff7ccff68 in __pthread_once_slow (once_control=0x555555cf7138, init_routine=0x7ffff68bbdc0 <__once_proxy>) at ./nptl/pthread_once.c:116
#6 0x00007ffff70129da in std::__future_base::_Async_state_commonV2::_M_complete_async() () from /usr/local/lib/libSoapySDR.so.0.8-2
#7 0x00007ffff7010411 in SoapySDR::Device::enumerate(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) () from /usr/local/lib/libSoapySDR.so.0.8-2
#8 0x00007ffff714931a in _wrap_Device_enumerate () from /usr/local/lib/python3.10/dist-packages/_SoapySDR.so
#9 0x00005555556af138 in cfunction_call (func=<built-in method Device_enumerate of module object at remote 0x7ffff73ada30>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/methodobject.c:552
#10 0x00005555556bdc3b in _PyObject_Call (kwargs=<optimized out>, args=(), callable=<built-in method Device_enumerate of module object at remote 0x7ffff73ada30>, tstate=0x555555b5a420)
at ../Objects/call.c:305

@fventuri

@Ifilehk - thanks for reporting the issue.
In order to rule out (or not) that it is a problem caused by the Python interpreter or the Python bindings, I wrote a very similar program in C that calls the function SoapySDRDevice_enumerate() directly.

When I replicate your experiment here, I too notice that my C program hangs, and when I inspect it with gdb I can see that it is probably waiting for a mutex in the SDRplay API function sdrplay_api_LockDeviceApi():

Thread 1 (Thread 0x7f0ead08fb40 (LWP 7145)):
#0  0x00007f0eace910d0 in ___pthread_mutex_trylock (mutex=0x7f0ead232000) at pthread_mutex_trylock.c:409
#1  0x00007f0eac605918 in sdrplay_MutexLock(void*, unsigned long) () from /usr/local/lib64/libsdrplay_api.so.3.07
#2  0x00007f0eac605a08 in sdrplay_api_LockDeviceApi () from /usr/local/lib64/libsdrplay_api.so.3.07
#3  0x00007f0ead05ab9c in findSDRPlay (args=std::map with 1 element = {...}) at /home/franco/SDR/SoapySDRPlay3/Registration.cpp:46
...

Since the shared library libsdrplay_api.so.3.07 provided by SDRplay is only available as a compiled binary library, I can't really tell you much more about this issue.
Perhaps you could create a technical support case with them; I am attaching the source code of my simple test so they should be able to reproduce it and help you resolve it.

Franco

Device_enumerate_test.zip

@fventuri fventuri self-assigned this Jun 25, 2022
Ifilehk commented Jun 25, 2022

Hello Franco.

Thank you for your fast reply.

I am not much of a specialist in C/C++, but I find it strange that there is no mechanism to avoid a deadlock. Imagine that libsdrplay_api.so.3.07 simply crashed: in that case it would be impossible for the library to release the lock. So I am not sure that relying on the external library to release the mutex is enough, because that would not cover the crash case.

Tarik

@fventuri

Tarik, I am not sure I understand your comment.

As you can see from the gdb stack trace the mutex I am referring to is inside the SDRplay proprietary library libsdrplay_api.so.3.07, and it is not something we can control from the client code (other than by not calling sdrplay_api_LockDeviceApi(), but the API specification indicates it should be called before invoking sdrplay_api_GetDevices(), which is the function that device enumeration uses).

Therefore in case of an abnormal termination of the client application (i.e. a crash; a library in itself doesn't crash but can definitely cause the client application to crash) there wouldn't be any mutex any more since the whole process is terminated.

Franco

Ifilehk commented Jun 26, 2022

Maybe we are not tuned to the same frequency yet :-)
Let me try it another way, because I am still puzzling over the process flow of SoapySDR when calling Device_enumerate().

If the sdrplay service is stopped, the behavior of SoapySDR when calling Device_enumerate() is as expected. It reports the errors:
[ERROR] sdrplay_api_Open() Error: sdrplay_api_Fail
[ERROR] Please check the sdrplay_api service to make sure it is up. If it is up, please restart it.
[ERROR] SoapySDR::Device::enumerate(sdrplay) sdrplay_api_Open() failed

I start the sdrplay service. Everything is fine: Device_enumerate() detects the SDRplay device.

I now wonder why, when I stop the sdrplay service again, SoapySDR is not able to report the same errors on the next call to Device_enumerate():
[ERROR] sdrplay_api_Open() Error: sdrplay_api_Fail
[ERROR] Please check the sdrplay_api service to make sure it is up. If it is up, please restart it.
[ERROR] SoapySDR::Device::enumerate(sdrplay) sdrplay_api_Open() failed

To me this looks like the same situation as at the start of this test, which works as expected. We should get an "sdrplay_api_Open() Error" and we should never reach sdrplay_api_LockDeviceApi(). What is wrong with my reasoning?

Tarik

@fventuri

Tarik,
it's because sdrplay_api_Open() (the SDRplay API function that opens and initializes their API) is invoked only once, the first time findSDRPlay is called (see here: https://github.com/pothosware/SoapySDRPlay3/blob/master/Registration.cpp#L45).

If you want to reinitialize the SDRplay API every single time, you can use a Python script like this:

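import multiprocessing
import time

def enumerate_devices():
  # Import inside the child so each new process initializes the SDRplay API afresh.
  import SoapySDR
  devices = SoapySDR.Device_enumerate()

# The guard keeps child processes from re-running the loop on platforms that spawn.
if __name__ == "__main__":
  while True:
    p = multiprocessing.Process(target=enumerate_devices)
    p.start()
    p.join()
    time.sleep(1)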

Franco
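
To make the "opened only once" point concrete, here is a toy model in plain Python. It does not call SoapySDR or the SDRplay API; FakeSdrplayService, FakeModule, and the returned strings are hypothetical names invented for illustration, and the "would block" branch only stands in for the hang that the gdb trace suggests. It reproduces the three steps of the test described above: service down, service up, service stopped again.

```python
class FakeSdrplayService:
    """Hypothetical stand-in for the sdrplay_api service (not the real API)."""

    def __init__(self):
        self.running = False


class FakeModule:
    """Toy model of a module that opens the API only on the first successful call."""

    def __init__(self, service):
        self.service = service
        self.api_open = False  # models the "open once" behavior described above

    def find(self):
        if not self.api_open:
            # Open step: it checks the service and can fail cleanly.
            if not self.service.running:
                return "error: open failed, service is down"
            self.api_open = True
        # Once the API is open, later calls skip the open step and go to the lock.
        if not self.service.running:
            return "would block: waiting on a lock nobody will release"
        return "ok: device found"


service = FakeSdrplayService()
module = FakeModule(service)

print(module.find())      # service is down before the first open
service.running = True
print(module.find())      # service is up, API gets opened
service.running = False
print(module.find())      # service stopped after the API was opened
```

In this model the clean error is only reachable while the open step still runs, which is why a fresh process (or a freshly reloaded module) gets the error message instead of hanging.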

Ifilehk commented Jun 27, 2022

Franco,
Thank you for your clarifications and the proposed solution. I had been using subprocess with Popen and SoapySDRUtil without problems, which I suppose is equivalent to the multiprocessing approach, and this helps me understand the mutex problem.

  • Is there a good reason why SoapySDR has to invoke sdrplay_api_Open() only once, and not at every Device_enumerate()?
  • Would it make sense, for some use cases, to have a parameter that forces an sdrplay_api_Close() followed by an sdrplay_api_Open() to get rid of this deadlock?

Thank you for your support so far !

Tarik
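
The subprocess approach mentioned above can be sketched as follows, with a timeout added so that a hung enumeration cannot block the caller. This is a minimal sketch under assumptions: run_with_timeout and find_devices are hypothetical helper names, SoapySDRUtil is assumed to be on PATH, and the 10 second default is arbitrary.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a child process and return (finished, output_text).

    finished is False if the child was killed after timeout_s seconds.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error messages into the output
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return False, ""
    return True, result.stdout


def find_devices(timeout_s=10.0):
    # Assumption: SoapySDRUtil is installed and on PATH; --find lists visible devices.
    return run_with_timeout(["SoapySDRUtil", "--find"], timeout_s)
```

Because every call runs in a new process, the SDRplay API is initialized from scratch each time, and any lock held by a stuck child goes away when that child is killed.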

Ifilehk commented Jun 27, 2022

Franco,
As an addendum to your workaround for this deadlock problem, here is a solution that does not need multiprocessing: I just discovered that calling unloadModules() and then loadModules() puts SoapySDR back into a fresh state, so the next call to Device_enumerate() no longer deadlocks.
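
A minimal sketch of this workaround, assuming the Python bindings expose unloadModules(), loadModules(), and Device_enumerate() at module level as used in this thread. enumerate_fresh and poll_devices are hypothetical helper names, and the SoapySDR module is passed in as a parameter so the helpers can be exercised without hardware.

```python
import time


def enumerate_fresh(sdr):
    """Reload the SoapySDR modules, then enumerate devices.

    sdr is the imported SoapySDR module (or any object with the same three calls).
    """
    sdr.unloadModules()  # drop the loaded modules and whatever state they hold
    sdr.loadModules()    # load them again from scratch
    return sdr.Device_enumerate()


def poll_devices(sdr, iterations, delay_s=1.0):
    """Call enumerate_fresh repeatedly and collect each result."""
    results = []
    for _ in range(iterations):
        results.append(enumerate_fresh(sdr))
        time.sleep(delay_s)
    return results
```

With the real bindings this would be used as "import SoapySDR" followed by "poll_devices(SoapySDR, iterations=10)". Whether reloading is safe while a device is open and streaming is a separate question that this sketch does not address.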

@fventuri

Tarik,
good find! Your solution of using SoapySDR.unloadModules() and SoapySDR.loadModules() is much better and more elegant than mine!

As for your question about why I invoke sdrplay_api_Open() the way I do in the SoapySDRPlay3 module: when I initially wrote this module I followed the instructions from the SDRplay API Specification Guide (https://www.sdrplay.com/docs/SDRplay_API_Specification_v3.07.pdf), and the example in that guide. From that guide I understood that the client application (in this specific case the client using the SoapySDR Python bindings) is supposed to call sdrplay_api_Open() once before any other SDRplay API function is invoked.
The same guide also on page 22 seems to indicate that the function sdrplay_api_LockDeviceApi() would return the error type sdrplay_api_ServiceNotResponding when the sdrplay_api service is not available (instead of blocking).

One of the issues of writing a general purpose module like SoapySDRPlay3 is that it is almost impossible to foresee all the possible ways it could be used since at the end of the day it is just a shared library/module; for instance one could imagine a different use case where the client application is successfully attached to an RSP (possibly streaming from it), and while it is attached it calls the findSDRPlay() function (for instance to show a list of available RSPs), and in this scenario the sequence sdrplay_api_Close() + sdrplay_api_Open() may disrupt the ongoing streaming from the RSP, even without unplugging the device or stopping the sdrplay_api service.

In these cases my personal preference is to follow what I call the 'principle of least surprise', where the user running the application wouldn't be 'surprised' by streaming from the RSP going down while they are trying to list the available devices. Also, I find that making function calls explicit in the script, like unloadModules() + loadModules(), helps significantly in understanding the logic of the program (for instance the fact that in your case you want to handle the possibility of someone disconnecting the RSP).

All in all I think your approach of calling SoapySDR.unloadModules() followed by SoapySDR.loadModules() is a very reasonable and sensible one, and kudos again for finding that solution!

Franco
