This repository has been archived by the owner on Jan 13, 2020. It is now read-only.

Feedback: Audio underrun on Start up to 25 Seconds Debian Buster #139

Open
linuxonlinehelp opened this issue Oct 10, 2019 · 23 comments

Comments

@linuxonlinehelp

Today I rolled back my new Pi 3 B+ (2019; same on the Pi 4 4GB, 2019) to Debian Stretch,
because on every start of openwebrx.py the audio buffer locked up for up to 25 seconds.
openwebrx works on:
libusb-1.0-0
gcc 6.3.0.18
cmake 3.7.2
kernel 4.14.98-v7+
python 2.7.13
RTL-SDR dongle used: silver, version 3

On Buster there are always audio buffer underruns:
Python 2.7.16 (are there major changes here, or is it a libusb problem?)
libusb 1.0.3
kernel 4.19
... who knows why?
csdr seems to hang while calculating the high and low cut (4000). It waits there for 20-25 seconds, then runs cleanly,
on both the Pi 3 and the Pi 4 (4GB).

System load was always ~25%
No kernel messages or logs.

@linuxonlinehelp
Author

Hanging Area:
[openwebrx-httpd:ws,0] command: SET mod=nfm low_cut=-4000 high_cut=4000 offset_freq=0
csdr old_fractional_decimator_ff: window = HAMMING
csdr old_fractional_decimator_ff: taps_length = 133
csdr bandpass_fir_fft_cc: (fft_size = 512) = (taps_length = 139) + (input_size = 374) - 1
(overlap_length = 138) = taps_length - 1
csdr shift_addition_cc: reinitialized to -0
csdr bandpass_fir_fft_cc: filter initialized, low_cut = -0.361631, high_cut = 0.361631
client 0x10c32c8: CS_THREAD_FINISHED, client_goto_source = 2, errno = 32
[openwebrx-httpd:ws,0] command: SET low_cut=-4000 high_cut=4000 offset_freq=151719
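
The sizes printed in the log above fit the usual bookkeeping for FFT-based FIR filtering: the FFT size equals the filter length plus the number of fresh input samples per block, minus one. A minimal C sketch of that arithmetic follows. The struct and function names are made up for illustration and are not taken from csdr.

```c
#include <assert.h>

/* Hypothetical helper, not csdr code: block sizes for FFT-based FIR filtering. */
typedef struct {
    int fft_size;       /* total FFT length */
    int input_size;     /* fresh input samples consumed per block */
    int overlap_length; /* samples carried over from one block to the next */
} fir_fft_sizes;

fir_fft_sizes fir_fft_sizes_for(int fft_size, int taps_length)
{
    /* The filter must fit into the FFT and leave room for at least one new sample. */
    assert(taps_length >= 1 && fft_size >= taps_length);

    fir_fft_sizes s;
    s.fft_size = fft_size;
    s.input_size = fft_size - taps_length + 1; /* fft_size = taps_length + input_size - 1 */
    s.overlap_length = taps_length - 1;
    return s;
}
```

With fft_size = 512 and taps_length = 139 this gives input_size = 374 and overlap_length = 138, matching the log.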

@linuxonlinehelp
Author

After testing Raspbian Buster with the Stretch firmware on kernel 4.14-98: no change!
The Buster default kernels 4.19-75 and 4.19-79 fail with no logs!
I found reports about memory leaks, which may possibly be a factor in this bug:
https://github.com/roger-/pyrtlsdr
List of installed packages on Buster and Stretch, for comparison:
https://drive.google.com/open?id=1gXQCKo8ImFb7i89sbFfzamYV5VKIyLJR

@manofftoday

Same problem here with RPI3 and Raspbian Buster.

@nackstein

nackstein commented Nov 21, 2019

I think the problem is in the csdr bandpass_fir_fft_cc function, which performs a benchmark instead of an estimate. The benchmark is really slow, and on startup it seems to crash something (I suppose nmux), which triggers a restart of the whole chain.
I fixed it with this patch and it works for me: see http://sdr.undo.it

--- ../csdr.old/csdr.c 2017-09-25 21:45:33.018152254 +0000
+++ csdr.c 2019-11-21 17:09:11.068684954 +0000
@@ -1849,14 +1849,14 @@
//make FFT plans for continously processing the input
complexf* input = fft_malloc(fft_sizesizeof(complexf));
complexf
input_fourier = fft_malloc(fft_size*sizeof(complexf));

  •    FFT_PLAN_T* plan_forward = make_fft_c2c(fft_size, input, input_fourier, 1, 1); //forward, do benchmark
    
  •    FFT_PLAN_T* plan_forward = make_fft_c2c(fft_size, input, input_fourier, 1, 0); //forward, do benchmark
    
       complexf* output_fourier = fft_malloc(fft_size*sizeof(complexf));
       complexf* output_1 = fft_malloc(fft_size*sizeof(complexf));
       complexf* output_2 = fft_malloc(fft_size*sizeof(complexf));
       //we create 2x output buffers so that one will preserve the previous overlap:
    
  •    FFT_PLAN_T* plan_inverse_1 = make_fft_c2c(fft_size, output_fourier, output_1, 0, 1); //inverse, do benchmark
    
  •    FFT_PLAN_T* plan_inverse_2 = make_fft_c2c(fft_size, output_fourier, output_2, 0, 1);
    
  •    FFT_PLAN_T* plan_inverse_1 = make_fft_c2c(fft_size, output_fourier, output_1, 0, 0); //inverse, do benchmark
    
  •    FFT_PLAN_T* plan_inverse_2 = make_fft_c2c(fft_size, output_fourier, output_2, 0, 0);
       //we initialize this buffer to 0 as it will be taken as the overlap source for the first time:
       for(int i=0;i<fft_size;i++) iof(plan_inverse_2->output,i)=qof(plan_inverse_2->output,i)=0;
    

@linuxonlinehelp
Author

Hi, thanks, I will try that. I had the same issues on:
Odroid N2 4GB, hexa-core, Ubuntu 18.04 / Armbian Buster
Pi 4 4GB RAM, Raspbian Buster

@linuxonlinehelp
Author

I had the same issues on an Orange Pi H3 with 512MB.

What I did:
  • installed Armbian Stretch:
    Linux orangepipc 5.3.9-sunxi #19.11.3 SMP Mon Nov 18 18:49:43 CET 2019 armv7l GNU/Linux
  • fixed csdr.c and recompiled with make && make install

openwebrx.py settings:
  • fft_fps=3
  • fft_voverlap_factor=0.1
  • mathbox_waterfall_history_length = 5
  • samp_rate = 1200000 # with 1200000 the audio underrun disappeared; NOT 120000!
  • center_freq = 438950000 # 70cm band

Now the CPU is at 22% and it works like a charm over LAN.

@oe2lsp

oe2lsp commented Dec 1, 2019

I cannot get the patch to compile. In the line
complexf* input = fft_malloc(fft_sizesizeof(complexf));
something seems strange with fft_size and sizeof... What am I missing?

@nackstein

> I cannot get the patch to compile. In the line
> complexf* input = fft_malloc(fft_sizesizeof(complexf));
> something seems strange with fft_size and sizeof... What am I missing?

Unfortunately I wasn't able to paste the patch correctly. I don't know the markup language very well, and the patch got all split up in the output above. Try this patch.
Anyway, it's very simple: just change the last 1 to 0 in the make_fft_c2c calls. This changes the behavior of the function so that it performs an estimate instead of a benchmark.

--- ../csdr.old/csdr.c 2017-09-25 21:45:33.018152254 +0000
+++ csdr.c 2019-11-21 17:09:11.068684954 +0000
@@ -1849,14 +1849,14 @@
     //make FFT plans for continously processing the input
     complexf* input = fft_malloc(fft_size*sizeof(complexf));
     complexf* input_fourier = fft_malloc(fft_size*sizeof(complexf));
-    FFT_PLAN_T* plan_forward = make_fft_c2c(fft_size, input, input_fourier, 1, 1); //forward, do benchmark
+    FFT_PLAN_T* plan_forward = make_fft_c2c(fft_size, input, input_fourier, 1, 0); //forward, do benchmark

     complexf* output_fourier = fft_malloc(fft_size*sizeof(complexf));
     complexf* output_1 = fft_malloc(fft_size*sizeof(complexf));
     complexf* output_2 = fft_malloc(fft_size*sizeof(complexf));
     //we create 2x output buffers so that one will preserve the previous overlap:
-    FFT_PLAN_T* plan_inverse_1 = make_fft_c2c(fft_size, output_fourier, output_1, 0, 1); //inverse, do benchmark
-    FFT_PLAN_T* plan_inverse_2 = make_fft_c2c(fft_size, output_fourier, output_2, 0, 1);
+    FFT_PLAN_T* plan_inverse_1 = make_fft_c2c(fft_size, output_fourier, output_1, 0, 0); //inverse, do benchmark
+    FFT_PLAN_T* plan_inverse_2 = make_fft_c2c(fft_size, output_fourier, output_2, 0, 0);
     //we initialize this buffer to 0 as it will be taken as the overlap source for the first time:
     for(int i=0;i<fft_size;i++) iof(plan_inverse_2->output,i)=qof(plan_inverse_2->output,i)=0;

@jketterl

jketterl commented Dec 4, 2019

I just tested downgrading libfftw3* from 3.3.8 (which is the version included in buster) to 3.3.5 (which is the version in stretch), which seems to restore the original, quick startup. Anybody know what's going on between these two versions?

I do understand that switching to FFTW_ESTIMATE also does the trick, but is there any insight why? The code has been doing FFTW_MEASURE for ages; has the behaviour been changed?

There are no other versions on the raspbian repository to try. Here's where I got the packages for the downgrade: http://raspbian.raspberrypi.org/raspbian/pool/main/f/fftw3/

@jketterl

jketterl commented Dec 4, 2019

I found this part of the documentation: http://www.fftw.org/fftw3_doc/Cycle-Counters.html

and i found this in the changelog for version 3.3.6p2-1 of the debian package:

  * ARM targets have --with-slow-timer enabled to avoid difficulties with
    erratic timers for planning and self-optimisation

If I puzzle this together correctly, the .deb for 3.3.5 came without any cycle counter, and as such fell back to FFTW_ESTIMATE. From version 3.3.6p2-1 onward, a "slow" cycle timer that is implemented in software has been enabled, which probably allows FFTW_MEASURE to work for the first time, albeit "slow".

Summing up: that means the patch suggested by @nackstein should restore the known behaviour for arm processors. For a useful patch, the fix should probably be wrapped in preprocessor statements so that it is only applied on arm processors.

I am currently recompiling fftw3 3.3.8 without the flag to verify.

I have also seen that there is hardware cycle support for armv7a processors. not sure if raspi processors fall into that, but it might be worth a try.
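
The idea of wrapping the fix in preprocessor statements could look roughly like the following C sketch. It is only an illustration, not the code from any actual patch: the macro ASSUME_SLOW_FFTW_TIMER and both function names are hypothetical, and treating every ARM build as a slow-timer build is an assumption. Only the compiler-predefined macros `__arm__` and `__aarch64__` are real.

```c
#include <assert.h>

/* Compile-time guess (assumption): ARM builds use the slow software timer. */
#if defined(__arm__) || defined(__aarch64__)
#define ASSUME_SLOW_FFTW_TIMER 1
#else
#define ASSUME_SLOW_FFTW_TIMER 0
#endif

/* Hypothetical helper: decides whether a requested planner benchmark should
 * actually run. `requested` is the caller's wish (the last argument of
 * make_fft_c2c in the patch above); `slow_timer` says whether measuring is
 * expected to be slow. Returns 1 to benchmark, 0 to estimate. */
int effective_benchmark(int requested, int slow_timer)
{
    assert(requested == 0 || requested == 1);
    return (requested && !slow_timer) ? 1 : 0;
}

/* Convenience wrapper that applies the compile-time guess. */
int effective_benchmark_here(int requested)
{
    return effective_benchmark(requested, ASSUME_SLOW_FFTW_TIMER);
}
```

A caller would pass the result on as the benchmark argument, so non-ARM builds keep measuring while ARM builds fall back to estimating.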

@jketterl

jketterl commented Dec 5, 2019

confirmed: removing --with-slow-timer from the build restores quick startup, too. I will attempt to get a proper fix for this.

@jketterl

jketterl commented Dec 5, 2019

i have opened up a pull request that should resolve this for raspberry users: ha7ilm/csdr#51

it may be a little over the top since it applies to all arm processors, i am definitely open for ideas on how to detect the actual scenario. please leave replies about that on the PR.

@linuxonlinehelp
Author

@jketterl
I tested the changes on the Odroid N2 but was NOT able to compile with make, because some of the NEON parameters don't work. I commented out / removed all NEON parameters in the Makefile to force gcc to "autodetect" the ARM parameters, which lets make start, but it then runs into an error. For me, @nackstein's workaround only works with "empty" NEON parameters and the performance check disabled, because the Odroid N2 combines two different CPU types (A73 and A53) in one hexa-core system. If I find some free time I will try a setup on a Raspberry Pi 3 with Buster, where startup hangs for up to 30 seconds.

@jketterl

jketterl commented Dec 6, 2019

Yes, I believe the CPU detection / compiler optimisation in the Makefile is broken in more than one way (e.g. it detects a Raspberry Pi by looking for "BCM2708" in /proc/cpuinfo; I have tested a bunch of my Raspberries for this, and it only applies to a single 1B+). I am however not knowledgeable enough in the field of CPU hardware to fix that.

If you did have a way to compile this before my changes, it should still work that way now. Just make sure that you keep the newly added -DCSDR_DISABLE_FFTW_MEASURE somewhere in there.

@linuxonlinehelp
Author

I wrote to the Debian maintainer of fftw3 asking them to check this behavior, because newcomers can't fix this themselves and it makes openwebrx unusable on ALL 2019 ARM OS setups. BUT WE NEED THIS, because it's the one and only open-source WebSDR server software, by Andras.

@jketterl

jketterl commented Dec 6, 2019

well, i didn't inquire, but i'm assuming there is a story behind why they enabled it in the first place. Unfortunately, recompiling the packages is quite the process.

@smoe

smoe commented Dec 6, 2019

Hello. I am the one responsible for the 3.3.6 upload to Debian, which was mostly motivated to help fftw on ARM, really. Thank you for all your trouble to identify the culprit. Please allow me some extra time to find some external input on this issue. If there is no technical agreement/solution then we should possibly have two packages, both compiled with different parameters.

@smoe

smoe commented Dec 6, 2019

I don't have the external input, yet, but memory kind of kicks back in. Our work on the fftw update was motivated to get best-possible fftw performance in a high-performance setup with RPis (for Einstein@Home this was). No idea if this holds for the RPi4, but with previous models the RPi had problems to give exact timings, so you could not tell what route in fftw was the best. Hence the "slow timer" setting. Once a so called "wisdom" file was created, which takes a few hours to create, the planning is known for subsequent program invocations. Even in case that the RPi4 no longer needs that slow timer, there is yet only one package for all RPi versions and to have the slow timer parameter set for that one package kind of seems right to me.

FFTW_ESTIMATE basically says that you don't care too much about using the best-possible way to compute with FFTW, so the planner within FFTW has less to think about. If that is technically sufficient for your application, then I think I would just go and set that flag when the plans are created at startup.

Has any of you looked into wisdom files? This should dramatically improve the performance of FFTW. No idea if this also reduces noise levels for you; that would actually be interesting to learn about. You need one wisdom file per platform. This should then grant immediate startup times, too. However, from what I understood, it is a timeout/crash somewhere else that is the main cause of the delay. Maybe you want to have a look into that, too.

A SDR is on my Xmas shopping list. Anyway, let's wait for what the ARM+FFTW experts say.

@N30dG

N30dG commented Dec 7, 2019

We had a similar discussion some while ago:
https://salsa.debian.org/science-team/fftw3/commit/20ebb730db9abeaf74145d6beb8035800fb2c05f

There is no cycle counter available in user space on arm/arm64, therefore we have to rely on with-slow-timer to get proper plan generation on arm. Without the with-slow-timer flag, fftw has no way of benchmarking and therefore always fell back to FFTW_ESTIMATE, no matter what you specified.

From the FFTW-Manual http://www.fftw.org/fftw3_doc/Cycle-Counters.html:
"If you are not supported, FFTW will by default fall back on its estimator (effectively using FFTW_ESTIMATE for all plans). "

The simplest fix for that would be to use FFTW_ESTIMATE on all arm devices. It doesn't matter if it's a Pi 4, Pi 3, Odroid N2 or whatever; it's the same on all arm devices. That should restore the behavior of your app from before the fftw 3.3.6p2-1 update.

A better solution would be to once generate a wisdom, export it and load it on next startup.

@smoe

smoe commented Dec 8, 2019

@N30dG, thank you for helping out and for the link to the fftw3 repository on salsa (which is where the packaging work is orchestrated) and the associated discussion.

@ALL, generating the wisdom file requires insight into how exactly the fftw3 library is invoked. What would be pretty cool is to have the exact command line for fftw-wisdom (or its variants) given in the documentation of openwebrx; openwebrx should possibly also collect working wisdom files for different platforms from the community.

As a bit of a sidenote, I just saw that there is a Debian package for rtl-sdr but none for libcsdr. To compensate for your extra hassle with the ARM platform I could help with a Debian package for that library. Tell me.

@jketterl

jketterl commented Dec 8, 2019

thank you @smoe and @N30dG for providing some insight on what exactly is going on, and why the slow timer was enabled. I really don't mind the slow timer per se, given that the old behaviour is still available by using FFTW_ESTIMATE. The only thing I'm having trouble with is detecting the slow timer, since I'd really like to keep FFTW_MEASURE in place for those scenarios where it works.

As for wisdom: I have had a quick look at them, but I have not yet fully understood the code well enough to patch it in. I'm concerned with the dynamic nature of this application, I'm not sure if the actual fft parameters will be repeating. If they are not, wisdom files would probably not help much.

Also, I wanted to clarify that openwebrx is only indirectly affected since it calls the csdr command-line tools; this issue would probably be better placed in the csdr project.

I have also done some work on a debian package over on my fork: https://github.com/jketterl/csdr/tree/debian - I'm currently trying to package all of the openwebrx related tools to facilitate simpler installation, even though I do not intend to publish them into the debian repositories, at least not for now. Many parts need polishing.

@smoe

smoe commented Dec 8, 2019

Nice to hear you already started with the packaging. I happily sponsor an upload for you.

For all ARM platforms you can just use FFTW_ESTIMATE since prior to 3.3.6 this is all you had, anyway. And you seemed (seem for those falling back to <=3.3.5) happy with it. As I said, I would really like to learn if noise levels are different - if not then it sounds much like a non-issue except for some CPU time wasted. To learn about the platform you are running on it seems completely fine to just invoke "arch" or "uname -m", which should be on all Linux platforms.
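
For a check at run time rather than in a shell script, the same information that "uname -m" prints is available from the POSIX uname() call. A small C sketch follows. The function names are made up for illustration, and matching on the prefixes "arm" and "aarch64" is an assumption about which machine strings occur in practice (armv7l is the one reported earlier in this thread).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>

/* Returns 1 if a `uname -m` style machine string looks like ARM, else 0. */
int machine_is_arm(const char *machine)
{
    assert(machine != NULL);
    return strncmp(machine, "arm", 3) == 0       /* e.g. armv6l, armv7l */
        || strncmp(machine, "aarch64", 7) == 0;  /* 64-bit ARM on Linux */
}

/* Queries the running system; returns 1 or 0, or -1 if uname() fails. */
int running_on_arm(void)
{
    struct utsname info;
    if (uname(&info) < 0)
        return -1;
    return machine_is_arm(info.machine);
}
```

This only detects the architecture. It does not tell whether the installed FFTW build actually uses the slow timer.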

You are right wrt the unclear size argument of the FFTW invocation. This likely depends on the bandwidth of the individual SDR, right? Just guessing. @N30dG kindly pointed me in a PM to https://github.com/simonyiszk/csdr/blob/master/fft_fftw.c where you may want to patch in a dump of all newly requested sizes to stderr. Then the users know for what sizes to prepare the wisdom, and with a bit of luck there is not too much variation between the devices.
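
Dumping the requested sizes could be done with a small hook like the C sketch below. This is not existing csdr code: the function name, the message format and the fixed-size table are all assumptions made for illustration. Each size is printed only the first time it is seen, so the log stays short.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_TRACKED_SIZES 64

static int seen_sizes[MAX_TRACKED_SIZES];
static int seen_count = 0;

/* Hypothetical hook, to be called wherever an FFT plan is created.
 * Prints a size to stderr the first time it is requested.
 * Returns 1 if the size was reported, 0 if it had been seen before.
 * Once the table is full, further new sizes are still reported but no
 * longer remembered, so they may be reported more than once. */
int note_fft_size(int size)
{
    assert(size > 0);
    for (int i = 0; i < seen_count; i++)
        if (seen_sizes[i] == size)
            return 0;
    if (seen_count < MAX_TRACKED_SIZES)
        seen_sizes[seen_count++] = size;
    fprintf(stderr, "fft plan requested: size = %d\n", size);
    return 1;
}
```

Running the receiver for a while and collecting these lines would show whether the set of sizes is small and stable enough for prepared wisdom to be useful.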

@N30dG

N30dG commented Dec 9, 2019

There is no need to know the size of a transformation for wisdom generation. You only have to know the size when you use the fftw-wisdom tool. For most applications you shouldn't use this tool anyway.
Simply export the wisdom that fftw generates, when generating a plan for the transformation.

For example, the first function of https://github.com/simonyiszk/csdr/blob/master/fft_fftw.c should look something like this:
...
fftwf_import_wisdom_from_filename("/etc/fftw/wisdom_c2c.dat");
plan->plan = fftwf_plan_dft_1d( ... );
fftwf_export_wisdom_to_filename("/etc/fftw/wisdom_c2c.dat");
...

FFTW wisdom files can contain multiple transform sizes in one file. If the actual size isn't contained in the wisdom you have imported, fftw adds the information for the new size to the wisdom. This way your wisdom gets "better" over time.
