More flexible FFT configuration #206

Open
dl9rdz opened this issue Jan 4, 2021 · 5 comments
@dl9rdz

dl9rdz commented Jan 4, 2021

Feature description
More flexible FFT configuration. Currently, you can specify an "fft_voverlap_factor" for FFT.

If it is 0, a single FFT is calculated per FFT display line. For example, with a 10 MSps Airspy, an FFT size of 4096, and 9 fps, this is a single 4096-point FFT per 1,111,111 input samples. Low CPU cost, but very ugly.

If it is >0, multiple FFTs are used per line. The calculation is done such that all input samples are used, some even multiple times depending on the overlap. In the above configuration this means at least 272 FFT calculations per FFT display line (for a value very close to 0). This looks good, but it costs 40% CPU just for the FFT with a single user on an i5-8365U, and it completely breaks a smaller system, e.g., an Odroid XU4 (which otherwise might just handle the load).
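
The numbers above can be reproduced with a little arithmetic. The following is only a sketch of the reasoning in this comment, not the project's code: the function name is made up, and rounding up (so that every input sample is covered) is an assumption chosen to match the "at least 272" figure.

```python
import math

def ffts_per_line(samp_rate, fft_size, fft_fps, overlap):
    """Estimate how many FFTs go into one display line (illustrative only)."""
    if overlap <= 0:
        # Overlap factor 0: a single FFT per display line, no averaging.
        return 1
    if overlap >= 1:
        raise ValueError("overlap must be below 1")
    samples_per_line = samp_rate / fft_fps      # input samples per display line
    hop = fft_size * (1.0 - overlap)            # new samples consumed per FFT
    # Assumption: round up so that all input samples are used at least once.
    return math.ceil(samples_per_line / hop)

print(ffts_per_line(10_000_000, 4096, 9, 0.0))    # 1
print(ffts_per_line(10_000_000, 4096, 9, 1e-6))   # 272
print(ffts_per_line(10_000_000, 4096, 9, 0.5))    # 543
```

There is nothing between 1 and 272 FFTs per line in this configuration, which is the gap this request is about.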

It would be nice to have a more flexible way of configuring the FFT averaging in between those two extremes, e.g., a configuration averaging 20 FFTs for a single fft display line.

Is there any reason to use this "fft_voverlap_factor" in the config and calculate the "fft_averages", instead of directly specifying the fft_averages in the config?

(An (ugly?) hack that I have been using for years on HA7ILM's original version is to simply change "if cfg.fft_voverlap_factor>0 else 0" to "!=0" in the code, so you can put a negative overlap in the config to get the desired results. But directly specifying the number of averages is probably the cleaner version? On the other hand, a negative value might be somewhat intuitive as well, reading "-10" as "use one out of 10 (FFT-size) blocks of input samples for the FFT".)
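
One possible shape for such a configuration, as a rough sketch only: the option name fft_averages is hypothetical (it is the thing being requested, not something that exists), the function is made up for illustration, and the "one out of N blocks" reading of a negative factor is the intuitive interpretation from above rather than what the existing formula would compute exactly.

```python
import math

def resolve_fft_averages(samp_rate, fft_size, fft_fps,
                         fft_averages=None, fft_voverlap_factor=0.0):
    """Return the number of FFTs to average per display line (illustrative)."""
    # Number of non-overlapping FFT-size blocks that arrive per display line.
    blocks_per_line = samp_rate / fft_size / fft_fps

    if fft_averages is not None:
        # Hypothetical explicit option: takes precedence over the overlap factor.
        return max(1, int(fft_averages))
    if fft_voverlap_factor > 0:
        # Overlapping FFTs that cover every input sample (rounding up is assumed).
        return math.ceil(blocks_per_line / (1.0 - fft_voverlap_factor))
    if fft_voverlap_factor < 0:
        # Assumed reading of "-N": use one out of N blocks of input samples.
        return max(1, round(blocks_per_line / -fft_voverlap_factor))
    return 1  # factor 0: a single FFT per line

# 10 MSps, FFT size 4096, 9 fps
print(resolve_fft_averages(10_000_000, 4096, 9, fft_averages=20))          # 20
print(resolve_fft_averages(10_000_000, 4096, 9, fft_voverlap_factor=-10))  # 27
print(resolve_fft_averages(10_000_000, 4096, 9))                           # 1
```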

Target audience
Any server operator, in particular those with low-end computers and high-samplerate SDRs.

@dl9rdz dl9rdz added the feature feature requests label Jan 4, 2021
@jketterl
Owner

jketterl commented Jan 4, 2021

I have no idea what the reasoning was behind that setup. I have shifted this code around a few times now, but I haven't investigated it in-depth or researched any other solutions for now. As far as I'm concerned, it does its job.

From what I understand, the FFT averaging was a contribution to the original project, see ha7ilm#49 and ha7ilm/csdr#19.

Running a 10MS/s SDR on any of the ARM SBCs so far is just out of their league, and the fixed FFT of OpenWebRX won't give you a good resolution even if you have enough CPU power. There is an existing discussion about that: #106

There are multiple ideas being kicked around at the moment as to what can be done for the performance. I am currently working on a new demodulation pipeline implementation that doesn't use the shell pipes; it should both improve performance and make the pipelines easier to work with.

Another idea is to switch from distributing IQ data to an FFT/iFFT approach, which of course would be a huge change. One byproduct of that change would be that FFT data would be present in abundance, so a source for waterfall data (even an individual source per connection) should be fairly simple to implement then.

@jketterl jketterl added idea and removed feature feature requests labels Jan 4, 2021
@dl9rdz
Author

dl9rdz commented Jan 4, 2021

The averaging itself is fine, it's just the configuration that has its limitations. For an RTL-SDR on a powerful PC it will not matter, for other use cases such a minor change will make a difference...

Regarding ARM SBCs, with some tweaking, the XU4 will run just fine @10MSps (with 1-2 users, more is in fact out of its league)...

I like the idea of a better demodulation pipeline without the shell pipes. These are a big performance bottleneck on the XU4. What would be nice to have is the possibility to configure the chain to do the downsampling in two steps instead of one.

@jketterl
Owner

jketterl commented Jan 4, 2021

There is indeed another idea, mostly referred to as "sub-bands"... Basically, the idea is to run a wideband SDR and extract "slices" for the clients. So, as an example, you could run an SDRPlay with 10MS/s on shortwave and extract the 80m, 40m and 30m bands at the same time. Not sure if that's what you're thinking of.

@dl9rdz
Author

dl9rdz commented Jan 5, 2021

That is conceptually pretty similar, but not the same.

I was thinking of the single-client decoder chain. In the standard chain, the decimation is done by a factor of 907 (10 MSps -> 11.025 kSps) with fir_decimate_cc 907 0.000165380374862183 HAMMING. That is a 24187-tap FIR filter, or about 27 c*f multiplications per input sample (@10 MSps).
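
As a quick check of these figures: the sketch below assumes a rule of thumb of roughly 4 / transition_bw taps, forced to an odd length. That rule is an assumption made here because it happens to reproduce the 24187 figure; the function name is made up for illustration.

```python
def estimated_taps(transition_bw):
    """Rule-of-thumb FIR length: about 4 / transition_bw, forced odd (assumed)."""
    taps = int(4.0 / transition_bw)
    return taps if taps % 2 == 1 else taps + 1

decimation = 907
transition_bw = 0.000165380374862183   # relative to the input sample rate

taps = estimated_taps(transition_bw)
# A decimating FIR only computes every 907th output, so the cost per
# input sample is the filter length divided by the decimation factor.
mults_per_input = taps / decimation

print(taps)                        # 24187
print(round(mults_per_input, 1))   # 26.7
```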

If you split that into two decimation steps, you can have a shorter first-stage filter, in particular because you don't care about aliasing in the parts that are filtered out in the second stage. Let's just for example assume a first-stage /10 decimation. For that you only need a decimation filter that passes the final 11.025k band and suppresses anything that might cause aliasing, e.g. everything above (1M-11.025k). I didn't do the math right now, but such a filter should be feasible with, let's say, fewer than 100 taps; that's 10 multiplications per input sample. Plus the second stage, that's maybe roughly another 3. So in total maybe half the CPU power.

Instead of my explanation, you can find better ones online, like this one:
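
The rough estimate can be written down as a small cost model. This is only a sketch: the function name is made up, the 100 taps for the first stage are the guess from above, and the 2419 taps for the second stage are an assumption (the same relative transition width as the single-stage filter, at a tenth of the sample rate). Note also that 907 is prime, so a real /10 first stage would need a fractional second stage or slightly different rates; the model only counts multiplications.

```python
def mults_per_input_sample(stages):
    """Multiplications per original input sample for a chain of decimating FIRs.

    stages: list of (taps, decimation) tuples, applied in order.
    """
    cost = 0.0
    rate = 1.0  # sample rate of the current stage relative to the input rate
    for taps, decimation in stages:
        # One output (taps multiplications) per `decimation` samples of this stage.
        cost += rate * taps / decimation
        rate /= decimation
    return cost

single_stage = mults_per_input_sample([(24187, 907)])
two_stage = mults_per_input_sample([(100, 10), (2419, 90.7)])  # assumed tap counts

print(round(single_stage, 1))              # 26.7
print(round(two_stage, 1))                 # 12.7
print(round(two_stage / single_stage, 2))  # 0.47
```

Under these assumptions the two-stage chain needs a bit less than half the multiplications, in line with the guess above.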
https://de.mathworks.com/help/dsp/ug/multistage-design-of-decimators-interpolators.html

You can make a science out of how to optimally split the decimation into two parts, also carefully analysing the impact of quantization error. I didn't do that; I just made rough guesses to get to "works for me" on an XU4 running @10 MSps.
It should also be of benefit on low-end ARM SBCs with RTL-SDR and several users... So maybe an option to consider when redesigning the chain...

@jketterl
Owner

jketterl commented Jan 6, 2021

Just to clarify... The current aim is not to redesign the chain itself, i.e. the algorithms involved will stay the same for now. The aim is to eliminate the shell pipes, and also find a better API that makes it easier to handle, modify and implement chains in the future.

Either way, we are drifting off topic here.
