Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

100% cpu usage with I2C audio hats #491

Open
craigerl opened this issue Oct 17, 2023 · 18 comments
Open

100% cpu usage with I2C audio hats #491

craigerl opened this issue Oct 17, 2023 · 18 comments

Comments

@craigerl
Copy link

craigerl commented Oct 17, 2023

I'm observing direwolf with 100% cpu usage when using I2C audio boards, specifically

  • FE-Pi audio hat
  • Audio Injector Z hat

observed on Pi 3A+, Pi Zero2W, Raspberry Pi OS bullseye
not observed on the original Pi Zero.
not observed with USB audio dongle
arecord and mpg123 do not generate significant cpu usage with the above i2c boards
observed on direwolf 1.6 and 1.7-dev-A

-craig
KM6LYW

@dranch
Copy link
Collaborator

dranch commented Oct 17, 2023

Are you using 32bit or 64bit Raspberry Pi OS? Can you please post your Direwolf configuration? Can you also please try the newest version of Direwolf from the Dev branch as Dev-A is rather old now

@craigerl
Copy link
Author

Hey David,

32bit Bullseye
Not using any configuration file
Observed with dev branch direwolf as of this posting

thanks,
-craig

@craigerl
Copy link
Author

Using "-n 2" on the direwolf command line with no other options and no config file fixes the problem. cpu usage is normal (17%)

related unresolved thread, which also uses an i2c sound device ,

#131

@dranch
Copy link
Collaborator

dranch commented Oct 17, 2023

Generally speaking, Direwolf really wants a configuration file to read so I recommend you enable that. Without specifying the "-n 2" item, can you copy/paste in what you're seeing when you start Direwolf from the command line and we can see what it's assuming?

@craigerl
Copy link
Author

replicated on Buster, direwolf 1.6
using "-n 2" or "ACHANNELS 2" is a fix in this case as well

@craigerl
Copy link
Author

craigerl commented Oct 17, 2023

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
  580 pi        20   0  139440  10744   7812 S 100.0   2.2   3:38.49 direwolf 

MYCALL KM6LYW-2
IGSERVER noam.aprs2.net
IGLOGIN KM6LYW-2 000000
PBEACON sendto=IG compress=1 delay=00:10 every=30:00 symbol="igate" overlay=R lat=38.9919N long=120.9956W comment=" DigiPi http://craiger.org/digipi/"
DCD GPIO 16
DWAIT 0
TXDELAY 30
TXTAIL 10
AGWPORT 8000
KISSPORT 8001

export ALSA_CARD=0
direwolf -d t -p -q d -t 0 -c /run/direwolf.tnc.conf | tee /home/pi/direwolf.log &

Dire Wolf version 1.6
Includes optional support for: gpsd hamlib cm108-ptt

Reading config file /run/direwolf.tnc.conf
Audio device for both receive and transmit: default (channel 0)
Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, E+, 44100 sample rate / 3.
KM6LYW_Fix: waiting 500ms before intializing gpio because of udev lag.
KM6LYW_Fix: waiting 500ms before intializing gpio because of udev lag.
Ready to accept AGW client application 0 on port 8000 ...
Virtual KISS TNC is available on /dev/pts/1
Created symlink /tmp/kisstnc -> /dev/pts/1
Ready to accept KISS TCP client application 0 on port 8001 ...
Can't get address for IGate server noam.aprs2.net, Temporary failure in name resolution
Can't get address for IGate server noam.aprs2.net, Temporary failure in name resolution
[ig] KM6LYW-2>APDW16:!R:l%i/uLi& ! DigiPi http://craiger.org/digipi/
Can't get address for IGate server noam.aprs2.net, Temporary failure in name resolution

Now connected to IGate server noam.aprs2.net (44.25.16.4)
Check server status here http://44.25.16.4:14501

[ig] # aprsc 2.1.14-g5e22b37
[ig] # logresp KM6LYW-2 verified, server T2USANW

@craigerl
Copy link
Author

craigerl commented Oct 18, 2023

Using "-r 22000" also fixes the 100% cpu thread.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
 1962 pi        20   0  132268  11372   7900 S  13.6   2.3   0:33.53 direwolf 

Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, E+, 22000 sample rate.

edit:
44000 works (14% cpu)
44200 works (14% cpu)
56000 works (18% cpu)
192000 works (60%cpu)
44100 fails (100%cpu)
48000 fails (100% cpu)
48001 works (16% cpu)

anything but 44100 and 48000 seems to keep direwolf from consuming an entire cpu core
no correlation between frequency divider ( / 3 )

@dranch
Copy link
Collaborator

dranch commented Oct 18, 2023

Interesting.. btw, the Fe-Pi now the wb7fhc Nexus DR-X sound hat offers more sampling rates than most sound devices. You can get a full list of it's sampling rates by running:

sudo lsusb -vv | grep -e Audio -e tSamFreq | grep -v -e Descriptor -e bInterfaceClass

Anyway.. it's strange that the common sampling rates create high load for you on this truly stereo input / output sound device. We will have to wait to hear back from WB2OSZ on this one.

@craigerl
Copy link
Author

craigerl commented Oct 18, 2023

Thanks David, this isn't a USB device (I2C device), and yah, it has something to do with the supported hardware rates for sure.

32K seems to be a hardware rate as well
32000 (41% cpu)
32001 (23% cpu)

Running stack trace on the audio thread is even more revealing,

During normal cpu usage (44001 for example) we see device polling
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, -1) = 1 ([{fd=4, revents=POLLIN}]) <0.007488>

During high cpu usage (44100, 48000, 32000) we don't see the poll call at all, and we see a nanosleep instead
[pid 18648] 08:38:11 <... clock_nanosleep_time64 resumed>0x72dffd48) = 0 <1.000189>
[pid 18651] 08:38:11 <... ioctl resumed>, 0x1168080) = 0 <0.000179>
[pid 18648] 08:38:11 clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>

@dranch
Copy link
Collaborator

dranch commented Oct 18, 2023

One thing I noticed here is that the direwolf startup output is using the "default" audio device which is a PulseAudio device which I don't recommend. PulseAudio can do all kinds of unwanted audio processing things to the audio as well as doing silent resampling for non-native sound device sampling rates which will increase CPU load. In your direwolf.conf file, instead try specifying a native ALSA device name such as "ADEVICE plughw:1,0". Btw, also in your startup, I see you're using the shell environment variable "export ALSA_CARD=0" but I don't believe Direwolf uses that at all. If you're doing other ALSA tricks using DMIX, etc. please disable all of this and retest to see if that might be impacting Direwolf. Ultimately, Direwolf is a ALSA-native application and while it works, it's not a PulseAudio or PIpewire native application.

@craigerl
Copy link
Author

Understood.

No Pulse Audio -- ever.

libasound2 honors "ALSA_CARD", so direwolf does by extension.

I get the same result when specifying an audio device in direwolf.conf (ADEVICE plughw:0,0)

no dmix afaik

@craigerl
Copy link
Author

Increasing frames per period in audio.c resolves the issue.

        fpp = buf_size_in_bytes / (pa->adev[a].num_channels * pa->adev[a].bits_per_sample / 8);
//craiger - this normally comes out to 512
        fpp = 1024;

Unintended consequences, besides latency, unknown.

./src/direwolf -t 0 -r 44100
Dire Wolf version 1.7 (Oct 18 2023) BETA TEST 7
Includes optional support for: gpsd hamlib cm108-ptt

Reading config file direwolf.conf
Audio device for both receive and transmit: default (channel 0)
audio_open: calcbufsize (rate=44100, chans=1, bits=16) calc size=882, round up to 1024
suggest period size of 1024 frames
audio buffer size = 2 (bytes per frame) x 1024 (frames per period) = 2048
audio_open: calcbufsize (rate=44100, chans=1, bits=16) calc size=882, round up to 1024
suggest period size of 1024 frames
audio buffer size = 2 (bytes per frame) x 1024 (frames per period) = 2048
Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, A+, 44100 sample rate / 3.
Note: PTT not configured for channel 0. (Ignore this if using VOX.)
Ready to accept AGW client application 0 on port 8000 ...
Ready to accept KISS TCP client application 0 on port 8001 ...

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
31592 pi        20   0  105260  10716   7824 S  13.2   2.2   0:02.78 direwolf  

@wb2osz
Copy link
Owner

wb2osz commented Oct 20, 2023

I don't have one of these devices so I can't do any experiments. All I know is what is found here. https://github.com/NexusDR-X/nexus-audio/blob/main/README.md

Your test results are very perplexing.

44000 works (14% cpu)
44200 works (14% cpu)
56000 works (18% cpu)
192000 works (60%cpu)
44100 fails (100%cpu)
48000 fails (100% cpu)
48001 works (16% cpu)

direwolf treats the sound device very simplistically. I picked the default of 44100 because that is the CD rate and seems to be supported by all hardware. The USB audio devices also do 48000, which I believe is the DVD rate.

I don't know the internals of the audio device driver, but one might reasonably expect that it would instruct the hardware to operate at the requested sample rate if available. That would be most efficient with no rate conversion necessary. If the desired rate is not available in hardware, it would pick some higher rate, possibly an integer multiple or at least a rational number, if possible, to make rate conversion easier/more efficient.

That's not what we see at all in the test results. It could be something weird going on in the device driver.

I added a hack, oops, I mean heuristic, for ARM processors on the assumption they would be much much slower than x86 machines. Downsampling/decimation by a factor of 3 is applied if the sample rate is much higher than needed for good AFSK decoding. Nothing fancy; it just averages 'n' incoming samples together. An old single core RPi model 1 is more than adequate with this little trick and the user not having to do anything special. You will see this change to -D2 or disapear as the incoming rate decreases.

Trying to use ALSA to map two independent virtual mono audio channels adds another layer of complexity and strange possible cases. What happens if the two derived virtual channels want different rates. Would they fight over the hardware rates or it just use 96000 and burn up CPU resources downsampling it.

The buffer size is a tradeoff.
small buffer = rapid transfers =lower latency = more overhead.
and the opposite.
If the receive latency is too long, you will transmit on top of other people because recently started transmissions take too long to reach the decoder.

The specific direwolf version shouldn't matter. I don't think the audio input processing has been touched for a long time.

You mentioned bullseye. Have you tried bookworm. direwolf compiles OK (newer compile usually means new warnings) on the 64 bit version on an RPi 3 and passes the self tests. That's as far as I got. Maybe it has a better I2C audio driver.

Steve Magnuson, AG7GN should be brought into this conversation.

73,
john WB2OSZ

@craigerl
Copy link
Author

Thanks for the info.

If we're dividing sample rate by 3 on ARM, is there a downside to simply specifying a rate of "14700"? That yields 10% cpu on a Pi3 and would not introduce latency.

New observation, 90 percent of CPU time is spent in a system call (not user time) when using 44100/48000.

I'm still curious about nanosleep vs polling in the audio thread. Nanosleep is only seen when using 44100/4800 rates. We see poll() otherwise. It's like nanosleep doesn't sleep.

@craigerl
Copy link
Author

Bookworm observations. 32bit/lite, Direwolf 1.6, Linux 6.1.0

CPU usage is much better than bullseye, but significantly higher when using 44100 or 48000

44101 12% (10.0user, 2.3sys)
44100 20% ( 6.3user 12sys)
48000 20% (7.4user 12.8sys)
48001 16% (18user 0.3sys)

time direwolf -r44100
real 0m21.496s
user 0m1.348s
sys 0m2.736s

time direwolf -r 44101
real 0m21.922s
user 0m2.324s
sys 0m0.438s

Like with bullseye we still see nanosleeps when using 44100/48000, and we see polling with any other rate
strace -Ttf direwolf -r 44100
[pid 1147] 17:41:30 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=7, tv_nsec=0}, <unfinished ...>

strace -Ttf direwolf -r 44101
poll([{fd=6, events=POLLIN|POLLERR|POLLNVAL}], 1, -1) = 1 ([{fd=6, revents=POLLIN}])

Bullseye invokes clock_nanosleep_time64()
Bookworm invokes clock_nanosleep()

@craigerl
Copy link
Author

craigerl commented Oct 24, 2023

dbgsym version of alsa-libs compiled, this is where the direwolf process is spending time, still need to see which driver the "sys" time is being spent...

22.82% direwolf direwolf [.] demod_afsk_process_sample
20.77% direwolf libasound.so.2.0.0 [.] snd_pcm_route_convert1_many
7.64% direwolf libarmmem-v7l.so [.] memmove
6.29% direwolf direwolf [.] multi_modem_process_sample
4.53% direwolf direwolf [.] nudge_pll
3.62% direwolf direwolf [.] demod_process_sample
2.62% direwolf direwolf [.] audio_get

@craigerl
Copy link
Author

time being spent in snd_pcm driver, function snd_pcm_stream_unlock_irq()

44101

  18.49%  direwolf                  [.] demod_afsk_process_sample
  12.44%  libasound.so.2.0.0        [.] snd_pcm_route_convert1_many
  11.49%  [snd_pcm]                 [k] snd_pcm_stream_unlock_irq
   6.07%  libarmmem-v7l.so          [.] memmove
   5.22%  [kernel]                  [k] _raw_spin_unlock_irqrestore
   5.00%  direwolf                  [.] multi_modem_process_sample


44100

  78.11%  [snd_pcm]                 [k] snd_pcm_stream_unlock_irq
   2.83%  libasound.so.2.0.0        [.] snd_pcm_route_convert1_many
   2.66%  direwolf                  [.] demod_afsk_process_sample
   0.92%  libarmmem-v7l.so          [.] memmove
   0.82%  [snd_pcm]                 [k] snd_pcm_ioctl
   0.77%  [kernel]                  [k] _raw_spin_unlock_irqrestor

@craigerl
Copy link
Author

craigerl commented Oct 25, 2023

patch for audio.c, testing okay at this point. Bumping up the frames per period seems alleviate the snd_pcm driver problem with a minimal increase in latency. In fact, the buffer/period is similar to what it comes up with for a USB/ic705 device. CPU usage goes from 100% to 16% on Raspberry Pi Zero 2W with this patch.

pi@digipi4:~/git/direwolf/clean/direwolf/build $ ./src/direwolf -t 0 -r 48000
Dire Wolf version 1.7 (Oct 25 2023) BETA TEST 7
Includes optional support for: gpsd hamlib cm108-ptt dns-sd

Reading config file direwolf.conf
Audio device for both receive and transmit: default (channel 0)
audio_open: calcbufsize (rate=48000, chans=1, bits=16) calc size=1152, round up to 2048
suggest period size of 1024 frames
audio buffer size = 2 (bytes per frame) x 1024 (frames per period) = 2048
audio_open: calcbufsize (rate=48000, chans=1, bits=16) calc size=1152, round up to 2048
suggest period size of 1024 frames
audio buffer size = 2 (bytes per frame) x 1024 (frames per period) = 2048
Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, A+, 48000 sample rate / 3.
Note: PTT not configured for channel 0. (Ignore this if using VOX.)
Ready to accept AGW client application 0 on port 8000 ...
Ready to accept KISS TCP client application 0 on port 8001 ...
DNS-SD: Avahi: Announcing KISS TCP on port 8001 as 'Dire Wolf on digipi4'
DNS-SD: Avahi: Service 'Dire Wolf on digipi4' successfully registered.

pi@digipi4:~/git/direwolf/direwolf/build $ sudo cat /proc/asound/card0/pcm0p/sub0/hw_params
access: MMAP_INTERLEAVED
format: S16_LE
subformat: STD
channels: 2
rate: 48000 (48000/1)
period_size: 1024
buffer_size: 131072

audio.c

//craiger nudge up buffer size until frames per period hits 1024 for i2s cards
//#define ONE_BUF_TIME 10
#define ONE_BUF_TIME 12
#if __arm__
        /* Ugly hack for RPi. */
        /* Reducing buffer size is fine for input but not so good for output. */

//craiger don't inflate output buffer size
//      if (*inout == 'o') {
//        buf_size_in_bytes = buf_size_in_bytes * 4;
//      }
#endif

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants