ESP8266 audioreactive UDP sync ported from MoonModules/WLED #3962

Draft
wants to merge 4 commits into
base: 0_15

gaaat98

@gaaat98 gaaat98 commented May 8, 2024

Tries to resolve #3960

However, it seems to be unstable on RGB(W) PWM strips at higher clock speeds, causing frequent reboots.

@netmindz
Contributor

netmindz commented May 8, 2024

As I might not have an ESP8266 to hand when doing the code review, could you please include a screenshot of the Audio Reactive settings, with the sync dropdown expanded?

@gaaat98
Author

gaaat98 commented May 8, 2024

I have left the default ones

image

@softhack007 softhack007 self-requested a review May 8, 2024 21:13
@softhack007 softhack007 linked an issue May 8, 2024 that may be closed by this pull request
@softhack007
Collaborator

softhack007 commented May 9, 2024

Screenshot_20240509-133320_Samsung Internet

@blazoncek question, as I'm not very experienced with the 8266: sync receive polls for new UDP multicast packets every 20 ms. This works well on ESP32.

Maybe 8266 can have a problem with it? Do you think that adding 'yield()' before/after UDP calls might help?

Or maybe it's the frequent alloc/dealloc of fftBuff[packetsize] that causes stress on the 8266 due to heap fragmentation? In fact, a static buffer of 88 bytes (max expected packet size) would be enough.
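The polling pattern being discussed can be sketched off-target as follows. This is only an illustration under stated assumptions, not the usermod's code: `FakeUdp`, `SyncPoller`, `yieldToSystem` and `kPollIntervalMs` are hypothetical names, the 44-byte packet size is arbitrary, and `yieldToSystem()` stands in for the Arduino core's `yield()`. Whether yielding around the UDP call actually helps on the 8266 is still an open question here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Arduino core's yield(); a no-op so the sketch builds off-target.
inline void yieldToSystem() {}

// Hypothetical stand-in for the UDP socket. It only mimics parsePacket():
// the size of the next pending datagram, or 0 when nothing is waiting.
struct FakeUdp {
  int pendingPackets = 0;
  int packetSize = 44;  // arbitrary illustrative size
  int parsePacket() {
    if (pendingPackets == 0) return 0;
    --pendingPackets;
    return packetSize;
  }
};

constexpr uint32_t kPollIntervalMs = 20;  // poll period mentioned in the thread

struct SyncPoller {
  uint32_t lastPollMs = 0;
  bool hasPolled = false;
  int pollCount = 0;

  // Polls at most once per interval; returns the packet size or 0.
  int pollIfDue(FakeUdp& udp, uint32_t nowMs) {
    // Unsigned subtraction stays correct when the millisecond counter wraps.
    if (hasPolled && static_cast<uint32_t>(nowMs - lastPollMs) < kPollIntervalMs) {
      return 0;
    }
    hasPolled = true;
    lastPollMs = nowMs;
    ++pollCount;
    yieldToSystem();  // give the WiFi stack time before touching the socket
    const int size = udp.parsePacket();
    yieldToSystem();  // and again afterwards
    return size;
  }
};

// Usage example: one pending packet, calls at 0 ms, 10 ms and 25 ms.
inline void demoPolling() {
  FakeUdp udp;
  udp.pendingPackets = 1;
  SyncPoller poller;
  assert(poller.pollIfDue(udp, 0) == 44);  // first call always polls
  assert(poller.pollIfDue(udp, 10) == 0);  // too early, socket not touched
  assert(poller.pollCount == 1);
  assert(poller.pollIfDue(udp, 25) == 0);  // due again, but nothing pending
  assert(poller.pollCount == 2);
}
```

On real hardware the time source would be `millis()` and the socket a `WiFiUDP` instance.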

@blazoncek
Collaborator

I think @willmmiles or @pbolduc will know much more about AsyncUDP/TCP than me.

fftBuff is allocated on the stack as far as I can see. That shouldn't be a problem. @gaaat98 please provide a crash dump for your WDT crash, decoded with the exception decoder.

@blazoncek blazoncek added enhancement usermod usermod related labels May 9, 2024
@gaaat98
Author

gaaat98 commented May 9, 2024

Excuse the dumb question, but how do I enable stack trace printing? The only thing I see on serial while resets occur is this:

Loop delayed more than 3ms.
Loop delayed more than 3ms.

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 3424, room 16
tail 0
chksum 0x2e
load 0x3fff20b8, len 40, room 8
tail 0
chksum 0x2b
csum 0x2b
v000e0460
~ld
���'�{��n<�$�l b�|{�l�o��o�l ��{�d�l

�
---WLED 0.15.0-b3 2405030 INIT---
esp8266 @ 80MHz.
Core: 3.1.2
FLASH: 2 MB
heap 29072
PIN ALLOC: Pin 1 successfully allocated by 0x89 (137)
Registering usermods ...
heap 28984

@blazoncek
Collaborator

-D WLED_DEBUG

(which you have) and

monitor_filters = esp8266_exception_decoder

in your PIO environment.

But perhaps the 8266 does not support WDT exceptions and does not print a stack dump.
It looks to me as if some loop may be running far too many iterations (for example, -1 becoming 65535 when an int16_t is converted to uint16_t).
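The wrap-around described above can be shown with a small sketch. The helper names `iterationsFor` and `iterationsForGuarded` are hypothetical and not taken from WLED. They only show how a signed -1 turns into a 65535-iteration loop once it is stored in an unsigned 16-bit limit, which is the kind of loop that could plausibly starve the watchdog.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not WLED code): counts how often a loop bounded by an
// unsigned 16-bit limit runs when that limit came from a signed value.
inline uint32_t iterationsFor(int16_t signedCount) {
  // Conversion to an unsigned type is modulo 2^16, so -1 becomes 65535.
  const uint16_t limit = static_cast<uint16_t>(signedCount);
  uint32_t iterations = 0;
  for (uint16_t i = 0; i < limit; ++i) {
    ++iterations;
  }
  return iterations;
}

// Guarded variant: treat zero or negative counts as "nothing to do".
inline uint32_t iterationsForGuarded(int16_t signedCount) {
  return signedCount <= 0 ? 0u : iterationsFor(signedCount);
}

// Usage example: the unguarded loop runs 65535 times for -1.
inline void demoWrapAround() {
  assert(iterationsFor(3) == 3u);
  assert(iterationsFor(-1) == 65535u);
  assert(iterationsForGuarded(-1) == 0u);
}
```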

@pbolduc
Contributor

pbolduc commented May 10, 2024

One of the biggest issues is that the Espressif networking stack has problems with high UDP traffic rates. It has been a while since I looked at the networking. I found this issue, espressif/arduino-esp32#4104, which talks about errors with the parsePacket function. I need a reminder where the WifiUDP library comes from in WLED.

@willmmiles
Contributor

I need a reminder where the WifiUDP library comes from in WLED.

WifiUDP is part of the Arduino framework on ESP8266 and ESP32. (The E131 receiver elsewhere in the project is using the ESPAsyncUDP library, which is not part of the framework.)

My quick read of the WifiUDP framework code on ESP8266 suggests it's reasonably well written. By contrast, the ESP32 version is awful, heap allocating and freeing a full 1460-byte packet buffer every time it polls to see if there's a new packet(!!).

I pulled this PR in to my web server stability working branch and I haven't been able to reproduce the fault yet. It's neat to see the AR effects working on an 8266 again, though it's noticeably laggy compared to the other ESP8266 running as a DDP segment beside it.

@gaaat98 would you mind posting your config, in case there's something non-obvious I've got configured differently?

@softhack007
Collaborator

softhack007 commented May 11, 2024

I pulled this PR in to my web server stability working branch and I haven't been able to reproduce the fault yet

@willmmiles I think the problem is happening with analog (PWM) strips only - it may also happen with bitbang ws2812 drivers. So I suspect it's a timing issue, or simply that the 8266 is overloaded when doing high-frequency PWM (interrupt driven) + UDP polling + WiFi + other stuff.

Besides UDP polling, audioreactive runs a linear filter (no loops) then simply copies received data to the variables used in effects. So audioreactive on 8266 does "almost nothing" in addition to UDP.

@gaaat98
Author

gaaat98 commented May 11, 2024

@willmmiles Here's the configuration on my Athom controller (ESP8285). On this board the crashes happen after a few updates whenever the clock speed is set higher than 'Slowest' (the effect I tested was DJ Light, but I assume the others behave the same).

This is the config of an ESP8266 board I used for testing; here the crashes are a bit less frequent but still happen. I noticed that if I open the /liveview page in a browser, the number of reboots noticeably increases, which suggests that the boards crash due to overload.

These are both configured as PWM strips. I also have another Athom controller connected to an SK6812 strip with UDP sync enabled that shows no such issues.

@blazoncek
Collaborator

@gaaat98 please get a plain D1 mini board, configure it as an Athom controller and monitor the serial output.
It is important to know whether the crashes are caused by the watchdog or by something else.

@gaaat98
Author

gaaat98 commented May 11, 2024

@blazoncek That's basically what I did to obtain this log, the only difference is that I used a NodeMCU ESP8266 instead of D1 Mini, is that relevant?

Loop delayed more than 3ms.
Loop delayed more than 3ms.

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 3424, room 16
tail 0
chksum 0x2e
load 0x3fff20b8, len 40, room 8
tail 0
chksum 0x2b
csum 0x2b
v000e0460
~ld
���'�{��n<�$�l b�|{�l�o��o�l ��{�d�l

�
---WLED 0.15.0-b3 2405030 INIT---
esp8266 @ 80MHz.
Core: 3.1.2
FLASH: 2 MB
heap 29072
PIN ALLOC: Pin 1 successfully allocated by 0x89 (137)
Registering usermods ...
heap 28984

Before this, Serial only showed more "Loop delayed more than 3ms." messages and other generic debug info that is printed periodically.

The configuration used is the one I uploaded earlier:

This is the config of an ESP8266 board I used for testing; here the crashes are a bit less frequent but still happen. I noticed that if I open the /liveview page in a browser, the number of reboots noticeably increases, which suggests that the boards crash due to overload.

I repeated the test and collected more logs here: crash-log.txt. The clock speed is set to 'Fastest'. WLED crashes after a while if I select the DJ Light effect from the Android app (the NodeMCU is receiving sound data from an ESP32 that is listening to music), or crashes immediately if I enable the Peek option from the app.

EDIT: I was flashing the 2MB flash version because that's the one used on the athom controllers, so I also tried the standard NodeMCU build and the issue is still present

@willmmiles
Contributor

I pulled this PR in to my web server stability working branch and I haven't been able to reproduce the fault yet

@willmmiles I think the problem is happening with analog (PWM) strips only - it may also happen with bitbang ws2812 drivers. So I suspect it's a timing issue, or simply that the 8266 is overloaded when doing high-frequency PWM (interrupt driven) + UDP polling + WiFi + other stuff.

I tried both bitbang ws2812 (actually my default ESP8266 test, due to a poor pin selection in a project I built before I discovered WLED!) and RGBW PWM in "normal" speed. No crashes for me after 12h of operation.

Besides UDP polling, audioreactive runs a linear filter (no loops) then simply copies received data to the variables used in effects. So audioreactive on 8266 does "almost nothing" in addition to UDP.

I suspect the polling latency might be the source of the visible lag. On 8266 calling the poll looks to be "basically free" in terms of CPU usage if there's nothing pending; we could probably poll every loop with no impact there (not so much on ESP32, though!!). Stability is more important than performance, though, so I'll investigate this later.
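To put a rough number on the lag attributed to polling above, here is a back-of-the-envelope sketch. It assumes polls land exactly on multiples of the interval and ignores loop jitter, WiFi delivery time and effect rendering, so it is an upper-bound illustration rather than a measurement. `pollDelayMs` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extra wait (in ms) before a packet that arrives at
// arrivalMs is noticed, assuming polls happen exactly at multiples of
// intervalMs. An interval of 0 models "poll on every loop pass".
inline uint32_t pollDelayMs(uint32_t arrivalMs, uint32_t intervalMs) {
  if (intervalMs == 0) return 0;
  return (intervalMs - arrivalMs % intervalMs) % intervalMs;
}

// Usage example with the 20 ms interval from the thread.
inline void demoPollDelay() {
  assert(pollDelayMs(20, 20) == 0);   // arrives exactly on a poll
  assert(pollDelayMs(21, 20) == 19);  // just missed a poll: near worst case
  assert(pollDelayMs(39, 20) == 1);   // arrives just before the next poll
  assert(pollDelayMs(21, 0) == 0);    // polling every pass adds no wait in this model
}
```

Under these assumptions a packet waits up to just under 20 ms, and about 10 ms on average if arrivals are spread evenly, before the receiver even looks at it.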

@willmmiles
Contributor

clock speed is set to 'Fastest', WLED crashes after a while if I set to effect DJ Light from the Android app

Aha, there it is! Just got one with these settings.

@gaaat98
Author

gaaat98 commented May 11, 2024

@willmmiles during your test with the PWM strip did you select a sound reactive effect on the receiving controller? Did you have another device sending the audio data? To have the sound reactive effects you need to make sure you pulled PR#3942 otherwise they are disabled.

Sorry to ask, but it's strange, since I get almost immediate crashes.

@willmmiles
Contributor

@willmmiles during your test with the PWM strip did you select a sound reactive effect? Did you have another device sending the audio data? To have the sound reactive effects you need to make sure you pulled #3942 otherwise they are disabled.

Yes and yes. For the initial testing I had the same effect (GravCentric) running on three WS2812 strips - one on the ESP32 AR node directly, one on an ESP8266 over DDP, and one on an ESP8266 using AR sync. That's how I could tell the AR sync was laggy compared to DDP. ;)

For the overnight test I reconfigured that last unit to add PWM output on some unconnected pins. It ran OK for me using the GravCentric effect at 'Normal' speed. Dunno if it was the speed, the DJ light effect, or peek, but it faulted right away when I set it up with that config ... though it hasn't faulted since.

@blazoncek
Collaborator

---WLED 0.15.0-b3 2405030 INIT---
esp8266 @ 80MHz.

@gaaat98 try 160MHz builds.

@gaaat98
Author

gaaat98 commented May 11, 2024

Same behaviour. I'm trying to establish reliable steps to reproduce the crashes, but at the moment it seems that for me it is enough to have the ESP8266 receiving data, the DJ Light effect active, and the clock set to Fastest, and then just wait for maybe less than a minute while making some noise near the sending device...

Logs for the 160MHz build here

@softhack007 softhack007 self-assigned this May 15, 2024
@willmmiles
Contributor

I managed to get the crash to reliably reproduce, but I haven't yet pinpointed the root cause. Here's what I've learned so far - might be old hat for some of you, sorry I'm still new to this environment:

  • I was able to reproduce the crash using non-AR FX ('Breathe' in particular) on the PWM segment, so long as the brightness got above a certain point
  • It won't reproduce without the UDP traffic, though
  • -D DEBUG_ESP_HWDT gives a more helpful crash dump on WDT timer crashes such as these, but the user and system stacks are all munged together
  • -D DEBUG_ESP_HWDT_NOEXTRA4K gives clean stack traces, but costs 4K of RAM
  • Adding DEBUG_PRINT(F("Reset for: ")); DEBUG_PRINTLN(ESP.getResetInfo()); to WLED::setup() gives yet more clues to what was going on when the system faulted
  • None of these have left me with a smoking gun, or at least I haven't isolated it yet - the faulting instruction pointers seem to be all over the system and frequently "benign" (eg. register arithmetic, nothing that would obviously lead to a fault).
  • .. though there's a frequent occurrence of a ROM function at the top of the system stack, ram_rxiq_get_mis, which I am guessing is for retrieving packets from the wifi module??

I'm on vacation and away from home for the next couple of weeks, so I won't be able to do any more digging until I get back, but I wanted to leave some tracks in case anyone else was investigating.

Sorry I don't have more answers!

@flux242

flux242 commented May 21, 2024

Please consider the following implementation of the receiveAudioData function:

  • The buffer is now static and its size is 83 bytes (the size of the v1 packet).
  • Fixes a possible stack-overflow attack if somebody sends big UDP packets.
  • Adds a while loop that drains all pending packets, so that packets sent too often do not cause an out-of-RAM crash.
    bool receiveAudioData()   // check & process new data. return TRUE in case that new audio data was received. 
    {
      if (!udpSyncConnected) return false;
      bool haveFreshData = false;

      size_t packetSize = 0u;
      while ( (packetSize = fftUdp.parsePacket()) !=0 )
      {
        if ( (packetSize == sizeof(audioSyncPacket)) || (packetSize == sizeof(audioSyncPacket_v1)) ) {
          //DEBUGSR_PRINTLN("Received UDP Sync Packet");
          static uint8_t fftBuff[sizeof(audioSyncPacket_v1)];
          fftUdp.read(fftBuff, packetSize);

          // VERIFY THAT THIS IS A COMPATIBLE PACKET
          if ( isValidUdpSyncVersion((const char *)fftBuff)) {
            decodeAudioData(packetSize, fftBuff);
            //DEBUGSR_PRINTLN("Finished parsing UDP Sync Packet v2");
            haveFreshData = true;
            receivedFormat = 2;
          }
          else if (isValidUdpSyncVersion_v1((const char *)fftBuff)) {
            decodeAudioData_v1(packetSize, fftBuff);
            //DEBUGSR_PRINTLN("Finished parsing UDP Sync Packet v1");
            haveFreshData = true;
            receivedFormat = 1;
          }
          else receivedFormat = 0; // unknown format
        }
      }
      return haveFreshData;
    }
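The two ideas in this proposal, a fixed-size buffer and draining the socket in one call, can be tried off-target with the sketch below. It is a variation under stated assumptions, not the proposed function itself. `Inbox`, `DrainResult` and `drainInbox` are hypothetical names, the queue stands in for the UDP socket, and the 44-byte v2 size is an illustrative placeholder (only the 83-byte v1 size is quoted above). Unlike the loop above, this version also caps the number of packets handled per call, which is an extra design choice added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <vector>

// 83 bytes for v1 is quoted in the comment above; 44 bytes for v2 is an
// illustrative placeholder that this thread does not confirm.
constexpr std::size_t kPacketSizeV1 = 83;
constexpr std::size_t kPacketSizeV2 = 44;
static_assert(kPacketSizeV2 <= kPacketSizeV1, "buffer must fit the largest format");

struct DrainResult {
  int accepted = 0;
  int rejected = 0;
};

// Hypothetical stand-in for the UDP socket: a queue of pending datagrams.
using Inbox = std::deque<std::vector<uint8_t>>;

// Handles at most maxPerCall datagrams so a flood cannot keep one call busy forever.
inline DrainResult drainInbox(Inbox& inbox, int maxPerCall) {
  static uint8_t buffer[kPacketSizeV1];  // fixed size, never sized by the sender
  DrainResult result;
  for (int i = 0; i < maxPerCall && !inbox.empty(); ++i) {
    const std::vector<uint8_t>& datagram = inbox.front();
    const std::size_t size = datagram.size();
    if (size == kPacketSizeV1 || size == kPacketSizeV2) {
      std::memcpy(buffer, datagram.data(), size);  // size <= sizeof(buffer) here
      ++result.accepted;                           // decoding would happen at this point
    } else {
      ++result.rejected;  // unexpected size: dropped without copying anything
    }
    inbox.pop_front();
  }
  return result;
}

// Usage example: four queued datagrams, at most three handled per call.
inline void demoDrain() {
  Inbox inbox;
  inbox.emplace_back(44);    // v2-sized
  inbox.emplace_back(1460);  // oversized, must be rejected
  inbox.emplace_back(83);    // v1-sized
  inbox.emplace_back(44);    // left for the next call
  const DrainResult r = drainInbox(inbox, 3);
  assert(r.accepted == 2 && r.rejected == 1);
  assert(inbox.size() == 1);
}
```

Because the size check happens before the copy, an oversized datagram never reaches the buffer, which is the property the second bullet is after.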

Labels
enhancement usermod usermod related
Development

Successfully merging this pull request may close these issues.

Audio Data UDP Sync (receive only) on ESP8266
7 participants