Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flash alignment mismatch leads to large overhead with Zephyr mcumgr #185

Open
sw opened this issue May 6, 2021 · 8 comments
Open

Flash alignment mismatch leads to large overhead with Zephyr mcumgr #185

sw opened this issue May 6, 2021 · 8 comments
Assignees

Comments

@sw
Copy link

sw commented May 6, 2021

We are using Nordic's fork of mcumgr (https://github.com/nrfconnect/sdk-mcumgr) but I believe this also affects upstream https://github.com/apache/mynewt-mcumgr.

The Zephyr port rounds the write size down to the flash write block size:
https://github.com/apache/mynewt-mcumgr/blob/798f7fe28bb32335ba25d1c180808cd88d73a83c/cmd/img_mgmt/port/zephyr/src/zephyr_img_mgmt.c#L571-L576

This smaller size is then reported back in the response. However, the mcumgr CLI takes some time to detect this mismatch, and keeps sending frames with incorrect offsets.

This leads to a large overhead of data transmitted on the serial port.

This happens with the first frame, which is sent with 0x127 bytes of data; the receiving MCU rounds this down to 0x124. mcumgr CLI then tries to continue at offset 0x127, then offset 0x27b, until it realizes something is wrong and restarts at 0x124.

The next 64kbyte then go through without a hitch (no offset problems).

At the 64k block border though, the problem re-occurs, and from then on every fragment is essentially sent 4 times: 3 times with wrong offsets which are discarded. A firmware upgrade thus takes approximately 4 times longer than would be necessary!

The image handling in newtmgr should be improved:

  • to detect a wrong offset sooner, rather than sending up to 3 frames with wrong offset
  • to only send a multiple of a common flash block size, for example a multiple of 4 bytes.

Attached is a log file from mcumgr image upload file.bin -l debug:
log.txt

@sw
Copy link
Author

sw commented May 7, 2021

This seems to have gotten worse in 1.9.0 compared to 1.8.0

@fs34a
Copy link

fs34a commented Aug 9, 2021

I found the exact the same issue. My solution is to round the chunk length as 4-byte-aligned, then bootloader won't wait for valid offset check. It squeezed the time from minutes to 17 sec for 340KB binary.
diff.patch.txt

@dachalco
Copy link

+1

@sw
Copy link
Author

sw commented Sep 16, 2021

Using

writeLen := util.Min(512, totlen-written)

seems to be optimal, as the MTU size is also limited elsewhere. The value 1024 suggested by @LockeKK will therefore not result in larger frames.

This requires setting SMP_SHELL_RX_BUF_SIZE to 512 on Zephyr.

This also fixes the issue with the 20ms delay as reported in #180, as that Sleep call no longer happens.

jeppefrandsen added a commit to AudioStreamingPlatform/mynewt-newtmgr that referenced this issue Apr 25, 2022
jeppefrandsen added a commit to AudioStreamingPlatform/mynewt-newtmgr that referenced this issue Apr 25, 2022
@jeppefrandsen
Copy link

jeppefrandsen commented Apr 26, 2022

+1. Used the patch from @LockeKK for now 🍻

@sjanc sjanc self-assigned this Apr 26, 2022
@FARLY7
Copy link

FARLY7 commented Jul 13, 2022

+1 for the patch provided by @LockeKK 👍🏻

I was able to reduce the upload times down from 3.5mins to 44s:

Original at 115,200 baud:
247.23 KiB / 247.23 KiB [===============================] 100.00% 1.14 KiB/s 3m37s
with patch:
247.23 KiB / 247.23 KiB [===============================] 100.00% 5.56 KiB/s 44s

For me, this issue was only occurring while using Serial Recovery mode of MCUBoot, it was not occurring when updating the image using application-level SMP server. I guess this has something to do with the flash alignment of the primary slot vs secondary slot. Serial Recovery mode writes directly to primary slot, where as application-level updates are written to the secondary slot.

Another interesting point for those running macOS. The serial_posix.go file in the Tarm serial library used by Mcumgr CLI tool limits the baud rate to 115,200, which is a standard POSIX limit. We also created a patch that removed the use of constants to define the baud rate and instead use the absolute baud rate value. This allowed us to run baud rates beyond 115,200. The highest we tested was 921,600 baud, bringing the upload times further down to 11s! A massive improvement from 3.5mins.

@FARLY7
Copy link

FARLY7 commented Sep 2, 2022

Adding to my initial comment, adding the patch provided by @LockeKK causes image upload to hang when using SMP Server with USB CDC ACM.

I haven't been able to figure out why, but I figured it must be something to do with MTU sizes, however, I have tried every MTU-like KConfig option I can find and cannot get it to work.

There are serious reliability issues with this mcumgr solution that need to be addressed. For a feature that is so fundamental to Zephyr, it's saddening that it works so poorly.

Without the patch, it should not take 1min and 47s to upload a 340kB image over USB, and 3.5mins over UART. Adding in the 45s it takes for MCUBoot to perform the image swap on reset. We're talking a few minutes to update a small image. The ESP32 serial bootloader can update a much larger image in a fraction of this time.

@mlaz
Copy link

mlaz commented Sep 2, 2022

Adding to my initial comment, adding the patch provided by @LockeKK causes image upload to hang when using SMP Server with USB CDC ACM.

@FARLY7 I believe I know what is causing that hang. Let me investigte it.

There are serious reliability issues with this mcumgr solution that need to be addressed. For a feature that is so fundamental to Zephyr, it's saddening that it works so poorly.

Can you please elaborate on that? Can you maybe open an issue on it, if it is something out of the scope of this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants