Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Link Time Optimisation #3824

Draft
wants to merge 3 commits into
base: 0_15
Choose a base branch
from
Draft

Enable Link Time Optimisation #3824

wants to merge 3 commits into from

Conversation

deece
Copy link

@deece deece commented Mar 15, 2024

ref: https://gcc.gnu.org/wiki/LinkTimeOptimization

Link Time Optimisation defers final compilation until the link stage, which allows more aggressive inlining and optimisations to occur, as more information is known at link time vs when each object is compiled.

For the esp32dev platform, I have the following size reductions:
Without LTO:
1313024 .pio/build/esp32dev/firmware.bin

With LTO:
1214256 .pio/build/esp32dev/firmware.bin

ref: https://gcc.gnu.org/wiki/LinkTimeOptimization

Link Time Optimisation defers final compilation until the link
stage, which allows more aggressive inlining and optimisations
to occur, as more information is known at link time vs when
each object is compiled.

For the esp32dev platform, I have the following size reductions:
  Without LTO:
    1313024 .pio/build/esp32dev/firmware.bin

  With LTO:
    1214256 .pio/build/esp32dev/firmware.bin

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
These require investigation:
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: libc_replacements.cpp.o (symbol from plugin): in function `_open_r':
(.text+0x0): multiple definition of `puts'; /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/lib/libc.a(lib_a-puts.o):/workdir/repo/newlib/newlib/libc/stdio/puts.c:129: first defined here
/home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_wiring_digital.cpp:129:3: warning: type 'struct ArgStructure' violates the C++ One Definition Rule [-Wodr]
  129 | } ArgStructure;
      |   ^
/home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/FunctionalInterrupt.h:26:8: note: a different type is defined in another translation unit
   26 | struct ArgStructure {
      |        ^
/home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_wiring_digital.cpp:128:8: note: the first difference of corresponding definitions is field 'functionInfo'
  128 |  void* functionInfo;
      |        ^
/home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/FunctionalInterrupt.h:28:16: note: a field of same name but different type is defined in another translation unit
   28 |  FunctionInfo* functionInfo = nullptr;
      |                ^
/home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_wiring_digital.cpp:129:3: note: type 'void' should match type 'struct FunctionInfo'
  129 | } ArgStructure;
      |   ^
/home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/FunctionalInterrupt.h:21:8: note: the incompatible type is defined here
   21 | struct FunctionInfo {
      |        ^
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: .pio/build/nodemcuv2/firmware.elf section `.text1' will not fit in region `iram1_0_seg'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o:(.text+0x4): undefined reference to `stack_thunk_save'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o:(.text+0x8): undefined reference to `SigningVerifier_verify'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `thunk_SigningVerifier_verify':
<artificial>:(.text+0x1f): undefined reference to `SigningVerifier_verify'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: <artificial>:(.text+0x32): undefined reference to `stack_thunk_fatal_smashing'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: <artificial>:(.text+0x72): undefined reference to `stack_thunk_fatal_smashing'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `thunk_br_ssl_engine_recvapp_ack':
<artificial>:(.text+0xb2): undefined reference to `stack_thunk_fatal_smashing'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `thunk_br_ssl_engine_recvapp_buf':
<artificial>:(.text+0xf2): undefined reference to `stack_thunk_fatal_smashing'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `thunk_br_ssl_engine_recvrec_ack':
<artificial>:(.text+0x132): undefined reference to `stack_thunk_fatal_smashing'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o:<artificial>:(.text+0x172): more undefined references to `stack_thunk_fatal_smashing' follow
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `__wrap_system_restart_local':
<artificial>:(.text.__wrap_system_restart_local+0x2): undefined reference to `postmortem_report'
/home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans3.ltrans.o: in function `_ZN8Espalexa18handleAlexaApiCallEP21AsyncWebServerRequest':
<artificial>:(.text+0x3c68): undefined reference to `_ZN23NeoEsp8266I2sMethodCore15c_StateDataSizeE'
collect2: error: ld returned 1 exit status
*** [.pio/build/nodemcuv2/firmware.elf] Error 1

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
@softhack007
Copy link
Collaborator

softhack007 commented Mar 15, 2024

Hi,

I've played with LTO some time ago, and actually ran into some issues:

  • it did not seem to be reliable / stable on all supported devices (especially -C3 had problems)
  • performance had dropped in some cases.
  • stack usage is increasing - this could become a problem for the async_tcp task (web server) where stack usage is already critical
  • stack traces (crash dumps) become unreadable when lto is enabled
  • no benefit on 8266

As there seem to be some drawbacks, it would be good if you make LTO optional.

@softhack007
Copy link
Collaborator

This might be interesting, too platformio/platform-espressif32#702

@deece
Copy link
Author

deece commented Mar 15, 2024

This might be interesting, too platformio/platform-espressif32#702

That's weird, that's the same change that I put a PR in for this morning. I wonder if it's been backed out or clobbered at some point.

[later] Ah, it was never submitted, then the OP pulled the patchset and presumably it was forgotten about.

@softhack007
Copy link
Collaborator

Another thing that I remember: the arduino-esp32 build scripts have an explicit "-fno-lto"

https://github.com/espressif/arduino-esp32/blob/75b7f4b646599426e92ebec913a4dce14ef528cd/tools/platformio-build-esp32.py#L84

so you might also need build_unflags = -fno-lto to remove the no-lto flag from the command line.

@deece
Copy link
Author

deece commented Mar 15, 2024

Another thing that I remember: the arduino-esp32 build scripts have an explicit "-fno-lto"

https://github.com/espressif/arduino-esp32/blob/75b7f4b646599426e92ebec913a4dce14ef528cd/tools/platformio-build-esp32.py#L84

so you might also need build_unflags = -fno-lto to remove the no-lto flag from the command line.

Interesting, that looks like it snuck in this commit, with no explanation:
espressif/arduino-esp32@5502879

I just added it to build_unflags and saved a little more:
1212336 .pio/build/esp32dev/firmware.bin

Do you have any references for the a loss in performance? (that should be repro'd with a current compiler & reported upstream) AFAIK, LTO eliminates call overhead and dead branches, which should only ever improve performance.

@softhack007
Copy link
Collaborator

Do you have any references for the a loss in performance? (that should be repro'd with a current compiler & reported upstream) AFAIK, LTO eliminates call overhead and dead branches, which should only ever improve performance

Nothing that's good for a test case, sorry. It was just something I recognized sporadically, when playing with lto and other optimization flags.

Actually this guy makes a similar statement:
https://interrupt.memfault.com/blog/best-and-worst-gcc-clang-compiler-flags#-flto

The other problem is debugging and stack traces - these become very unreadable due to lto. However its possible to still get something usefull if compiling with -g3 -ggdb.

@deece
Copy link
Author

deece commented Mar 15, 2024

* it did not seem to be reliable / stable on all supported devices (especially -C3 had problems)

I just rolled a build and threw it onto a C3 I had spare, and there was no sign of the boot loop mentioned in the other thread. I'd put it down to a> ancient compiler and b> relatively new architecture.

It looks like the compiler we are currently using generates reasonable code for the C3. I was able to boot it any join it to my Wifi network. (there's no headers on it though so I can't test with a strip)

Do you have any suggestions on how to make things optional? I can't see a nice way to conditionally fiddle the flags.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
@Jason2866
Copy link
Contributor

Jason2866 commented Mar 17, 2024

We are using LTO since over 1 year for Tasmota (except for the C3). Flash size usage reduction is nice.
Yes, debugging is a pain when enabled. But LTO can be switched of when needed.
We have ZERO issues with LTO. Tasmota is a stable as it was before.
Not used with esp8266. With upcoming Arduino core 3.0.0 it is possible to use with the C3 too. Since newer toolchains used, the gcc compiler / linker bug regarding LTO is fixed.
To be clear every espressif toolchain for riscv before gcc 12 produces faulty code for the c3 when enabling LTO!

And yes there are articles floating around, how bad it is to use LTO. Looking closely all issues arriving with enabling LTO are bad code design.

@deece
Copy link
Author

deece commented Mar 18, 2024

We are using LTO since over 1 year for Tasmota (except for the C3). Flash size usage reduction is nice. Yes, debugging is a pain when enabled. But LTO can be switched of when needed. We have ZERO issues with LTO. Tasmota is a stable as it was before. Not used with esp8266. With upcoming Arduino core 3.0.0 it is possible to use with the C3 too. Since newer toolchains used, the gcc compiler / linker bug regarding LTO is fixed. To be clear every espressif toolchain for riscv before gcc 12 produces faulty code for the c3 when enabling LTO!

@Jason2866 Is your recommendation to disable LTO for C3 for now? It looks like the build selects Espressif32 @ 5.3.0, which selects toolchain-riscv32-esp 8.4.0+2021r2-patch5.

@Jason2866
Copy link
Contributor

@deece Yes, disable LTO for the C3 when toolchain-riscv32-esp 8.4.0+2021r2-patch5 is used.

@softhack007
Copy link
Collaborator

softhack007 commented Mar 18, 2024

Do you have any suggestions on how to make things optional? I can't see a nice way to conditionally fiddle the flags.

I think the script that changes elf-ar into elf-gcc-ar could always be kept active. So people building their own firmware, and developers need a way to "opt-in" to link-time optimization.

You could achieve it with a new section like this

[Size_Flags]
build_flags =
  -flto ;; enable LTO
build_unflags =
  -fno-lto ;; remove no-LTO

and then in custom buildenvs, we could simply add ${Size_Flags.build_unflags} to build_unflags and ${Size_Flags.build_flags} to build_flags respectively, to opt-in for size optimization by lto - knowing that lto makes stackdumps unreadable.

@softhack007
Copy link
Collaborator

softhack007 commented Mar 18, 2024

@deece ... so instead of modifying the [common] section, it seems more appropriate to have the lto flags/unflags one level deeper, in these sections:

[esp32]

[esp32_idf_V4]

[esp32s2]

[esp32s3]

@softhack007 softhack007 marked this pull request as draft March 18, 2024 23:34
@Jason2866
Copy link
Contributor

For Tasmota we enable LTO global for alle ESP32x and disable it individually in the C3 envs.

@blazoncek
Copy link
Collaborator

Thank you @Jason2866 for selflessly contributing your knowledge. ❤️

@softhack007 would it make sense to include LTO for every ESP32 except C3? We can make default build_flags for each of the environments IMO.

@deece could you update build_flags in each of the [esp32xx] environments accordingly if @softhack007 agrees? And perhaps make [esp32xx_debug] environments if needed?

BTW this is beyond my skill level so I cannot comment or review anything.

@@ -0,0 +1,24 @@
Import('env')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short comment here why this script exists and what it does on a high level (I assume it replaces some parts of the linker with another that supports LTO?)

@blazoncek
Copy link
Collaborator

Hi. I think it would be good to pursue this PR further as we are really at the limits of binary size for currently selected partitions.

Unfortunately my knowledge is limited and I cannot be of any help here.

@deece @softhack007 as you are more experienced could you work together (and if @Jason2866 can help) to make it happen?

IMO as @Jason2866 recommended we should enable LTO across all ESP32 except C3 (which only has 1 build environment) and add esp32 build flags to optionally disable it as @softhack007 suggested.

@deece
Copy link
Author

deece commented Apr 28, 2024 via email

build_flags_esp8266 = ${common.build_flags} ${esp8266.build_flags}
build_flags_esp32 = ${common.build_flags} ${esp32.build_flags}
build_flags_esp32_V4= ${common.build_flags} ${esp32_idf_V4.build_flags}
build_flags_esp32 = ${common.build_flags} ${esp32.build_flags} ${common.lto_flags}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are no longer present in current platformio.ini so LTO flags should be moved into appropriate [esp32xx] sections.
I would also create environment (or better just flags) for debug builds. Enabling all debug flags (as specified in [common] section) will no longer link successfully when using anything but plain binary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

platformio.ini.gz

This is the path I've been going down. I'll rebase against whatever the current development branch is when I get back to it.

Basically, every platform will have a normal (undecorated) and debug environment, with (LTO|None) or DEBUG flags applied as appropriate for the platform.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this when making base environments:

[env:wemos_d1_mini32]
board = esp32dev
platform = ${esp32.platform}
;board_build.f_flash = 80000000L  # does not work on every ESP
board_build.flash_mode = qio
board_build.partitions = ${esp32.default_partitions}
board_build.filesystem = littlefs
monitor_filters = esp32_exception_decoder
build_unflags = ${common.build_unflags}
build_flags = ${common.build_flags} ${esp32.build_flags} -D WLED_RELEASE_NAME=WEMOS32
;  -D WLED_DISABLE_HUESYNC
;  -D WLED_DISABLE_LOXONE
;  -D WLED_DISABLE_ALEXA
  -D WLED_DISABLE_BROWNOUT_DET
  -D WLED_NTP_ENABLED=true
  -D WLED_TIMEZONE=2
  -D WLED_LAT=45.8
  -D WLED_LON=15.17
  -D LEDPIN=16
  -D RLYPIN=19
  -D BTNPIN=17
  -D IRPIN=18
  -D USERMOD_DALLASTEMPERATURE
  -D TEMPERATURE_PIN=23
  -D USERMOD_FOUR_LINE_DISPLAY
  -D USERMOD_PIRSWITCH
  -D PIR_SENSOR_PIN=26
  -D PIR_SENSOR_OFF_SEC=60
  -D PIR_SENSOR_MAX_SENSORS=2
  -D USERMOD_MULTI_RELAY
  -D MULTI_RELAY_MAX_RELAYS=3
  -D USERMOD_AUDIOREACTIVE
  -D DMTYPE=1     # 0-analog/disabled, 1-I2S generic, 2-ES7243, 3-SPH0645, 4-I2S+mclk, 5-I2S PDM
  -UWLED_USE_MY_CONFIG
lib_deps = ${esp32.lib_deps}
  olikraus/U8g2 # @~2.33.15
  paulstoffregen/OneWire@~2.3.8
  Wire
  ${esp32.AR_lib_deps}

[env:wemos_d1_mini32_debug]
extends = env:wemos_d1_mini32
extra_scripts = ${env.extra_scripts}
  replace_fs.py
build_flags = ${common.build_flags} ${esp32.build_flags}
;  -DBOARD_HAS_PSRAM -mfix-esp32-psram-cache-issue ;; this adds approximately 99k to the binary size
; DEBUG reserves GPIO1
  -D WLED_DEBUG
;  -D WLED_DEBUG_HOST='"192.168.x.x"'
;  -D WLED_DEBUG_MATH
;  -D WLED_USE_UNREAL_MATH
;  -D WLED_USE_ETHERNET
;  -D WLED_ETH_DEFAULT=5
;  -D WLED_DISABLE_MQTT
;  -D WLED_DISABLE_INFRARED
  -D WLED_DISABLE_ALEXA
  -D WLED_DISABLE_HUESYNC
  -D WLED_DISABLE_LOXONE
;  -D WLED_DISABLE_MODE_BLEND
;  -D WLED_ENABLE_DMX
;  -D WLED_ENABLE_PIXART
;  -D WLED_DISABLE_PXMAGIC
;  -D WLED_DISABLE_WEBSOCKETS
;  -D WLED_DISABLE_2D
  -D WLED_AP_SSID_UNIQUE
  -D WLED_NTP_ENABLED=true
  -D WLED_TIMEZONE=2
  -D WLED_LAT=45.8
  -D WLED_LON=15.17
;  -D LEDPIN=16
  -D DATA_PINS=16,3
  -D PIXEL_COUNTS=30,30
  -D RLYPIN=19
  -D BTNPIN=17
  -D IRPIN=18
;  -D WLED_MAX_BUTTONS=12
;  -D USERMOD_SHT
  -D USERMOD_DALLASTEMPERATURE
  -D TEMPERATURE_PIN=23
  -D USERMOD_FOUR_LINE_DISPLAY
  -D USERMOD_ROTARY_ENCODER_UI # 33,34,35
  -D ENCODER_DT_PIN=33
  -D ENCODER_CLK_PIN=34
  -D ENCODER_SW_PIN=35
;  -D USERMOD_AUTO_SAVE
;  -D AUTOSAVE_AFTER_SEC=90
;  -D USERMOD_ANIMATED_STAIRCASE
  -D USERMOD_PIRSWITCH
  -D PIR_SENSOR_PIN=36
  -D PIR_SENSOR_OFF_SEC=60
  -D PIR_SENSOR_MAX_SENSORS=2
  -D USERMOD_MULTI_RELAY  # 14, 15, 32
  -D MULTI_RELAY_MAX_RELAYS=2
;  -D MULTI_RELAY_PINS=14,15
;  -D USERMOD_PWM_FAN
;  -D USERMOD_BOBLIGHT
;  -D USERMOD_POWER_AP
  -D USERMOD_AUDIOREACTIVE
  -D AUDIOPIN=-1  # analog pin
  -D DMTYPE=1     # 0-analog/disabled, 1-I2S generic, 2-ES7243, 3-SPH0645, 4-I2S+mclk, 5-I2S PDM
  -D I2S_SDPIN=4
  -D I2S_WSPIN=32
  -D I2S_CKPIN=12
;  -D STATUSLED=2
  -D WLED_WATCHDOG_TIMEOUT=10
  -D WLED_DISABLE_BROWNOUT_DET
;  -D ABL_MILLIAMPS_DEFAULT=0
  -UWLED_USE_MY_CONFIG

These are my daily build environments for ESP32 that I often change wile testing WLED. I have similar environments for 8266, S2, S3 and C3, Ethernet, WROVER models and many exotic boards that I received for testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants