Could you please include llvm-link, llc and opt? #187

rgrr · 2023-02-24T16:25:45Z

Could you please include the above files via "clang-tools-extra"? Our build system relies on those to do link time optimization.

Or are those steps completely outdated?

smithp35 · 2023-02-24T18:32:06Z

Those tools are part of the llvm build, not clang-tools-extra which includes tools like clangd. We currently filter out these tools from the build as they are usually regarded as developer tools not designed for end-users. In the short term if you want to add these to the build then you can edit the top-level CMakeLists.txt file to add them.

set(LLVM_DISTRIBUTION_COMPONENTS
    clang-format
    clang-resource-headers
    clang
    dsymutil
    lld
    llvm-ar
    llvm-config
    llvm-cov
    llvm-cxxfilt
    llvm-dwarfdump
    llvm-nm
    llvm-objcopy
    llvm-objdump
    llvm-profdata
    llvm-ranlib
    llvm-readelf
    llvm-readobj
    llvm-size
    llvm-strip
    llvm-symbolizer
    LTO
    CACHE STRING ""
)

In theory you shouldn't need to use those tools for LTO as bitcode files can be presented directly to the linker LLD. This will call the LLVM code-generator on the bitcode files automatically. Essentially acting as llvm-link, opt and LLC in one go. I can see that rolling your own pipeline with llvm-link, opt and LLC may give you more flexibility, for example if you need to have several disjoint sets of bit-code files that are optimised together into multiple ELF files, whereas LLD will deal with all the bit-code files in one go.

rgrr · 2023-02-25T11:52:53Z

Thanks for the hint. I will check my build chain if it outputs similar image sizes without the additional llc/llvm-link/opt steps.

Nevertheless for completeness could you please add the three tools?

rgrr · 2023-02-26T09:11:52Z

Ok, i have played around with link time optimization and found out that it does not equal my previous buildchain which includes llvm-link, opt and llc. It actually breaks it.

Differences are:

llvm-link/llvm-llc reliably throw away unused symbol references, while standalone lld does not (it insists on referenced symbols from unused functions)
output image is much larger, llvm-link/opt/llc create an image size for my application of 425472 bytes while lld alone outputs 639056 bytes. Significant difference!

I'm passing the following hopefully meaningful arguments to standalone lld:

-Wl,--icf=all -Wl,-O99 -Wl,--lto-newpm-passes=999 -Wl,--lto-O3 -Wl,--lto-partitions=999 -Wl,--lto-whole-program-visibility -Wl,--nostdlib -Wl,--orphan-handling=place -Wl,--warn-unresolved-symbols

So I think llvm-link/llc/opt are not just playground for compiler developers.

smithp35 · 2023-02-27T10:53:53Z

At the moment the best I can do is say we'll take that under consideration. As mentioned before these tools can be put back by editing the CMakeLists.txt if you want to do that.

The prime reason for not just adding everything from LLVM is not that the developer-facing tools are not useful, it is that their user-interface is considerably less stable than the user-facing tools. While these don't get changed often, using them carries a much greater risk of breaking changes being made by the LLVM developers. I think if we did want to do this we may want to consider putting them in a separate directory from bin to emphasise that.

It may be worth investigating why lld is giving worse results. Some thoughts:

The optimization levels for LTO via LLVM are a bit confusing, there is the LTO optimization level and the Code Generator optimization level. For example clang -emit-llvm file.c -o file.bc will embed optnone which will prevent LTO from optimizing. I'd make sure you are using clang -flto for both the compile and link lines. FWIW there is a --lto-O= that defaults to 2, but can go as far as 3 to make it more aggressive. Recently (too recently for the binary relases) a --lto-CGO= has been added to override the optimization level used when compiling. These are lld options so you'll need to pass them via -Wl,--lto-O=3 if using clang linker driver (https://reviews.llvm.org/D141970).
If your program is C++ then the lld option --lto-whole-program-visibility can enable more aggressive devirtualization.
Unrelated to LTO LLD itself has some size optimizations -O2 --icf=safe --gc-sections.

rgrr · 2023-02-28T21:14:42Z

Thank you Peter for the hint. Adding -flto did the trick: lld does now LTO (and the options from above became obsolete).

But... there's always a but... LTO of lld seems to be very aggressive. Especially extra sections are endangered to be dropped or the optimized code even becomes invalid. Examples:

In my firmware image there is a git hash at a fixed address. The git hash and the version are generated in the Makefile like this:

	$(ECHO) -e '__attribute__ ((section (".gitHash"))) char gitHash[] = $(GIT_HASH);' '__attribute__ ((section (".ehVersion")))   volatile unsigned long ehVersion   = $(COMMON_EH_VERSION);' | $(CLANG) -c -x c $(C_AND_CXX_FLAGS) -o $(OBJECT_DIRECTORY)/gitHashEhVersion.o -

The volatile is required because otherwise LTO directly replaces the ehVersion value as an immediate value in the code and drops the section. Not as intended, because the address map becomes mixed up.
Also both variables have to be referenced, otherwise the section is dropped. Section definition:

        KEEP(*(.ehVersion))
        KEEP(*(.gitHash))

BTW even --no-gc-sections does not help to prevent this.

The other even worse thing is, that LTO converts a while-loop into a do-while-loop, because it "thinks" that the loop does at least one run, which unfortunately is wrong. The thing has again to do with special sections. In this case handler addresses (the number of (user) handlers is not known to the SDK) are put into a special segment and if some event happens, then a function with a while-loop iterates over the special section and calls the handlers. If there is no handler defined, the bad thing happens.

Code fragment:

static void sdh_state_observer_notify(nrf_sdh_state_evt_t evt)
{
    nrf_section_iter_t iter;

    NRF_LOG_DEBUG("State change: 0x%08X", evt);

    for (nrf_section_iter_init(&iter, &sdh_state_observers);
         nrf_section_iter_get(&iter) != NULL;
         nrf_section_iter_next(&iter))
    {
        nrf_sdh_state_observer_t    * p_observer;
        nrf_sdh_state_evt_handler_t   handler;

        p_observer = (nrf_sdh_state_observer_t *) nrf_section_iter_get(&iter);
        handler    = p_observer->handler;

        handler(evt, p_observer->p_context);
    }
}

sdh_state_observers is in a special section:

  .sdh_state_observers :
  {
    PROVIDE(__start_sdh_state_observers = .);
    KEEP(*(SORT(.sdh_state_observers*)))
    PROVIDE(__stop_sdh_state_observers = .);
  } > FLASH

The users application puts handler references into the special section via some macros.

__start_sdh_state_observers and __stop_sdh_state_observers are TMO not taken into account by LTO and for unknown reason LTO assumes __start_sdh_state_observers != __stop_sdh_state_observers.

opt BTW does only the second bad thing but not the first one. Nevertheless I'm currently using neither LTO nor opt because of this. But llvm-link and llc are required for the build chain.

I've played a lot with lld/opt options but to no avail. Any hint is really appreciated.

BTW: above code is from the Nordic SDK, the bad behavior did not exist with clang 13.

rgrr · 2023-02-28T21:38:23Z

Checked something more:

bad case (1) happens with all -Wl,--lto-Ox
bad case (2) appears with -Wl,--lto-Ox, x>=2

smithp35 · 2023-03-01T10:02:52Z

Not sure I can help too much. There are known limitations for LTO with embedded systems, particularly with linker scripts as some of the assumptions LTO makes don't always hold. In particular preservation of named sections, not optimizing across output section boundaries. There have been some downstream attempts to remedy this, but no-one yet has decided to upstream the changes. If you are interested in some references I'll put some at the end.

LTO code-generation will assume that it has knowledge of the whole program. The static linker (lld) will tell the code-generator to preserve certain symbols https://github.com/llvm/llvm-project/blob/main/lld/ELF/LTO.cpp#L259 , this usually means references from outside the bitcode files, but can include _start and _stop symbols too. This is independent of gc-sections, which will run after LTO code-generation. For things like the .githash section that are not referenced by any part of the program I recommend not compiling with -flto or -emit-llvm. Instead just compile them to a standard ELF file. It is permitted to mix ELF and bitcode on the same link line. If you must pass code like .githash into LTO you will need to prove to the LTO code generator that it is used, I think you have worked out how to do that.

I don't have enough context to help with the for loop transformation. The code fragment has a for loop and not a while loop and doesn't refer to __start_sdh_state_observers and __stop_sdh_state_observers. It looks like LLD will tell LTO to preserve _start and _stop symbols but I don't think that would cause the code generator to infer that __start and __stop were not equal.

My suggestion to proceed is to try and make a stripped down example, i.e. can you reproduce with just a small bit of code and linker script and then raise an issue in the llvm-project https://github.com/llvm/llvm-project . LLD does have a --reproduce= which will produce a tar archive containing all the dependencies and command line options to reproduce the link.

GSOC code to improve symbol dependency:
https://discourse.llvm.org/t/richer-symbol-dependency-information-for-lto/60335

LLVM dev meeting talks:

rgrr · 2023-03-01T20:25:59Z

Hello Peter,
many thanks for this valuable information. Seems to be some hard stuff.
I will do my best and create a stripped down version of the problem and post it to the llvm-project.

Just to mention: a for-loop is actually a while-loop, sorry for being misunderstandable. And my code above is clearly incomplete. __start/__stop are referenced via some macros in the iterator.

rgrr · 2023-03-03T20:43:11Z

I have created a simple test program (https://github.com/rgrr/playground/tree/feature/llvm-games/llvm-games/broken-lto).

It seems that this is not LTO related: it is a bug/problem in the optimizer. In my project it rears up only while doing global optimizations, i.e. either lld or opt. In the test program it already shows with simple compiler optimizations enabled.

I will open an issue at the llvm-project. But I'm not really confident that it will be ever read (nor fixed).

mplatings · 2023-03-17T08:07:01Z

I believe no further action is required for this project so closing.

rgrr · 2023-03-17T08:09:50Z

Hmmm... not happy with that. llvm-link, llc and opt are still not included.

Sorry that the discussion got a little bit diverted.

mplatings · 2023-03-17T08:55:35Z

Sorry, I missed the bit where Peter said we might consider adding the tools under a different directory. Thanks for raising llvm/llvm-project#61173 - I think fixing that is the right solution. I personally don't think adding internal tools would be appropriate, but I'll reopen the issue since that's still up for discussion.

mplatings closed this as completed Mar 17, 2023

mplatings closed this as not planned Won't fix, can't repro, duplicate, stale Mar 17, 2023

mplatings reopened this Mar 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Could you please include llvm-link, llc and opt? #187

Could you please include llvm-link, llc and opt? #187

rgrr commented Feb 24, 2023

smithp35 commented Feb 24, 2023

rgrr commented Feb 25, 2023

rgrr commented Feb 26, 2023

smithp35 commented Feb 27, 2023

rgrr commented Feb 28, 2023

rgrr commented Feb 28, 2023

smithp35 commented Mar 1, 2023

rgrr commented Mar 1, 2023

rgrr commented Mar 3, 2023 •

edited

mplatings commented Mar 17, 2023

rgrr commented Mar 17, 2023

mplatings commented Mar 17, 2023

Could you please include llvm-link, llc and opt? #187

Could you please include llvm-link, llc and opt? #187

Comments

rgrr commented Feb 24, 2023

smithp35 commented Feb 24, 2023

rgrr commented Feb 25, 2023

rgrr commented Feb 26, 2023

smithp35 commented Feb 27, 2023

rgrr commented Feb 28, 2023

rgrr commented Feb 28, 2023

smithp35 commented Mar 1, 2023

rgrr commented Mar 1, 2023

rgrr commented Mar 3, 2023 • edited

mplatings commented Mar 17, 2023

rgrr commented Mar 17, 2023

mplatings commented Mar 17, 2023

rgrr commented Mar 3, 2023 •

edited