Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could you please include llvm-link, llc and opt? #187

Open
rgrr opened this issue Feb 24, 2023 · 12 comments
Open

Could you please include llvm-link, llc and opt? #187

rgrr opened this issue Feb 24, 2023 · 12 comments

Comments

@rgrr
Copy link

rgrr commented Feb 24, 2023

Could you please include the above files via "clang-tools-extra"? Our build system relies on those to do link time optimization.

Or are those steps completely outdated?

@smithp35
Copy link
Contributor

Those tools are part of the llvm build, not clang-tools-extra which includes tools like clangd. We currently filter out these tools from the build as they are usually regarded as developer tools not designed for end-users. In the short term if you want to add these to the build then you can edit the top-level CMakeLists.txt file to add them.

set(LLVM_DISTRIBUTION_COMPONENTS
    clang-format
    clang-resource-headers
    clang
    dsymutil
    lld
    llvm-ar
    llvm-config
    llvm-cov
    llvm-cxxfilt
    llvm-dwarfdump
    llvm-nm
    llvm-objcopy
    llvm-objdump
    llvm-profdata
    llvm-ranlib
    llvm-readelf
    llvm-readobj
    llvm-size
    llvm-strip
    llvm-symbolizer
    LTO
    CACHE STRING ""
)

In theory you shouldn't need to use those tools for LTO as bitcode files can be presented directly to the linker LLD. This will call the LLVM code-generator on the bitcode files automatically. Essentially acting as llvm-link, opt and LLC in one go. I can see that rolling your own pipeline with llvm-link, opt and LLC may give you more flexibility, for example if you need to have several disjoint sets of bit-code files that are optimised together into multiple ELF files, whereas LLD will deal with all the bit-code files in one go.

@rgrr
Copy link
Author

rgrr commented Feb 25, 2023

Thanks for the hint. I will check my build chain if it outputs similar image sizes without the additional llc/llvm-link/opt steps.

Nevertheless for completeness could you please add the three tools?

@rgrr
Copy link
Author

rgrr commented Feb 26, 2023

Ok, i have played around with link time optimization and found out that it does not equal my previous buildchain which includes llvm-link, opt and llc. It actually breaks it.

Differences are:

  • llvm-link/llvm-llc reliably throw away unused symbol references, while standalone lld does not (it insists on referenced symbols from unused functions)
  • output image is much larger, llvm-link/opt/llc create an image size for my application of 425472 bytes while lld alone outputs 639056 bytes. Significant difference!

I'm passing the following hopefully meaningful arguments to standalone lld:

-Wl,--icf=all -Wl,-O99 -Wl,--lto-newpm-passes=999 -Wl,--lto-O3 -Wl,--lto-partitions=999 -Wl,--lto-whole-program-visibility -Wl,--nostdlib -Wl,--orphan-handling=place -Wl,--warn-unresolved-symbols

So I think llvm-link/llc/opt are not just playground for compiler developers.

@smithp35
Copy link
Contributor

At the moment the best I can do is say we'll take that under consideration. As mentioned before these tools can be put back by editing the CMakeLists.txt if you want to do that.

The prime reason for not just adding everything from LLVM is not that the developer-facing tools are not useful, it is that their user-interface is considerably less stable than the user-facing tools. While these don't get changed often, using them carries a much greater risk of breaking changes being made by the LLVM developers. I think if we did want to do this we may want to consider putting them in a separate directory from bin to emphasise that.

It may be worth investigating why lld is giving worse results. Some thoughts:

  • The optimization levels for LTO via LLVM are a bit confusing, there is the LTO optimization level and the Code Generator optimization level. For example clang -emit-llvm file.c -o file.bc will embed optnone which will prevent LTO from optimizing. I'd make sure you are using clang -flto for both the compile and link lines. FWIW there is a --lto-O= that defaults to 2, but can go as far as 3 to make it more aggressive. Recently (too recently for the binary relases) a --lto-CGO= has been added to override the optimization level used when compiling. These are lld options so you'll need to pass them via -Wl,--lto-O=3 if using clang linker driver (https://reviews.llvm.org/D141970).
  • If your program is C++ then the lld option --lto-whole-program-visibility can enable more aggressive devirtualization.
  • Unrelated to LTO LLD itself has some size optimizations -O2 --icf=safe --gc-sections.

@rgrr
Copy link
Author

rgrr commented Feb 28, 2023

Thank you Peter for the hint. Adding -flto did the trick: lld does now LTO (and the options from above became obsolete).

But... there's always a but... LTO of lld seems to be very aggressive. Especially extra sections are endangered to be dropped or the optimized code even becomes invalid. Examples:

In my firmware image there is a git hash at a fixed address. The git hash and the version are generated in the Makefile like this:

	$(ECHO) -e '__attribute__ ((section (".gitHash"))) char gitHash[] = $(GIT_HASH);' '__attribute__ ((section (".ehVersion")))   volatile unsigned long ehVersion   = $(COMMON_EH_VERSION);' | $(CLANG) -c -x c $(C_AND_CXX_FLAGS) -o $(OBJECT_DIRECTORY)/gitHashEhVersion.o -

The volatile is required because otherwise LTO directly replaces the ehVersion value as an immediate value in the code and drops the section. Not as intended, because the address map becomes mixed up.
Also both variables have to be referenced, otherwise the section is dropped. Section definition:

        KEEP(*(.ehVersion))
        KEEP(*(.gitHash))

BTW even --no-gc-sections does not help to prevent this.

The other even worse thing is, that LTO converts a while-loop into a do-while-loop, because it "thinks" that the loop does at least one run, which unfortunately is wrong. The thing has again to do with special sections. In this case handler addresses (the number of (user) handlers is not known to the SDK) are put into a special segment and if some event happens, then a function with a while-loop iterates over the special section and calls the handlers. If there is no handler defined, the bad thing happens.

Code fragment:

static void sdh_state_observer_notify(nrf_sdh_state_evt_t evt)
{
    nrf_section_iter_t iter;

    NRF_LOG_DEBUG("State change: 0x%08X", evt);

    for (nrf_section_iter_init(&iter, &sdh_state_observers);
         nrf_section_iter_get(&iter) != NULL;
         nrf_section_iter_next(&iter))
    {
        nrf_sdh_state_observer_t    * p_observer;
        nrf_sdh_state_evt_handler_t   handler;

        p_observer = (nrf_sdh_state_observer_t *) nrf_section_iter_get(&iter);
        handler    = p_observer->handler;

        handler(evt, p_observer->p_context);
    }
}

sdh_state_observers is in a special section:

  .sdh_state_observers :
  {
    PROVIDE(__start_sdh_state_observers = .);
    KEEP(*(SORT(.sdh_state_observers*)))
    PROVIDE(__stop_sdh_state_observers = .);
  } > FLASH

The users application puts handler references into the special section via some macros.

__start_sdh_state_observers and __stop_sdh_state_observers are TMO not taken into account by LTO and for unknown reason LTO assumes __start_sdh_state_observers != __stop_sdh_state_observers.

opt BTW does only the second bad thing but not the first one. Nevertheless I'm currently using neither LTO nor opt because of this. But llvm-link and llc are required for the build chain.

I've played a lot with lld/opt options but to no avail. Any hint is really appreciated.

BTW: above code is from the Nordic SDK, the bad behavior did not exist with clang 13.

@rgrr
Copy link
Author

rgrr commented Feb 28, 2023

Checked something more:

  • bad case (1) happens with all -Wl,--lto-Ox
  • bad case (2) appears with -Wl,--lto-Ox, x>=2

@smithp35
Copy link
Contributor

smithp35 commented Mar 1, 2023

Not sure I can help too much. There are known limitations for LTO with embedded systems, particularly with linker scripts as some of the assumptions LTO makes don't always hold. In particular preservation of named sections, not optimizing across output section boundaries. There have been some downstream attempts to remedy this, but no-one yet has decided to upstream the changes. If you are interested in some references I'll put some at the end.

LTO code-generation will assume that it has knowledge of the whole program. The static linker (lld) will tell the code-generator to preserve certain symbols https://github.com/llvm/llvm-project/blob/main/lld/ELF/LTO.cpp#L259 , this usually means references from outside the bitcode files, but can include _start and _stop symbols too. This is independent of gc-sections, which will run after LTO code-generation. For things like the .githash section that are not referenced by any part of the program I recommend not compiling with -flto or -emit-llvm. Instead just compile them to a standard ELF file. It is permitted to mix ELF and bitcode on the same link line. If you must pass code like .githash into LTO you will need to prove to the LTO code generator that it is used, I think you have worked out how to do that.

I don't have enough context to help with the for loop transformation. The code fragment has a for loop and not a while loop and doesn't refer to __start_sdh_state_observers and __stop_sdh_state_observers. It looks like LLD will tell LTO to preserve _start and _stop symbols but I don't think that would cause the code generator to infer that __start and __stop were not equal.

My suggestion to proceed is to try and make a stripped down example, i.e. can you reproduce with just a small bit of code and linker script and then raise an issue in the llvm-project https://github.com/llvm/llvm-project . LLD does have a --reproduce= which will produce a tar archive containing all the dependencies and command line options to reproduce the link.

GSOC code to improve symbol dependency:
https://discourse.llvm.org/t/richer-symbol-dependency-information-for-lto/60335

LLVM dev meeting talks:

@rgrr
Copy link
Author

rgrr commented Mar 1, 2023

Hello Peter,
many thanks for this valuable information. Seems to be some hard stuff.
I will do my best and create a stripped down version of the problem and post it to the llvm-project.

Just to mention: a for-loop is actually a while-loop, sorry for being misunderstandable. And my code above is clearly incomplete. __start/__stop are referenced via some macros in the iterator.

@rgrr
Copy link
Author

rgrr commented Mar 3, 2023

I have created a simple test program (https://github.com/rgrr/playground/tree/feature/llvm-games/llvm-games/broken-lto).

It seems that this is not LTO related: it is a bug/problem in the optimizer. In my project it rears up only while doing global optimizations, i.e. either lld or opt. In the test program it already shows with simple compiler optimizations enabled.

I will open an issue at the llvm-project. But I'm not really confident that it will be ever read (nor fixed).

@mplatings
Copy link
Contributor

I believe no further action is required for this project so closing.

@mplatings mplatings closed this as not planned Won't fix, can't repro, duplicate, stale Mar 17, 2023
@rgrr
Copy link
Author

rgrr commented Mar 17, 2023

Hmmm... not happy with that. llvm-link, llc and opt are still not included.

Sorry that the discussion got a little bit diverted.

@mplatings
Copy link
Contributor

Sorry, I missed the bit where Peter said we might consider adding the tools under a different directory. Thanks for raising llvm/llvm-project#61173 - I think fixing that is the right solution. I personally don't think adding internal tools would be appropriate, but I'll reopen the issue since that's still up for discussion.

@mplatings mplatings reopened this Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants