Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The compile times are super slow #125

Open
Morwenn opened this issue Feb 6, 2018 · 2 comments
Open

The compile times are super slow #125

Morwenn opened this issue Feb 6, 2018 · 2 comments

Comments

@Morwenn
Copy link
Owner

Morwenn commented Feb 6, 2018

This isn't new, but the huge template mess everywhere in the library means that the compile times are slow, and they are currently even more terrible in the mutable sorters branch (which doesn't matter for now because it is blocked by internal compiler errors in clang++, but still...). However reducing those compile times without breaking the current API and reducing the power of the wrappers seems tricky. To be honest, I don't even know where to start to reduce the evergrowing template instantiations that occur everywhere. I tried a few things knowing that it wouldn't change much:

  • Replace most std::decay_t by remove_cvref_t
  • Invert the choice mechanism in hybrid_adapter to reduce its recursive instantiations from 127 to <30 most of the time
  • Move a few elements out of templates when they don't depend on every template parameter
  • Don't use sorter_facade to implement sorter adapters when they can already handle all the operator() overloads by themselves

It decreased the debug executable size a bit, but that's pretty much the only noticeable achievement.

But in the grand scheme of things it doesn't change much. I am looking for ideas to reduce a bit the compile times without breaking the library API, but I've only got a few tiny ideas:

  • Maybe I could move even more things out of some templates. (I'm out of ideas for where that would be relevant)
  • Turning trait-like class templates into templated using when possible generally ends up in the compiler instantiating fewer templates, which may help reduce both binary size and compile times. (I did that in a few places, now I don't know where it might still be relevant)
  • We could introduce more specialized sorter_facade derivatives for specific sorter implementations since the ones in the library mostly have the same functions, but it doesn't look easy to make it work generically for user-defined sorters without introducing more tags.
  • std::conjunction and friends are apparently slow and probably unneeded in most places; I should analyze the library to reduce their uses if possible.
  • Some libraries apparently improved compile time by replacing calls to std::move and std::forward by a bunch of static_cast, even though it doesn't look like the cleanest thing to do. (I tried that for std::forward and didn't notice a difference)
  • On some platforms, SFINAE conditions are 10x faster to compile when they are in the return type (as opposed to the template parameter list).
  • Qualifying more calls in specific places may avoid to gratuitously trigger ADL so that the compiler won't needlessly look up for a function in the argument's associated namespace.
  • Replacing calls to std::distance by subtractions when functions only handle random-access iterators may decrease the amount of tag dispatch happening (same for other iterators operations). It might also make it obvious when code is designed to work with random-access iterators only.
  • Some compilers provide intrinsics like _EnableIf that don't trigger template instantiations, which might be worth considering if they improve the state of things. On the other hand we should check whether it makes error message worse or not, because Error messages are unreadable #28 is a thing too.

Those are only small changes, and I doubt that it will make a difference. If anyone has better ideas to reduce the compile times without breaking the API, I welcome such ideas :p


List of articles about speeding up compilation of templates and related stuff (suggest articles if needed):

@Morwenn
Copy link
Owner Author

Morwenn commented Jul 15, 2020

I compiled the test suite with Clang 9's new -time-trace option and visualized the results with speedscope. Here are the main takeaways so far:

  • Most of the time is spent in front-end: sometimes in the parsing phase, sometimes in the template instantiation phase.
  • Despite already optimizing it in the past, <catch2/catch.hpp> is still a heavy hitter during the parsing phase. Fortunately, Catch2 v3.0 should be released in the near future and reduce the compile times.
  • When used, cppsort::sort and cppsort::default_sorter are expensive. The usefulness of cppsort::sort is debatable and I might want to nuke it for cpp-sort 2.0; even cppsort::stable_sort is probably not that good an idea since people should learn about stable_adapter instead, which by the way is easier to use in C++17 thanks to CTAD. cppsort::default_sorter is arguably a bad idea too and I could see it disappearing in cpp-sort 2.0: people come to this library to test miscellaneous sorting algorithms, when all they want is a fast default, std::sort already does a fine enough job. I will likely open issues for those.
  • std::result_of and everything using it takes a lot of instantiation time, which makes sorter_traits.h an expensive header to use - yet it's everywhere in the library. Or maybe it seems to be the case because result_of is often the first thing to instantiate the sorter templates.
  • The previous point on the other hand also revealed that std::conjunction & friends skipping template instantiations might actually help.

Those are some preliminary results, more to come later.

Morwenn added a commit that referenced this issue Jul 17, 2020
This mainly done as part of issue #167, but also improves the situation
a bit for issues #28 and #125.
@Morwenn
Copy link
Owner Author

Morwenn commented Jul 25, 2020

The branch catch2-v3 contains an experimental switch to Catch2 3.0.0 using the top of the development branch. So far every file in the test suite is faster to parse thanks to the light granular includes, but I don't know whether a full build is faster because it has to download & compile Catch2 before compiling cpp-sort's test suite. However I do expected incremental builds to greatly benefit from the change.

I plan to merge the branch once Catch2 3.0.1 is officially released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant