-
Notifications
You must be signed in to change notification settings - Fork 12.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace sort implementations #124032
base: master
Are you sure you want to change the base?
Replace sort implementations #124032
Conversation
r? thomcc |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment was marked as outdated.
This comment was marked as outdated.
@bors try (looks like the try build didn't get through) |
Replace sort implementations This PR replaces the sort implementations with tailor-made ones that strike a balance of run-time, compile-time and binary-size, yielding run-time and compile-time improvements. Regressing binary-size for `slice::sort` while improving it for `slice::sort_unstable`. All while upholding the existing soft and hard safety guarantees, and even extending the soft guarantees, detecting strict weak ordering violations with a high chance and reporting it to users via a panic. * `slice::sort` -> driftsort [design document](https://github.com/Voultapher/sort-research-rs/blob/main/writeup/driftsort_introduction/text.md) * `slice::sort_unstable` -> ipnsort [design document](https://github.com/Voultapher/sort-research-rs/blob/main/writeup/ipnsort_introduction/text.md) #### Why should we change the sort implementations? In the [2023 Rust survey](https://blog.rust-lang.org/2024/02/19/2023-Rust-Annual-Survey-2023-results.html#challenges), one of the questions was: "In your opinion, how should work on the following aspects of Rust be prioritized?". The second place was "Runtime performance" and the third one "Compile Times". This PR aims to improve both. #### Why is this one big PR and not multiple? * The current documentation gives performance recommendations for `slice::sort` and `slice::sort_unstable`. If for example only one of them were to changed, this advise may be misleading for some Rust versions. By replacing them atomically, the advise remains largely unchanged, and users don't have to change their code. * driftsort and ipnsort share a substantial part of their implementations. * The implementation of `select_nth_unstable` uses internals of `slice::sort_unstable`, which makes it impractical to split changes. --- This PR is a collaboration with `@orlp.`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (a4132fa): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 679.119s -> 683.166s (0.60%) |
The binary-size results are inline with what we expected, given that Regarding compile-times, we looked at those in our analysis here and here. The instruction count metric does not account for gains via parallelizability, right? |
One thing that I find particularly strange is a 4.31% regression in |
We discussed this in the library team meeting today: we're happy to add this despite the compilation time regressions, however it would be nice to investigate the regression in html5ever in more detail to see if there is a low-hanging fruit to avoid the regression. cc @rust-lang/wg-compiler-performance |
Let's do a fresh run to get more updated results. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Replace sort implementations This PR replaces the sort implementations with tailor-made ones that strike a balance of run-time, compile-time and binary-size, yielding run-time and compile-time improvements. Regressing binary-size for `slice::sort` while improving it for `slice::sort_unstable`. All while upholding the existing soft and hard safety guarantees, and even extending the soft guarantees, detecting strict weak ordering violations with a high chance and reporting it to users via a panic. * `slice::sort` -> driftsort [design document](https://github.com/Voultapher/sort-research-rs/blob/main/writeup/driftsort_introduction/text.md) * `slice::sort_unstable` -> ipnsort [design document](https://github.com/Voultapher/sort-research-rs/blob/main/writeup/ipnsort_introduction/text.md) #### Why should we change the sort implementations? In the [2023 Rust survey](https://blog.rust-lang.org/2024/02/19/2023-Rust-Annual-Survey-2023-results.html#challenges), one of the questions was: "In your opinion, how should work on the following aspects of Rust be prioritized?". The second place was "Runtime performance" and the third one "Compile Times". This PR aims to improve both. #### Why is this one big PR and not multiple? * The current documentation gives performance recommendations for `slice::sort` and `slice::sort_unstable`. If for example only one of them were to changed, this advise may be misleading for some Rust versions. By replacing them atomically, the advise remains largely unchanged, and users don't have to change their code. * driftsort and ipnsort share a substantial part of their implementations. * The implementation of `select_nth_unstable` uses internals of `slice::sort_unstable`, which makes it impractical to split changes. --- This PR is a collaboration with `@orlp.`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (5b76fc7): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary 0.9%, secondary 4.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 3.4%, secondary 3.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary 1.4%, secondary 2.4%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 678.357s -> 686.997s (1.27%) |
Worth noting that there are binary size regressions, too :( |
- `slice::sort` -> driftsort https://github.com/Voultapher/sort-research-rs/blob/main/writeup/driftsort_introduction/text.md - `slice::sort_unstable` -> ipnsort https://github.com/Voultapher/sort-research-rs/blob/main/writeup/ipnsort_introduction/text.md Replaces the sort implementations with tailor made ones that strike a balance of run-time, compile-time and binary-size, yielding run-time and compile-time improvements. Regressing binary-size for `slice::sort` while improving it for `slice::sort_unstable`. All while upholding the existing soft and hard safety guarantees, and even extending the soft guarantees, detecting strict weak ordering violations with a high chance and reporting it to users via a panic. In addition the implementation of `select_nth_unstable` is also adapted as it uses `slice::sort_unstable` internals.
Also includes small doc fixes.
@nnethercote, indeed that's an unfortunate effect of this change. As discussed in the design documents, |
CC @diondokter - this might be an interesting use-case for "optimize-for-size" stdlib feature. |
Users that care about binary-size and or compile-times over everything else, are probably better served with https://github.com/Voultapher/tiny-sort-rs. |
@Amanieu I've investigated the html5ever regression. And here are my findings: After rebasing this branch on master I ran Results using
Locally built stage2 branch
Locally built stage2 branch
Looking at the rustc perf suite results, we see the regression can be attributed entirely to the backend code generation. Which would match our own testing that significantly more LLVM-IR is generated, because the |
Does |
I want to make sure I understand the current situation.
AFAICT currently:
Am I overlooking anything?
I appreciate the effort taken to make this implementation smaller, but the comparison with glidesort doesn't seem relevant; the relevant comparison is the current sort implementation. Overall, I don't think these results are a worthwhile trade-off at the moment :( |
@nnethercote I think you are underselling the magnitude of "run-time: microbenchmarks show speed ups in a lot of cases, but also some slowdowns". A lot of the benchmarks in that list are just table-stakes tests against regressions (you can only 'sort' an already ascending input so fast - obviously you won't see a significant improvement in that). It's not just about how many tests you improve/regress on, but also by how much. Please allow me to highlight the important (in my opinion) tests from that microbenchmark list:
Also note that we have tons more benchmarks in the writeups (1, 2), with relevant real-world distributions. For example, driftsort has a massive speedup compared to the current We are talking speedups that can be an order of magnitude or more... Are we really saying that it's okay the default |
Thanks for the additional data, that is helpful. I didn't realize that the "design document" links in the PR description included measurements of this sort, so I latched on to the only runtime measurements I could see in this PR. Based on that, I'm happy to defer to the libs team's judgment. |
I'd appreciate it if participants in this conversation would read the design documents. They include comprehensive analysis and reasoning why this change makes sense in the author's eyes. Among other things they include 6000 benchmark results across 3 different machines, including x86 and Arm. The test and benchmark suite is the culmination of 2 years of research and work. |
This PR replaces the sort implementations with tailor-made ones that strike a balance of run-time, compile-time and binary-size, yielding run-time and compile-time improvements. Regressing binary-size for
slice::sort
while improving it forslice::sort_unstable
. All while upholding the existing soft and hard safety guarantees, and even extending the soft guarantees, detecting strict weak ordering violations with a high chance and reporting it to users via a panic.slice::sort
-> driftsort design document, includes detailed benchmarks and analysis.slice::sort_unstable
-> ipnsort design document, includes detailed benchmarks and analysis.Why should we change the sort implementations?
In the 2023 Rust survey, one of the questions was: "In your opinion, how should work on the following aspects of Rust be prioritized?". The second place was "Runtime performance" and the third one "Compile Times". This PR aims to improve both.
Why is this one big PR and not multiple?
slice::sort
andslice::sort_unstable
. If for example only one of them were to be changed, this advise would be misleading for some Rust versions. By replacing them atomically, the advise remains largely unchanged, and users don't have to change their code.select_nth_unstable
uses internals ofslice::sort_unstable
, which makes it impractical to split changes.This PR is a collaboration with @orlp.