Small perf regression in #69635 #69710

ecstatic-morse · 2020-03-04T19:11:27Z

#69635 (a rollup) caused a small performance regression. Since the regression has persisted through the last half dozen perf runs, it is likely not spurious. There are no outstanding perf runs near the regression (2020-03-02), so I am fairly confident in this attribution. That rollup consists of the following PRs:

Add documentation to compiler intrinsics #68682 (Add documentation to compiler intrinsics)
Unrevert "Remove checked_add in Layout::repeat" #69544 (Unrevert "Remove checked_add in Layout::repeat")
constify mem::forget #69617 (constify mem::forget)
Rename syntax in librustc_ast/README.md #69622 (Rename syntax in librustc_ast/README.md)
stash API: remove panic to fix ICE. #69623 (stash API: remove panic to fix ICE.)
Toolstate: Don't block beta week on already broken tools. #69624 (Toolstate: Don't block beta week on already broken tools.)

I would not expect any of them to negatively impact performance, although I did do a perf run with an unchecked_add in Layout::repeat to no avail (see #69679). The regression is only noticeable in clean/patched incremental builds, so I don't think this is high priority, but it would be nice to know what's going on here.

cc @nnethercote @jonas-schievink @Mark-Simulacrum

The text was updated successfully, but these errors were encountered:

ecstatic-morse · 2020-03-04T22:56:10Z

The perf results for #69241, the original revert of #67174, are basically the mirror of the regression mentioned above. #67220, the first rollup to include #67174, also caused a performance regression, but it was attributed to a different PR in that rollup.

Therefore, I'm confident that #67174/#69544 is the root cause of this regression.

#69679 suggests that the optimizer does not benefit from being able to assume that the addition at the start of Layout::repeat does not wrap. Assuming that the invariant cited in #67174 does in fact hold, my only hypothesis is that removing the ? operator causes Layout::repeat to become eligible for inlining, which in turn makes one of its callees large enough to exceed the inlining threshold.

In any case, this seems like it should interest @nnethercote, since it indicates that the code in Layout is hot enough to benefit from micro-optimization. TBH, I'd also like to learn what approach they would use to investigate this kind of issue.

nnethercote · 2020-03-05T05:56:35Z

Here is some output from a Callgrind profile that I had lying around. This is for a Check-CleanIncr run of ucd, which saw a 4.6% regression:

        .               pub fn repeat(&self, n: usize) -> Result<(Self, usize), LayoutErr> {
        .                   // Warning, removing the checked_add here led to segfaults in #67174. Further
        .                   // analysis in #69225 seems to indicate that this is an LTO-related
        .                   // miscompilation, so #67174 might be able to be reapplied in the future.
        .                   let padded_size = self
        .                       .size()
        .                       .checked_add(self.padding_needed_for(self.align()))
        .                       .ok_or(LayoutErr { private: () })?;
1,313,433 ( 0.01%)          let alloc_size = padded_size.checked_mul(n).ok_or(LayoutErr { private: () })?;
        .
        .                   unsafe {
        .                       // self.align is already known to be valid and alloc_size has been
        .                       // padded already.
        .                       Ok((Layout::from_size_align_unchecked(alloc_size, self.align()), padded_size))
        .                   }
        .               }

It suggests two things:

This function is not at all hot, so it's surprising that changing it would have a large perf effect.
It (the old version) is getting inlined. (If it wasn't inlined, there would be non-zero instruction counts on the fn line and the closing brace line.)

I can do a proper analysis tomorrow if that would be helpful. That will likely involve running Cachegrind on ucd-Check-CleanIncr with both versions of the compiler, and then doing a diff with cg_diff, and seeing what that finds.

But it might just be simpler to revert the change again? First it caused miscompilation, then it caused a perf regression... maybe this code should just be left alone :)

ecstatic-morse · 2020-03-05T18:57:02Z

I'll open a PR that reverts #69544. I'd like to spend some time investigating it as well, since I have no idea why this would affect performance and it seems like a good opportunity to learn. If you feel the urge to look into this, however, don't let me stop you! I'll be pretty slow.

ecstatic-morse · 2020-03-06T23:36:25Z

@rust-timer build 4bfc61c90b9066ee396739e074d6d05abcfda04e

rust-timer · 2020-03-06T23:36:27Z

Queued 4bfc61c90b9066ee396739e074d6d05abcfda04e with parent c79f5f0, future comparison URL.

ecstatic-morse · 2020-03-07T00:19:40Z

For anyone who's curious, here are the results of cg_diff master rerevert-67174 on a clean ucd-check run. Negative numbers means the count decreased after reverting #67174.

--------------------------------------------------------------------------------
Ir          I1mr   ILmr   Dr          D1mr      DLmr    Dw          D1mw    DLmw    
--------------------------------------------------------------------------------
-16,399,587 48,614 -1,909 -15,203,345 1,677,474 -18,799 -14,933,944 939,273 145,091  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir          I1mr   ILmr  Dr          D1mr      DLmr    Dw          D1mw    DLmw     file:function
--------------------------------------------------------------------------------
-12,941,229    644    12 -14,697,113    22,667  33,296 -14,697,093 196,198 103,505  /usr/src/debug/glibc-2.30-34-g994e529a37/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
  5,834,640    309   116   1,361,416    22,558   8,013   1,944,880       0       0  ???:_ZN4core3ptr13drop_in_place17h2d334451ff604121E.llvm.3047232575757559832
 -5,834,640   -309   -93  -1,361,416   -22,305  -8,147  -1,944,880       0       0  ???:_ZN4core3ptr13drop_in_place17h2d334451ff604121E.llvm.14951046493754649665
 -4,704,001 -6,786  -275  -1,031,214   -29,888   6,301    -802,849  13,182   4,175  /usr/src/debug/glibc-2.30-34-g994e529a37/malloc/malloc.c:_int_malloc
  2,660,698  2,227     0     958,237        -6       0     481,776      20       0  ???:_ZN5cargo4util8progress5State5print17h25c4eb5dec857e24E.llvm.17714790172995471728
   -826,762  1,803   -15    -153,015   -82,049 -46,742    -153,358   1,001     647  /usr/src/debug/glibc-2.30-34-g994e529a37/malloc/malloc.c:malloc_consolidate
   -806,184   -651   -24    -201,566   -64,090       9      52,412  -6,989     -10  /usr/src/debug/glibc-2.30-34-g994e529a37/malloc/malloc.c:malloc
   -591,222  5,241   -46    -148,617    -4,060     753      84,645  10,875    -157  /usr/src/debug/glibc-2.30-34-g994e529a37/malloc/malloc.c:_int_free
    403,708     40    -2         510         0       0         255       0       0  ???:core::intrinsics::is_nonoverlapping
   -301,845   -275   -26    -115,970   -43,771 -44,619     -33,584       0       0  /usr/src/debug/glibc-2.30-34-g994e529a37/malloc/malloc.c:unlink_chunk.isra.0
    196,931    677   108      54,384    -2,842      -2      43,974   2,948     -19  ???:alloc::raw_vec::RawVec<T,A>::reserve
    186,666    802   -34      40,289    -4,337    -187      14,859     -53      -6  /usr/src/debug/glibc-2.30-34-g994e529a37/malloc/malloc.c:realloc
    146,690 -1,963   -44      31,542     7,476    -668      22,101     -61      -4  /usr/src/debug/glibc-2.30-34-g994e529a37/malloc/malloc.c:_int_realloc
    131,682    342     0      33,170        13       0      32,318      -1       0  ???:_ZN5cargo4util8progress5State4tick17h24f78f2baccd5adbE.llvm.17714790172995471728
    -58,890   -820   203     -14,193    58,285  -2,498      -3,741  -3,269      -2  ???:hashbrown::map::RawEntryBuilderMut<K,V,S>::from_hash
    -31,850    777   -10      -7,350      -721     -38           0       0       0  ???:hashbrown::raw::RawTable<T>::bucket
     25,476      0     0      10,615         0       0           0       0       0  /rustc/7760cd0fbbbf2c59a625e075a5bdfa88b8e30f8a//src/libstd/sys/unix/alloc.rs:__rdl_realloc
     16,984      1     0           0         0       0      10,615       0       0  /rustc/7760cd0fbbbf2c59a625e075a5bdfa88b8e30f8a//src/libstd/alloc.rs:__rdl_realloc

No idea how to attribute this. Maybe it's worth noting that alloc::RawVec::reserve calls Layout::repeat via the following chain:

alloc::RawVec::reserve
alloc::RawVec::reserve_internal
Layout::array
Layout::repeat

nnethercote · 2020-03-07T01:09:47Z

The Cachegrind diff basically says that the checked_add removal caused additional memcpy and malloc calls. I'm not sure what to make of that, I can't think of an explanation.

(BTW, I usually run Cachegrind with --cache-sim=no because 99% of the time I find instruction counts (Ir) to be the only useful metric.)

rust-timer · 2020-03-07T02:54:24Z

Finished benchmarking try commit 4bfc61c90b9066ee396739e074d6d05abcfda04e, comparison URL.

tmiasko · 2020-03-07T13:40:13Z

The different performance might be strictly about the fact that a different
code is actually being compiled and optimized, since the Layout is part of the
standard library.

It would be interesting to test compile time with different core but exactly
the same rustc.

@nnethercote

Rerevert "Remove checked_add in Layout::repeat" This change, which originated in #67174 and was reapplied in #69544, seems to have caused a noticeable slowdown in patched/clean incremental builds (see #69710). Revert it for now while we investigate the underlying issue. r? @nnethercote

ecstatic-morse · 2020-03-11T17:29:42Z

The regression ultimately disappeared after #69879 (likely #69799).

@TimDiekmann It appears that minor changes to innocent-looking code in alloc.rs can result in relatively large perf changes. I suspect this will continue as alloc.rs is refactored. I think it would be a good idea to rollup=never or at least do a perf run on PRs that touch the allocator traits so we can more precisely attribute perf regressions. To clarify, this does not mean that changes should be blocked if they do indeed cause a regression.

TimDiekmann · 2020-03-11T18:55:37Z

The next upcoming refactoring of AllocRef also regressed the performance slightly: #69889

nikic · 2021-03-12T20:14:51Z

As far as I can tell, there is nothing really actionable left in this issue, so I'm going to close it.

ecstatic-morse added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Mar 4, 2020

Mark-Simulacrum added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-compiletime Issue: Problems and improvements with respect to compile times. labels Mar 4, 2020

ecstatic-morse removed the I-slow Issue: Problems and improvements with respect to performance of generated code. label Mar 4, 2020

ecstatic-morse mentioned this issue Mar 5, 2020

Rerevert "Remove checked_add in Layout::repeat" #69743

Closed

This comment has been minimized.

Sign in to view

nikic closed this as completed Mar 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Small perf regression in #69635 #69710

Small perf regression in #69635 #69710

ecstatic-morse commented Mar 4, 2020 •

edited

ecstatic-morse commented Mar 4, 2020 •

edited

nnethercote commented Mar 5, 2020

ecstatic-morse commented Mar 5, 2020 •

edited

This comment has been minimized.

This comment has been minimized.

ecstatic-morse commented Mar 6, 2020

rust-timer commented Mar 6, 2020

ecstatic-morse commented Mar 7, 2020 •

edited

nnethercote commented Mar 7, 2020

rust-timer commented Mar 7, 2020

tmiasko commented Mar 7, 2020

ecstatic-morse commented Mar 11, 2020 •

edited

TimDiekmann commented Mar 11, 2020

nikic commented Mar 12, 2021

Small perf regression in #69635 #69710

Small perf regression in #69635 #69710

Comments

ecstatic-morse commented Mar 4, 2020 • edited

ecstatic-morse commented Mar 4, 2020 • edited

nnethercote commented Mar 5, 2020

ecstatic-morse commented Mar 5, 2020 • edited

This comment has been minimized.

This comment has been minimized.

ecstatic-morse commented Mar 6, 2020

rust-timer commented Mar 6, 2020

ecstatic-morse commented Mar 7, 2020 • edited

nnethercote commented Mar 7, 2020

rust-timer commented Mar 7, 2020

tmiasko commented Mar 7, 2020

ecstatic-morse commented Mar 11, 2020 • edited

TimDiekmann commented Mar 11, 2020

nikic commented Mar 12, 2021

ecstatic-morse commented Mar 4, 2020 •

edited

ecstatic-morse commented Mar 4, 2020 •

edited

ecstatic-morse commented Mar 5, 2020 •

edited

ecstatic-morse commented Mar 7, 2020 •

edited

ecstatic-morse commented Mar 11, 2020 •

edited