Mas d31 nhskv16 #423

martinsumner · 2023-11-23T10:15:05Z

Add perf_SUITE test to measure guess/estimate accuracy and speed for counts.

Add improvement to recovery_SUITE/replace_everything/1 test to handle intermittent failures.

Reliability fix to recovery_SUITE.

Also riak_SUITE testing added for estimating data size (object count). This is useful to provide updates during handoffs (with a data size estimate, it is possible to see handoff progress as a percentage).

Estimates and guesses are both measured, along with the related time savings. Estimates are from counting 1 in 256 keys, guesses from counting 1 in 1024.

Results from some runs comparing estimates with guesses:

- Estimate 1051904 of size 1024000 with estimate taking 416 ms vs 20816 ms
- Guess 1077248 of size 1024000 with guess taking 198 ms vs 20816 ms
- Estimate 2584576 of size 2560000 with estimate taking 1190 ms vs 191221 ms
- Guess 2703360 of size 2560000 with guess taking 638 ms vs 191221 ms
- Estimate 38400 of size 32000 with estimate taking 11 ms vs 117 ms
- Guess 45056 of size 32000 with guess taking 5 ms vs 117 ms
- Estimate 81408 of size 80000 with estimate taking 30 ms vs 448 ms
- Guess 87040 of size 80000 with guess taking 13 ms vs 448 ms
- Estimate 1563648 of size 1536000 with estimate taking 677 ms vs 81591 ms
- Guess 1606656 of size 1536000 with guess taking 350 ms vs 81591 ms
- Estimate 3887616 of size 3840000 with estimate taking 1756 ms vs 672349 ms
- Guess 3897344 of size 3840000 with guess taking 1073 ms vs 672349 ms
- Estimate 394496 of size 384000 with estimate taking 141 ms vs 4501 ms
- Guess 399360 of size 384000 with guess taking 59 ms vs 4501 ms
- Estimate 979200 of size 960000 with estimate taking 365 ms vs 26362 ms
- Guess 990208 of size 960000 with guess taking 168 ms vs 26362 ms
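The sampling idea behind estimates and guesses can be sketched roughly as follows. This is an illustrative Python sketch, not the leveled implementation: `segment_of` and `sampled_count` are hypothetical names, the MD5-based segment mapping is an assumption, and the sketch scans every key, whereas the real time saving presumably comes from the store skipping keys outside the sampled segments. It does reproduce one visible property of the reported figures: every estimate above is a multiple of 256 and every guess a multiple of 1024.

```python
import hashlib


def segment_of(key: bytes, segments: int = 65536) -> int:
    """Map a key to a segment ID with a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % segments


def sampled_count(keys, sample_rate: int) -> int:
    """Count keys in 1 of every `sample_rate` segments, then scale up."""
    hits = sum(1 for k in keys if segment_of(k) % sample_rate == 0)
    return hits * sample_rate


keys = [f"key{i}".encode() for i in range(100_000)]
estimate = sampled_count(keys, 256)   # "estimate": 1 in 256
guess = sampled_count(keys, 1024)     # "guess": 1 in 1024
print(f"estimate={estimate} guess={guess} actual={len(keys)}")
```

A smaller sample rate gives more hits and so a tighter result, at the cost of touching more keys; that is the trade-off between an estimate and a guess.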

Some stats:

- At a 2M object store, a guess is on average accurate to within 2% and an estimate to within 1%.
- Load times are upwards of 10% faster with OTP 26 when compared with OTP 22/24.
- Counting times are roughly 15% faster with OTP 25/26 when compared with OTP 22/24.
- An estimate (counting 1 in every 256 keys) takes roughly 14% of the time of a full count.
- A guess (counting 1 in every 1024 keys) takes roughly 8% of the time of a full count.

martinsumner · 2023-11-23T10:15:43Z

nhs-riak/riak_kv#16

martinsumner · 2023-11-23T11:12:55Z

Results from running perf_SUITE test with different sizes and OTP versions:

martinsumner · 2023-11-24T16:22:08Z

The initial testing of the `guess` method for object count estimation indicated it was surprisingly slow. Counting 1:1000 keys took 10% of the time of a full count. Although there is additional work necessary to skip using segment lists, this intuitively appears to be out by an order of magnitude.

Profiling indicated that the majority of time was being spent in two places:

- within `expand_list_by_pointer/5`, in particular through the use of `++`;
- in the `find_pos/4` function, in particular repeated calls to `lists:member/2`.

`expand_list_by_pointer/5` has been refactored and simplified. The member check under `find_pos/4` can now be avoided where SegmentIds are not randomly distributed.

These changes have reduced the time to `guess` object counts by 65% in OTP 22 (and by 75% when also changing to OTP 26).
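The two hotspots have close analogues in most languages. The sketch below is Python rather than Erlang and is not the refactored leveled code. `expand_slow` and `expand_fast` contrast rebuilding a list on every append (the cost pattern of a left-growing `Acc ++ Chunk`) with appending in place. `make_checker` shows one assumed reading of "not randomly distributed": if the segment IDs form a regular stride, the membership test can be replaced by arithmetic. All names here are hypothetical.

```python
def expand_slow(chunks):
    """Concatenate by rebuilding the accumulator each time (quadratic copying)."""
    out = []
    for chunk in chunks:
        out = out + chunk  # copies all of `out` on every iteration
    return out


def expand_fast(chunks):
    """Concatenate by extending in place (each element is copied once)."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out


def make_checker(segment_ids):
    """Return a membership predicate, avoiding lookups for regular strides."""
    ids = sorted(set(segment_ids))
    if len(ids) > 1:
        stride = ids[1] - ids[0]
        if all(b - a == stride for a, b in zip(ids, ids[1:])):
            lo, hi = ids[0], ids[-1]
            # Assumption: evenly spaced IDs, so arithmetic replaces a lookup.
            return lambda s: lo <= s <= hi and (s - lo) % stride == 0
    id_set = frozenset(ids)  # fallback: hashed lookup, not a linear scan
    return lambda s: s in id_set


chunks = [[i, i + 1] for i in range(0, 10, 2)]
regular = make_checker(range(0, 65536, 256))
irregular = make_checker([3, 10, 11])
```

In Erlang the usual equivalent of `expand_fast` is to prepend to an accumulator and reverse once at the end, or to restructure the code so that the left operand of `++` stays short.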

The outstanding problem is that `find_pos/4` is still, perhaps unexpectedly, dominant in CPU time:

leveled_sst:'-segment_checker/1-fun-2-'/5                   400000000    19.31  15560374  [      0.04]

leveled_sst:find_pos/4                                      403085303    35.05  28242226  [      0.07]

Further, there appears to be a discrepancy between performance in eunit and ct conditions of this function.

martinsumner · 2023-11-24T22:11:51Z

Relative load performance with this PR:

martinsumner · 2023-11-24T22:12:16Z

Relative fold_heads performance with this PR:

Make perf_SUITE only for performance tests. There are two modes: riak_ctperf, which is enabled by default and runs a short test as part of the standard ct test run; and riak_fullperf, which can be swapped in to run a series of tests at different object counts (which may take > 4 hours in total).

Expand perf_SUITE to support profiling (which revealed issues with find_pos). Penciller and Inker can now also reveal internal PIDs to help with profiling.

martinsumner added 3 commits November 21, 2023 11:33

Temp test state

6cd8528

martinsumner marked this pull request as ready for review November 23, 2023 10:15

Resolve spec for slot_pointer

1491519

martinsumner added 2 commits November 26, 2023 23:50

Refactor find_pos

8cca007

martinsumner closed this Dec 19, 2023

martinsumner deleted the mas-d31-nhskv16 branch April 12, 2024 19:00
