Disk I/O settings for checking files
Results from running experiments with tools/benchmark_checking.py. This is the initial state of RC_2_0 as the release candidate branch was cut. The benchmark Python script was modified to drop disk caches (echo 3 > /proc/sys/vm/drop_caches) between runs. These tests were conducted on Ubuntu 20.04.
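A minimal sketch of how a benchmark script might drop the disk caches before each timed run. This is not the actual change made to tools/benchmark_checking.py; the function name drop_disk_caches and its parameters are hypothetical. Writing to /proc/sys/vm/drop_caches is Linux-specific and requires root.

```python
import os


def drop_disk_caches(path="/proc/sys/vm/drop_caches", sync=True):
    """Ask the Linux kernel to drop the page cache, dentries and inodes.

    Hypothetical helper. The path is a parameter only so the function can
    be exercised against a scratch file without root privileges.
    """
    if sync:
        # Flush dirty pages first; drop_caches only discards clean ones.
        os.sync()
    with open(path, "w") as f:
        # "3" means: drop the page cache as well as dentries and inodes.
        f.write("3\n")
```

Calling this helper immediately before each measurement makes every run start from a cold cache, so the timings reflect reads from the device rather than from memory.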
Time to check files, in milliseconds, on NVMe SSD.
checking_mem_usage = 512 | checking_mem_usage = 2048 |
---|---|
54682 | 55522 |
29820 | 30068 |
15619 | 15542 |
8569 | 8513 |
6244 | 6323 |
Time to check files, in milliseconds, on spinning hard drive (HDD).
checking_mem_usage = 512 | checking_mem_usage = 2048 |
---|---|
60493 | 61384 |
60546 | 60757 |
144554 | 145621 |
152985 | 153728 |
163493 | 165105 |
From these results it seems checking_mem_usage does not make any significant difference.
The default is changed from 1024 to 256 blocks, which represents 4 MiB of outstanding hash jobs (256 blocks of 16 KiB each).
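The 4 MiB figure can be checked with a short calculation. The sketch below assumes checking_mem_usage is counted in 16 KiB blocks; the constant and function names are made up for illustration.

```python
# Assumption: checking_mem_usage is counted in 16 KiB blocks.
BLOCK_SIZE = 16 * 1024  # bytes


def outstanding_hash_bytes(checking_mem_usage):
    """Bytes of hash jobs that may be outstanding for a given setting."""
    return checking_mem_usage * BLOCK_SIZE


for blocks in (256, 1024):
    mib = outstanding_hash_bytes(blocks) // (1024 * 1024)
    print(f"checking_mem_usage={blocks}: {mib} MiB")
```

Under that assumption the new default of 256 corresponds to 4 MiB and the old default of 1024 to 16 MiB.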
aio_threads | time (ms) SSD | time (ms) HDD |
---|---|---|
4 | 53686 | 59612 |
8 | 29597 | 60361 |
16 | 15437 | 145670 |
32 | 8566 | 152811 |
64 | 6324 | 165507 |
madvise(MADV_SEQUENTIAL) made no difference in these tests.
The current default dedicates 1 in 4 disk threads to hashing. The following results are with this changed to 1 in 3 disk threads.
aio_threads | time (ms) SSD | time (ms) HDD |
---|---|---|
3 | 54082 | 59313 |
4 | 53889 | 60679 |
6 | 28985 | 60173 |
8 | 29283 | 60237 |
12 | 15264 | 146490 |
16 | 12523 | 142329 |
18 | 10760 | 147569 |
24 | 8335 | 152885 |
32 | 7172 | 155793 |
64 | 5864 | 174260 |
The following results are with every other disk thread dedicated to computing hashes.
aio_threads | time (ms) SSD | time (ms) HDD |
---|---|---|
3 | 53843 | 60315 |
4 | 30139 | 59588 |
6 | 20304 | 141779 |
8 | 15566 | 144288 |
12 | 10824 | 146721 |
16 | 8473 | 152995 |
18 | 7822 | 153336 |
24 | 6682 | 158921 |
32 | 6239 | 162185 |
64 | 5917 | 251542 |
It seems HDD performance drops significantly with more than 2 hasher threads, whereas SSD performance (which is most likely CPU bound for the most part) keeps improving as more threads are thrown at it, although with diminishing returns.
More investigation is needed to understand what happens at 3 hasher threads on a hard drive.
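The threshold at 2 hasher threads can be read off the HDD columns of the 1-in-3 and 1-in-2 tables above. The sketch below assumes the number of hasher threads is aio_threads divided by the ratio, rounded down. That rounding is a guess, and the helper names are hypothetical. The timings are copied from the tables.

```python
def hasher_threads(aio_threads, ratio):
    """Assumed split: one in every `ratio` disk threads hashes (rounded down)."""
    return aio_threads // ratio


# (ratio, aio_threads, HDD time in ms), copied from the two tables above.
HDD_RESULTS = [
    (3, 3, 59313), (3, 4, 60679), (3, 6, 60173), (3, 8, 60237),
    (3, 12, 146490), (3, 16, 142329), (3, 18, 147569), (3, 24, 152885),
    (3, 32, 155793), (3, 64, 174260),
    (2, 3, 60315), (2, 4, 59588), (2, 6, 141779), (2, 8, 144288),
    (2, 12, 146721), (2, 16, 152995), (2, 18, 153336), (2, 24, 158921),
    (2, 32, 162185), (2, 64, 251542),
]


def split_by_hasher_limit(results, limit=2):
    """Partition HDD timings into runs with <= limit and > limit hashers."""
    at_or_below, above = [], []
    for ratio, aio_threads, ms in results:
        bucket = at_or_below if hasher_threads(aio_threads, ratio) <= limit else above
        bucket.append(ms)
    return at_or_below, above


at_or_below, above = split_by_hasher_limit(HDD_RESULTS)
print("slowest run with <= 2 hashers:", max(at_or_below), "ms")
print("fastest run with  > 2 hashers:", min(above), "ms")
```

Under this assumption the two groups do not overlap: every run with at most 2 hasher threads finishes in about 60 seconds, and every run with more than 2 takes over 140 seconds.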