disk I/O settings for checking files

Arvid Norberg edited this page Jun 7, 2020 · 1 revision

Results from running experiments with tools/benchmark_checking.py. This is the initial state of RC_2_0, as the release candidate branch was cut. The benchmark Python script was modified to drop disk caches (echo 3 > /proc/sys/vm/drop_caches) between runs. These tests were conducted on Ubuntu 20.04.
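The cache-dropping step could look roughly like the sketch below. The actual change to the benchmark script is not shown on this page, so the function names and the `path` parameter are hypothetical. Writing to /proc/sys/vm/drop_caches requires root and only works on Linux.

```python
import os

DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"  # Linux only; writing requires root


def drop_disk_caches(path: str = DROP_CACHES_PATH) -> None:
    """Flush dirty pages, then ask the kernel to drop its clean caches."""
    # drop_caches only discards clean pages, so write dirty pages back first.
    os.sync()
    # "3" drops the page cache as well as dentries and inodes.
    with open(path, "w") as f:
        f.write("3\n")


def run_with_cold_cache(benchmark, runs: int, path: str = DROP_CACHES_PATH) -> list:
    """Call benchmark() `runs` times, dropping caches before each call."""
    results = []
    for _ in range(runs):
        drop_disk_caches(path)
        results.append(benchmark())
    return results
```

The `path` argument exists only so the helper can be pointed at a scratch file when trying it out without root.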

baseline

Time to check files, in milliseconds, on NVMe SSD.

+-------------+---------------------+
|             | checking_mem_usage  |
+=============+==========+==========+
| aio_threads | 512      | 2048     |
+-------------+----------+----------+
| 4           | 54682    | 55522    |
+-------------+----------+----------+
| 8           | 29820    | 30068    |
+-------------+----------+----------+
| 16          | 15619    | 15542    |
+-------------+----------+----------+
| 32          | 8569     | 8513     |
+-------------+----------+----------+
| 64          | 6244     | 6323     |
+-------------+----------+----------+

Time to check files, in milliseconds, on spinning hard drive (HDD).

+-------------+---------------------+
|             | checking_mem_usage  |
+=============+==========+==========+
| aio_threads | 512      | 2048     |
+-------------+----------+----------+
| 4           | 60493    | 61384    |
+-------------+----------+----------+
| 8           | 60546    | 60757    |
+-------------+----------+----------+
| 16          | 144554   | 145621   |
+-------------+----------+----------+
| 32          | 152985   | 153728   |
+-------------+----------+----------+
| 64          | 163493   | 165105   |
+-------------+----------+----------+

From these results it seems checking_mem_usage does not make any significant difference.

The default is changed from 1024 to 256, which represents 4 MiB of outstanding hash jobs.
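As a quick check of that figure: the sketch below assumes checking_mem_usage is counted in 16 KiB blocks, which is the unit that makes 256 come out to 4 MiB. The helper name is made up for illustration.

```python
BLOCK_SIZE = 16 * 1024  # bytes per block; assumed unit of checking_mem_usage


def outstanding_hash_bytes(checking_mem_usage: int) -> int:
    """Upper bound on bytes held by outstanding hash jobs."""
    return checking_mem_usage * BLOCK_SIZE


for setting in (256, 1024):
    mib = outstanding_hash_bytes(setting) / (1024 * 1024)
    print(f"checking_mem_usage={setting}: {mib:g} MiB")
```

Under the same assumption, the previous default of 1024 corresponds to 16 MiB.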

Experiment 1: set MADV_SEQUENTIAL for file maps when checking

aio_threads time (ms) SSD time (ms) HDD
4 53686 59612
8 29597 60361
16 15437 145670
32 8566 152811
64 6324 165507

madvise(MADV_SEQUENTIAL) made no difference in these tests.
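For reference, the hint tested in this experiment can be illustrated in Python, whose mmap module exposes madvise() from version 3.8 on platforms that support it. This is only a sketch of the technique, not libtorrent's C++ code. The function name is hypothetical and SHA-1 stands in for whatever hash the checker computes.

```python
import hashlib
import mmap
import os


def sha1_sequential(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file through a read-only memory map, hinting sequential access."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so handle that case up front.
        if os.fstat(f.fileno()).st_size == 0:
            return h.hexdigest()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # madvise() and MADV_SEQUENTIAL are only present on some platforms.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            # Read the mapping front to back in fixed-size chunks.
            for offset in range(0, len(m), chunk_size):
                h.update(m[offset:offset + chunk_size])
    return h.hexdigest()
```

The hint is advisory: the kernel may read ahead more aggressively, but the resulting hash is the same with or without it.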

Experiment 2: one in three is a hashing thread

By default, 1 in 4 disk threads is dedicated to hashing. This experiment changes that to 1 in 3 disk threads.

aio_threads time (ms) SSD time (ms) HDD
3 54082 59313
4 53889 60679
6 28985 60173
8 29283 60237
12 15264 146490
16 12523 142329
18 10760 147569
24 8335 152885
32 7172 155793
64 5864 174260

Experiment 3: one in two hashing threads

Make every other disk thread dedicated for computing hashes.

aio_threads time (ms) SSD time (ms) HDD
3 53843 60315
4 30139 59588
6 20304 141779
8 15566 144288
12 10824 146721
16 8473 152995
18 7822 153336
24 6682 158921
32 6239 162185
64 5917 251542

HDD performance appears to drop sharply with more than 2 hasher threads, whereas SSD performance (which is most likely CPU bound for the most part) keeps improving as more threads are added.

More investigation is needed to understand what happens at 3 hasher threads on a hard drive.
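To make the hasher-thread threshold easier to see, the sketch below groups a subset of the HDD timings by the number of hasher threads. It assumes the hasher count is aio_threads divided by the ratio, rounded down. That rounding is an assumption, not something stated on this page, although it is consistent with the numbers. The rows for the 1-in-4 split are taken from the Experiment 1 table, which used the default split.

```python
def hasher_threads(aio_threads: int, divisor: int) -> int:
    """Assumed split: one in `divisor` disk threads hashes, rounded down."""
    return aio_threads // divisor


# (divisor, aio_threads, HDD time in ms), copied from the experiment tables.
HDD_RUNS = [
    (4, 4, 59612), (4, 8, 60361), (4, 16, 145670),
    (3, 3, 59313), (3, 4, 60679), (3, 6, 60173), (3, 8, 60237), (3, 12, 146490),
    (2, 3, 60315), (2, 4, 59588), (2, 6, 141779), (2, 8, 144288),
]

# Collect the timings observed for each hasher-thread count.
by_hashers = {}
for divisor, aio, ms in HDD_RUNS:
    by_hashers.setdefault(hasher_threads(aio, divisor), []).append(ms)

for n in sorted(by_hashers):
    times = by_hashers[n]
    print(f"{n} hasher thread(s): {min(times)}..{max(times)} ms")
```

Grouped this way, runs with 1 or 2 hasher threads all finish in about 60 seconds, while runs with 3 or 4 take more than 140 seconds, regardless of which ratio produced that count.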