dump: Parallelize loading large files #4796
Merged
What does this PR change? What problem does it solve?
Improve `dump` performance for large files by loading their blobs concurrently, using `repo.Connections()` goroutines.

The largest part of the code changes here is actually a modification of the bloblru cache. There is a new `GetOrCompute` method which ensures that if multiple goroutines request the same blob while it is not yet in the cache, the blob is only downloaded once.

I initially tried to use a `sync.Pool` to reuse byte slices; however, that turned out to be rather complicated to get right in combination with the bloblru cache. A byte slice may only be reused once no other goroutine can hold a reference to it, and with the current interface this is next to impossible to guarantee.
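For illustration, here is a minimal sketch of the deduplication idea behind `GetOrCompute`. This is not the actual bloblru code: the real cache also enforces a size limit with LRU eviction, and is keyed by blob IDs rather than strings; the `inFlight` map and channel bookkeeping below are assumptions chosen to show the "download once, everyone else waits" behavior.

```go
package bloblru

import "sync"

// Cache is a simplified stand-in for the bloblru cache. String keys
// stand in for blob IDs; size accounting and eviction are omitted.
type Cache struct {
	mu       sync.Mutex
	entries  map[string][]byte
	inFlight map[string]chan struct{} // hypothetical bookkeeping for pending downloads
}

func New() *Cache {
	return &Cache{
		entries:  make(map[string][]byte),
		inFlight: make(map[string]chan struct{}),
	}
}

// GetOrCompute returns the cached blob for key. If the blob is missing,
// compute runs exactly once even when several goroutines ask for the
// same key concurrently; the other callers block until it finishes.
func (c *Cache) GetOrCompute(key string, compute func() ([]byte, error)) ([]byte, error) {
	for {
		c.mu.Lock()
		if blob, ok := c.entries[key]; ok {
			// Fast path: blob is already cached.
			c.mu.Unlock()
			return blob, nil
		}
		if done, ok := c.inFlight[key]; ok {
			// Another goroutine is already downloading this blob; wait
			// for it, then re-check the cache (the download may have failed,
			// in which case this goroutine takes over the download).
			c.mu.Unlock()
			<-done
			continue
		}
		// First requester: mark the key as in flight and do the work
		// outside the lock.
		done := make(chan struct{})
		c.inFlight[key] = done
		c.mu.Unlock()

		blob, err := compute()

		c.mu.Lock()
		if err == nil {
			c.entries[key] = blob
		}
		delete(c.inFlight, key)
		c.mu.Unlock()
		close(done)
		return blob, err
	}
}
```

A dump worker would then wrap its blob download in something like `cache.GetOrCompute(id, func() ([]byte, error) { ... load the blob from the repository ... })`, so that the `repo.Connections()` concurrent loaders never fetch the same blob twice even when a file references the same blob repeatedly.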
Was the change previously discussed in an issue or on the forum?
Basic variant as discussed in #3406.
Checklist
[ ] I have added documentation for relevant changes (in the manual).
[ ] There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
[ ] I have run gofmt on the code in all commits.