Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify cache on check #3747

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

MichaelEischer
Copy link
Member

@MichaelEischer MichaelEischer commented May 10, 2022

What does this PR change? What problem does it solve?

The PR is currently in a request for comment state.

Up to now the check command has to create a temporary cache to ensure that it does not miss bitrot at the repository storage. However, this has the downside that a host may have to store two copies of the repository cache: one for regular usage and one for the check command.

This PR adds support for the cache to verify its content and thereby allows the check command to use an already existing cache, which as a bonus will be repaired by redownloading locally damaged files if necessary.

The CLI now has an --temporary-cache option in addition to --no-cache-verify (which replaces --with-cache)

temporary-cache no-cache-verify behavior
false false default: use normal cache and verify its content
false true Identical to using --with-cache at the moment
true false current default behavior
true true invalid

It would probably be possible (although hacky) to let check detect that a cache already exists and then automatically use a cache if it exists and create a temporary cache otherwise. The current solution in this PR to use a normal cache be default can break some use cases where a separate host is used to run check for all repositories and there is not enough space to keep all caches.

Using the normal cache also has the potential downside that check will download all metadata of a repository and thus could increase the size of the local cache. However, if the host runs prune from time to time, then this won't be a problem as prune behaves in the same way.

When verifying an already existing cache blob there are a number of corner cases that can appear:

local hash remote hash behavior
correct correct everything is fine
wrong correct delete cache blob and retry
correct wrong fail with user visible error
wrong wrong fail with user visible error
wrong == local hash proceed normally; check will complain about that file later on.

The last case is necessary to gracefully handle incompletely uploaded files which should allow the check command to still access and verify the file if possible.

Was the change previously discussed in an issue or on the forum?

No.

Checklist

  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

cacheFile always returns nil (which will change soon). Anyways calling
Load manually or using loadFromCacheOrDelegate should have the same
effect. Thus always use loadFromCacheOrDelegate to simplify the code.
That way bitrot at the repository can also be detected when using an
already existing cache.
With the cache verification it is no longer necessary to create a
temporary cache. The new option `--temporary-cache` still allows
creating a temporary cache.
@MichaelEischer MichaelEischer marked this pull request as draft October 21, 2022 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant