Feature Request: Re-Read on checksum-fail (check --data-read) #4774
Comments
Since I upgraded from restic 0.14.0 to 0.16.4, I get many
Some way to make |
I've rebuilt the handling of transient errors in #4800. For now that PR lets
At least on my non-ECC hardware, I've never seen restic report a bitflip. So I'm relatively sure that your error rate is at the upper end of that of a typical system (excluding those with such a high error rate that the system becomes unusable).
I doubt that most of these options would be useful, except maybe a "ignore-retried-errors" option that does not let check fail if it had to re-read a few pack files. The general assumption is that bitflips are infrequent enough that they don't totally corrupt the memory of restic, thus a single retry should hopefully always be enough. If not, then the system is probably too unstable to be of much use.
There's only been little change in the code of the |
Output of restic version
restic 0.16.4 compiled with go1.21.6 on windows/amd64
What should restic do differently? Which functionality do you think we should add?
Re-read a single file during check --read-data when an error is detected.
What are you trying to do? What problem would this solve?
There are several possible issues that can lead to a faulty result when hashing files.
For example, the chance of hitting a bit flip in non-ECC RAM grows with the number of pack files.
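To make the scaling argument concrete, here is a minimal sketch. It assumes each pack file is read independently and is corrupted in memory with the same small probability; the per-pack rate used below is purely hypothetical, not a measured value.

```python
def p_any_error(p_per_pack: float, n_packs: int) -> float:
    """Probability that at least one of n_packs independent reads is corrupted."""
    return 1.0 - (1.0 - p_per_pack) ** n_packs


# Hypothetical per-pack error probability, chosen only for illustration.
p = 1e-6
for n in (1_000, 200_000):
    print(f"{n} packs: {p_any_error(p, n):.4f}")
```

Under these assumptions, the chance of at least one spurious failure grows roughly linearly with the pack count while the per-pack rate stays small.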
My case:
My 900 GB repo with nearly 200k packs is stored on a storage share at my hoster. For redundancy, I rsync it from time to time to an old local PC with non-ECC RAM. After that, I run check --read-data locally, for obvious reasons =)
Sometimes I get an error about a wrong checksum of a blob.
But sha256sum shows no problem with the reported file. When I run restic check --read-data again, the check exits with an error on another file.
This shows me that the problem is not on the disk but somewhere else.
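The manual sha256sum check can be scripted. The sketch below relies on the fact that restic names each pack file after the SHA-256 hash of its contents; the helper names are made up for this example and are not part of restic.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large pack files are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def pack_matches_name(path):
    """Return True if the file name equals the SHA-256 hex digest of its contents."""
    return sha256_of_file(path) == Path(path).name
```

If pack_matches_name returns True for a pack that check --read-data just complained about, the data on disk is most likely intact and the error happened somewhere between reading and hashing.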
I think it would be more comfortable if restic re-read files for which it got an error, maybe 1 or 2 times, just to dig deeper into the problem.
If the error still exists, --read-data can continue.
On small repos this will not make much difference, but on large ones like mine, which takes about 4 hours to scan, it would help a lot.
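The requested behaviour could look roughly like the sketch below. It is not restic's implementation (restic is written in Go); the function names and the max_rereads parameter are assumptions made for this illustration.

```python
import hashlib


def read_and_hash(path):
    """Read the whole file and return its SHA-256 hex digest."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def verify_with_rereads(path, expected, max_rereads=2, hash_file=read_and_hash):
    """Check a file against its expected hash, re-reading it on a mismatch.

    Returns (ok, attempts). A file that passes only after a re-read points
    to a transient error (RAM, cable, controller) rather than bad data on disk.
    """
    attempts = 0
    for _ in range(1 + max_rereads):
        attempts += 1
        if hash_file(path) == expected:
            return True, attempts
    return False, attempts
```

A caller could report files that needed more than one attempt as warnings, and treat only files that fail every attempt as real errors.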
One could now say: change your faulty hardware! Yes, but if this is a non-ECC RAM problem, it probably affects many others too. Non-ECC RAM is not that rare on the planet =)
It would also speed up error analysis.
On top of that, it might be nice to have control over this feature via the command line, for example: disable, number of re-reads, exit/continue on error, etc.
Did restic help you today? Did it make you happy in any way?
Yeah, restic helps me several times a week! Besides this, it helped me today to get in touch with you, cheers! <3