
command output is not valid JSON: invalid character 'R' after top-level value #277

Closed · lakemike opened this issue May 13, 2024 · 6 comments
Labels: bug (Something isn't working)

@lakemike

I am running restic via the latest Backrest Docker image. Huge thanks, George, for launching a great project. It's very nice to be able to turn all these backup scripts into a more structured setup and manage it via the GUI!

Situation: I moved all my backup plans to Backrest. One computer running Backrest on Docker executes ~15 plans, all pointing at a single backup repository. The repository is accessed via rclone on the pcloud storage system.

Bug: More or less every night one of the backup plans fails and prevents all the others from executing. I can fix the situation with check / repair index / repair snapshots etc., but it comes back.

Based on the output below, the JSON output appears to be malformed. Apparently the unlock command throws an error and writes additional text output, and this corrupts the JSON that Backrest tries to parse.

command: /bin/restic-0.16.4 forget --json --keep-hourly 1 --keep-daily 1 --keep-weekly 1 --keep-monthly 1 --keep-yearly 2 -o sftp.args=-oBatchMode=yes --tag plan:mydata-pcloud-restic-gvnx --group-by tag

[{"tags":["plan:mydata-pcloud-restic-gvnx"],"host":"","paths":null,"keep":[{"time":"2024-05-13T04:00:05.08435425+02:00","parent":"841447f22d75e0d21dd4c2cfab9ba72839e4542b27a8e088f3158be36aec63c9","tree":"bedc5f8bf886106df4eb69cd7249f199309edc14196aa2e93096f312b60febb8","paths":["/share/MyDir1/MyDir2/MyDir3"],"hostname":"myhost","username":"root","excludes":[".@",".stfolder",".stignore","@"],"tags":["plan:mydata-pcloud-restic-gvnx"],"program_version":"restic 0.16.4","id":"f01eeb67238cb148afd1a2f18e5a3446b7165ecdd1e92c31c98725a64a655d98","short_id":"f01eeb67"},{"time":"2024-05-06T10:36:32.938595366+02:00","tree":"3ce8633263ad379b3457b6cbe0165ba46f4bc1d6358be1d4984f772d0f0d989d","paths":["/share/MyDir1/MyDir2/MyDir3"],"hostname":"myhost","username":"root","excludes":[".@",".stfolder",".stignore","@"],"tags":["plan:mydata-pcloud-restic-gvnx"],"program_version":"restic 0.16.4","id":"7b0dfcd10d5222caa5fbfec930089a6e0991700a52a734ed4c6a5310e96980ca","short_id":"7b0dfcd1"}],"remove":[{"time":"2024-05-12T08:38:07.231569442+02:00","parent":"9f7280f70a946abe63e3b15eb021c467b62d2782cd52f51f905b26aa1ba54e73","tree":"bedc5f8bf886106df4eb69cd7249f199309edc14196aa2e93096f312b60febb8","paths":["/share/MyDir1/MyDir2/MyDir3"],"hostname":"myhost","username":"root","excludes":[".@",".stfolder",".stignore","@"],"tags":["plan:mydata-pcloud-restic-gvnx"],"program_version":"restic 0.16.4","id":"841447f22d75e0d21dd4c2cfab9ba72839e4542b27a8e088f3158be36aec63c9","short_id":"841447f2"}],"reasons":[{"snapshot":{"time":"2024-05-13T04:00:05.08435425+02:00","parent":"841447f22d75e0d21dd4c2cfab9ba72839e4542b27a8e088f3158be36aec63c9","tree":"bedc5f8bf886106df4eb69cd7249f199309edc14196aa2e93096f312b60febb8","paths":["/share/MyDir1/MyDir2/MyDir3"],"hostname":"myhost","username":"root","excludes":[".@",".stfolder",".stignore","@"],"tags":["plan:mydata-pcloud-restic-gvnx"],"program_version":"restic 0.16.4"},"matches":["hourly snapshot","daily snapshot","weekly snapshot","monthly snapshot","yearly snapshot"],"counters":{"yearly":1}},{"snapshot":{"time":"2024-05-06T10:36:32.938595366+02:00","tree":"3ce8633263ad379b3457b6cbe0165ba46f4bc1d6358be1d4984f772d0f0d989d","paths":["/share/MyDir1/MyDir2/MyDir3"],"hostname":"myhost","username":"root","excludes":[".@",".stfolder",".stignore","@"],"tags":["plan:mydata-pcloud-restic-gvnx"],"program_version":"restic 0.16.4"},"matches":["yearly snapshot"],"counters":{}}]}]

Remove(<lock/a807d9706d>) returned error, retrying after 741.164112ms: <lock/a807d9706d> does not exist

Remove(<lock/a807d9706d>) operation successful after 1 retries

command "/bin/restic-0.16.4 forget --json --keep-hourly 1 --keep-daily 1 --keep-weekly 1 --keep-monthly 1 --keep-yearly 2 -o sftp.args=-oBatchMode=yes --tag plan:mydata-pcloud-restic-gvnx --group-by tag" failed: command output is not valid JSON: invalid character 'R' after top-level value

Process STDOUT:

[{"tags":["plan:mydata-pcloud-restic-gvnx"],"host":"","paths":null,"keep":[{"time":"2024-05-13T04:00:05.08435425+02:00","parent":"841447f22d75e0d21dd4c2cfab9ba72839e4542b27a8e088f3158be36aec63c9","tree":"bedc5f8bf886106df4eb69cd7249f199309edc14196aa2e93096f312b60febb8","paths":["/share/MyDir1/MyDir2/MyDir3"],"hostname":"myhost","username":"root","excludes":[".@",".stfolder",".stignore","@"],"tags":["plan:mydata-pcloud-restic-gvnx"],"program_version":"restic 0.16.4","id":"f01eeb67238cb148afd1a2f18e5a3446b7165ecdd1e92c31c98725a64a655d98","short_id":"f01eeb67"},{"time":"2024-05-06T10:36:32.938595366+02:00","tree":"3ce8633263ad379b3457b6cbe0165ba46f4bc1d6358be1d4984f772d0f0d989d","paths":["/share/MyDir1/MyDir2/MyDir3"],"hostname":"myhost","username":"root","excludes":[".@",".stfolder",".stignore","@"],"tags":["plan:mydata-pcloud-restic-gvnx"],"program_version":"restic 0.16.4","id":"7b0dfcd10d5222caa5fbfec930089a6e0991700a52a734ed4c6a5310e96980ca","short_id":

...[truncated]
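For what it's worth, "invalid character 'R' after top-level value" is the error Go's encoding/json returns when extra text follows a complete JSON document -- which matches the Remove(<lock/...>) retry lines being appended after the forget output. A minimal sketch (not Backrest's actual code, just a stub of the output above) that reproduces the message:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Stub of the captured output: a complete JSON array followed by one
	// of restic's lock-retry log lines.
	out := `[{"tags":["plan:mydata-pcloud-restic-gvnx"],"keep":[]}]
Remove(<lock/a807d9706d>) returned error, retrying after 741.164112ms: <lock/a807d9706d> does not exist`

	var v any
	if err := json.Unmarshal([]byte(out), &v); err != nil {
		fmt.Println(err) // invalid character 'R' after top-level value
	}
}
```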

lakemike added the bug label May 13, 2024
@garethgeorge
Owner

garethgeorge commented May 13, 2024

Hi, I notice you're using an rclone backend. This looks like a real problem with the storage (i.e. probably not something Backrest can address): restic logged warnings about locks that should exist not being found.

Backrest is expected to fail the operation in this case.

The fact that you need to run check and repair frequently is pretty worrying; you may need to look at other storage options.

@lakemike
Author

Hi,
it's on pcloud.com, which I have accessed for 5+ years via rclone as part of my previous setup with duplicity. There must have been literally millions of operations on that cloud without a problem... I struggle to see why this would now be an issue.
Does the error message make sense? First the removal fails (lock/a807d9706d does not exist), then it succeeds on retry?
Could there be an issue with rclone caching, or with concurrency of multiple rclone calls on the same repository? Essentially a race condition? Just wondering, without having looked at that part of the code.
Please let me know if I can do more exhaustive testing.

@garethgeorge
Owner

garethgeorge commented May 13, 2024

Hey, agreed, it's frustrating that this pops up now.

I found a thread on the restic forum about issues with pcloud: https://forum.restic.net/t/atomic-uploads-with-pcloud-rclone-webdav/4888. It sounds like it has been unstable in the past -- I'm not sure about the current state. My guess is that pcloud is eventually consistent storage, so, as you say, there are probably race conditions w.r.t. when a lock file is actually reported to exist by the read API.

The eventual consistency, combined with the fact that you have many plans configured, means that Backrest runs a number of forget operations one after another -- this should be safe on consistent storage, but it may be a problem for pcloud through rclone (or perhaps it's just running frequently enough that rare race conditions with the locks are surfacing). Either way, this sounds like the storage provider is violating an expectation of restic.

From looking at the logs it does sound like the operations eventually succeed, but Backrest is conservative: it expects the output of a command to be purely valid JSON. Any warning output is treated as a failure of the operation and generates a notification, which I think is the correct stance to take.

What makes this report alarming overall is that you need a check / repair to recover -- I'd be interested to hear more details there. If not for that (which implies data might be getting lost or corrupted), I'd suggest you could try --no-lock to disable locking if only one computer ever accesses this repository. That is risky, though.
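To illustrate the "purely valid JSON" point above, here is a rough sketch (not the actual Backrest parsing code) of the difference between a strict whole-output parse and a lenient parse that would silently drop restic's warning lines:

```go
// Sketch only: contrasts a strict whole-output parse with a lenient
// first-value parse. Not Backrest's actual parsing code.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

func main() {
	out := []byte(`[{"keep":[],"remove":[]}]
Remove(<lock/a807d9706d>) returned error, retrying after 741.164112ms`)

	// Strict: the entire command output must be one valid JSON value.
	var strict any
	fmt.Println("strict:", json.Unmarshal(out, &strict))
	// -> invalid character 'R' after top-level value

	// Lenient: decode the first JSON value and ignore whatever follows,
	// which would hide restic's warnings instead of surfacing them.
	var lenient any
	dec := json.NewDecoder(bytes.NewReader(out))
	fmt.Println("lenient:", dec.Decode(&lenient))
	// -> <nil>
}
```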

@lakemike
Author

Hello, thank you so much for digging into it. I'll do more testing on my end for sure. If I don't fix this, I'll have to go back to my duplicity scripts... oof.

When setting up the 15 plans I was a bit lazy and entered "4am daily" for most if not all of them. That may have aggravated the concurrency issues. I will put 5 minutes between them and see what happens.

@lakemike
Author

I have now put a "sleep 60" into the HOOK triggered at "CONDITION_SNAPSHOT_START" for all 15+ plans.
=> This delays the startup of a new plan by 60 seconds and gives any previous plan 60 seconds to finish (e.g. emptying the rclone cache).

I need to observe it for a few more days, but it looks good so far.

@garethgeorge
Owner

Great! Glad that's working for you -- I think the delay is probably a "safer" option than no-lock!

I'm going to go ahead and mark this as resolved as it's specific to a storage provider rather than a Backrest issue :)
