Backup stops working without showing any errors #243

Closed
mattiarainieri opened this issue Apr 23, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@mattiarainieri

Good Morning

Describe the bug
A while after the backup starts running, it stops working without showing any errors. Network usage drops to 0%, and to resume I need to delete the backup and force a new one. I'm using rclone to sync over WebDAV; I was not able to see or enable any logs 🥲
Thanks everyone for your help

To Reproduce
Steps to reproduce the behavior:

  1. Create a Hetzner Storage Box
  2. Initialize it with rclone config, using WebDAV as the protocol (a sample remote definition follows this list)
  3. Start a new backup
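
For reference, the WebDAV remote that rclone config writes to rclone.conf ends up looking roughly like this (the hostname, user, and password below are placeholders, not my real Storage Box credentials):

[storagebox]
type = webdav
# Hetzner Storage Boxes serve WebDAV at https://<username>.your-storagebox.de
url = https://uXXXXXX.your-storagebox.de
vendor = other
user = uXXXXXX
pass = <password as obscured by rclone>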

Screenshots
[Screenshot attached: 2024-04-23 at 10:59:12]

Platform Info

  • OS and Architecture: Docker on Ubuntu 22.04 x86
  • Backrest Version: [e.g. 0.17.2]
@mattiarainieri added the bug label on Apr 23, 2024
@mattiarainieri changed the title from "Add rclone logs" to "Backup stops working without showing any errors" on Apr 23, 2024
@garethgeorge
Owner

Hey, you should have logs available in systemd, or you can try running backrest manually (i.e. outside systemd) and capturing its output to a file.
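
Something along these lines works for the manual run (the binary path and log file name here are just examples):

# run backrest in the foreground, capturing stdout and stderr to a file
./backrest > /tmp/backrest.log 2>&1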

@mattiarainieri
Author

I'm running it using the official Docker container, but both systemd and journalctl are missing inside it. Would you suggest running it on the host operating system instead?

@garethgeorge
Owner

Ah, sorry, I missed that you're using Docker. docker logs <container name> should return the logs for the container.
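
For example, assuming the container is named backrest:

# show the last 100 lines and keep following new output
docker logs --tail 100 -f backrest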

@wirehiker

Platform:

  • docker-ce 5:26.1.0-1~debian.12~bookworm on OMV 7.0.5-1 (Sandworm)
  • Backrest 0.17.2

I experienced the same problem with slight variations: the backup repository was on an internal HDD, so I didn't use rclone to set up the repository.
The backup task stalled at ~8% after a few minutes and made no progress over the next 10 hours, so I considered it stuck and cancelled the process.

To Reproduce
Steps to reproduce the behavior:

  • Create a repository on an internal folder
  • Create a backup plan
  • Start a new Backup

The log file doesn't show any error or warning:
backrest | 2024-04-27T19:34:08.521+0200 DEBUG loading auth secret from file
backrest | 2024-04-27T19:34:08.530+0200 INFO scheduling default tasks, waiting for task queue reset.
backrest | 2024-04-27T19:34:08.530+0200 INFO reset task queue, scheduling new task set.
backrest | 2024-04-27T19:34:08.530+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-27T19:34:13+02:00"}
backrest | 2024-04-27T19:34:08.531+0200 INFO starting orchestrator loop
backrest | 2024-04-27T19:34:08.533+0200 INFO starting web server :9898
backrest | 2024-04-27T19:34:13.533+0200 INFO running task {"task": "collect garbage"}
backrest | 2024-04-27T19:34:13.539+0200 INFO collecting garbage {"forgotten_snapshots": 0, "operations_removed": 0}
backrest | 2024-04-27T19:34:13.539+0200 INFO task finished {"task": "collect garbage", "duration": "5.859073ms"}
backrest | 2024-04-27T19:34:13.539+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-28T19:34:13+02:00"}
backrest | 2024-04-27T19:35:14.649+0200 INFO scheduling default tasks, waiting for task queue reset.
backrest | 2024-04-27T19:35:14.649+0200 DEBUG queued task cancelled due to config change {"task": "collect garbage"}
backrest | 2024-04-27T19:35:14.649+0200 INFO reset task queue, scheduling new task set.
backrest | 2024-04-27T19:35:14.649+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-27T19:35:19+02:00"}
backrest | 2024-04-27T19:35:19.654+0200 INFO running task {"task": "collect garbage"}
backrest | 2024-04-27T19:35:19.662+0200 INFO collecting garbage {"forgotten_snapshots": 0, "operations_removed": 0}
backrest | 2024-04-27T19:35:19.662+0200 INFO task finished {"task": "collect garbage", "duration": "7.951121ms"}
backrest | 2024-04-27T19:35:19.662+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-28T19:35:19+02:00"}
backrest | 2024-04-27T19:36:16.960+0200 DEBUG updating config {"version": 1}
backrest | 2024-04-27T19:36:16.968+0200 DEBUG applying config {"version": 1}
backrest | 2024-04-27T19:36:16.968+0200 INFO scheduling default tasks, waiting for task queue reset.
backrest | 2024-04-27T19:36:16.968+0200 DEBUG queued task cancelled due to config change {"task": "collect garbage"}
backrest | 2024-04-27T19:36:16.968+0200 INFO reset task queue, scheduling new task set.
backrest | 2024-04-27T19:36:16.968+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-27T19:36:21+02:00"}
backrest | 2024-04-27T19:36:16.969+0200 DEBUG scheduling index snapshots task
backrest | 2024-04-27T19:36:16.969+0200 INFO scheduling task {"task": "index snapshots for plan \"matteo.local.repo\"", "runAt": "2024-04-27T19:36:16+02:00"}
backrest | 2024-04-27T19:36:16.969+0200 DEBUG done add repo
backrest | 2024-04-27T19:36:16.969+0200 INFO running task {"task": "index snapshots for plan \"matteo.local.repo\""}
backrest | 2024-04-27T19:36:17.743+0200 DEBUG indexed known snapshot IDs for repo matteo.local.repo in 56.001µs
backrest | 2024-04-27T19:36:17.748+0200 DEBUG indexed snapshots {"repo": "matteo.local.repo", "duration": "4.261685ms", "alreadyIndexed": 0, "newlyAdded": 0, "markedForgotten": 0}
backrest | 2024-04-27T19:36:17.748+0200 INFO task finished {"task": "index snapshots for plan \"matteo.local.repo\"", "duration": "778.73811ms"}
backrest | 2024-04-27T19:36:21.972+0200 INFO running task {"task": "collect garbage"}
backrest | 2024-04-27T19:36:21.978+0200 INFO collecting garbage {"forgotten_snapshots": 0, "operations_removed": 0}
backrest | 2024-04-27T19:36:21.978+0200 INFO task finished {"task": "collect garbage", "duration": "5.806488ms"}
backrest | 2024-04-27T19:36:21.978+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-28T19:36:21+02:00"}
backrest | 2024-04-27T19:40:45.238+0200 INFO scheduling default tasks, waiting for task queue reset.
backrest | 2024-04-27T19:40:45.238+0200 DEBUG queued task cancelled due to config change {"task": "collect garbage"}
backrest | 2024-04-27T19:40:45.238+0200 INFO reset task queue, scheduling new task set.
backrest | 2024-04-27T19:40:45.238+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-27T19:40:50+02:00"}
backrest | 2024-04-27T19:40:45.244+0200 INFO scheduling task {"task": "backup for plan \"matteo.partial.plan\"", "runAt": "2024-04-27T19:42:00+02:00"}
backrest | 2024-04-27T19:40:50.240+0200 INFO running task {"task": "collect garbage"}
backrest | 2024-04-27T19:40:50.250+0200 INFO collecting garbage {"forgotten_snapshots": 0, "operations_removed": 0}
backrest | 2024-04-27T19:40:50.250+0200 INFO task finished {"task": "collect garbage", "duration": "9.699785ms"}
backrest | 2024-04-27T19:40:50.250+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-28T19:40:50+02:00"}
backrest | 2024-04-27T19:42:00.001+0200 INFO running task {"task": "backup for plan \"matteo.partial.plan\""}
backrest | 2024-04-27T19:42:00.006+0200 INFO Starting backup {"plan": "matteo.partial.plan", "opId": 1797510550246129665}
backrest | 2024-04-27T19:42:00.006+0200 DEBUG repo orchestrator starting backup {"repo": "matteo.local.repo"}
backrest | 2024-04-27T19:42:06.871+0200 DEBUG got snapshots for plan {"repo": "matteo.local.repo", "repo": "matteo.local.repo", "count": 0, "plan": "matteo.partial.plan", "tag": "plan:matteo.partial.plan"}

After I redeployed the container, the same plan on the same repository completed the backup successfully, as expected:
backrest | 2024-04-28T15:10:02.889+0200 DEBUG loading auth secret from file
backrest | 2024-04-28T15:10:02.904+0200 DEBUG unlocking repo {"repo": "matteo.local.repo", "repo": "matteo.local.repo"}
backrest | 2024-04-28T15:10:03.521+0200 INFO scheduling default tasks, waiting for task queue reset.
backrest | 2024-04-28T15:10:03.521+0200 INFO reset task queue, scheduling new task set.
backrest | 2024-04-28T15:10:03.521+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-28T15:10:08+02:00"}
backrest | 2024-04-28T15:10:03.525+0200 INFO scheduling task {"task": "backup for plan \"matteo.partial.plan\"", "runAt": "2024-04-28T19:42:00+02:00"}
backrest | 2024-04-28T15:10:03.525+0200 INFO starting orchestrator loop
backrest | 2024-04-28T15:10:03.527+0200 INFO starting web server :9898
backrest | 2024-04-28T15:10:08.523+0200 INFO running task {"task": "collect garbage"}
backrest | 2024-04-28T15:10:08.532+0200 INFO collecting garbage {"forgotten_snapshots": 0, "operations_removed": 0}
backrest | 2024-04-28T15:10:08.532+0200 INFO task finished {"task": "collect garbage", "duration": "8.528664ms"}
backrest | 2024-04-28T15:10:08.532+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-29T15:10:08+02:00"}
backrest | 2024-04-28T15:10:39.226+0200 DEBUG unlocking repo {"repo": "matteo.local.repo", "repo": "matteo.local.repo"}
backrest | 2024-04-28T15:15:26.296+0200 INFO scheduling default tasks, waiting for task queue reset.
backrest | 2024-04-28T15:15:26.300+0200 DEBUG queued task cancelled due to config change {"task": "backup for plan \"matteo.partial.plan\""}
backrest | 2024-04-28T15:15:26.301+0200 DEBUG queued task cancelled due to config change {"task": "collect garbage"}
backrest | 2024-04-28T15:15:26.301+0200 INFO reset task queue, scheduling new task set.
backrest | 2024-04-28T15:15:26.301+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-28T15:15:31+02:00"}
backrest | 2024-04-28T15:15:26.305+0200 INFO scheduling task {"task": "backup for plan \"matteo.partial.plan\"", "runAt": "2024-04-28T15:16:00+02:00"}
backrest | 2024-04-28T15:15:31.303+0200 INFO running task {"task": "collect garbage"}
backrest | 2024-04-28T15:15:31.311+0200 INFO collecting garbage {"forgotten_snapshots": 0, "operations_removed": 0}
backrest | 2024-04-28T15:15:31.311+0200 INFO task finished {"task": "collect garbage", "duration": "7.697248ms"}
backrest | 2024-04-28T15:15:31.311+0200 INFO scheduling task {"task": "collect garbage", "runAt": "2024-04-29T15:15:31+02:00"}
backrest | 2024-04-28T15:16:00.028+0200 INFO running task {"task": "backup for plan \"matteo.partial.plan\""}
backrest | 2024-04-28T15:16:00.035+0200 INFO Starting backup {"plan": "matteo.partial.plan", "opId": 1797584454996197379}
backrest | 2024-04-28T15:16:00.035+0200 DEBUG repo orchestrator starting backup {"repo": "matteo.local.repo"}
backrest | 2024-04-28T15:16:06.904+0200 DEBUG got snapshots for plan {"repo": "matteo.local.repo", "repo": "matteo.local.repo", "count": 0, "plan": "matteo.partial.plan", "tag": "plan:matteo.partial.plan"}
backrest | 2024-04-28T18:12:29.537+0200 DEBUG Backup completed {"repo": "matteo.local.repo", "repo": "matteo.local.repo", "duration": "2h56m22.630435411s"}
backrest | 2024-04-28T18:12:29.543+0200 INFO Backup complete {"plan": "matteo.partial.plan", "duration": "2h56m29.506771428s", "summary": "summary:{files_new:93721 dirs_new:19695 data_blobs:251450 tree_blobs:18592 data_added:253023600353 total_files_processed:93721 total_bytes_processed:334081958305 total_duration:10582.065976532 snapshot_id:\"711374edf671e83223a2c0ce8d62820429fb8358a0c56b49dc027f3701da8115\"}"}
backrest | 2024-04-28T18:12:29.562+0200 INFO scheduling task {"task": "forget for plan \"matteo.partial.plan\"", "runAt": "2024-04-28T18:12:29+02:00"}
backrest | 2024-04-28T18:12:29.562+0200 INFO scheduling task {"task": "index snapshots for plan \"matteo.local.repo\"", "runAt": "2024-04-28T18:12:29+02:00"}
backrest | 2024-04-28T18:12:29.574+0200 DEBUG wrote operation log to 2024-04-28-logs.tar/0
backrest | 2024-04-28T18:12:29.579+0200 INFO task finished {"task": "backup for plan \"matteo.partial.plan\"", "duration": "2h56m29.550845644s"}
backrest | 2024-04-28T18:12:29.584+0200 INFO scheduling task {"task": "backup for plan \"matteo.partial.plan\"", "runAt": "2024-04-29T15:16:00+02:00"}
backrest | 2024-04-28T18:12:29.584+0200 INFO running task {"task": "forget for plan \"matteo.partial.plan\""}
backrest | 2024-04-28T18:12:30.494+0200 DEBUG forget snapshots {"plan": "matteo.partial.plan", "count": 0, "policy": "policy_time_bucketed:{hourly:24 daily:7 weekly:4 monthly:3}"}
backrest | 2024-04-28T18:12:30.497+0200 DEBUG wrote operation log to 2024-04-28-logs.tar/2560
backrest | 2024-04-28T18:12:30.502+0200 INFO task finished {"task": "forget for plan \"matteo.partial.plan\"", "duration": "917.45117ms"}
backrest | 2024-04-28T18:12:30.502+0200 INFO running task {"task": "index snapshots for plan \"matteo.local.repo\""}
backrest | 2024-04-28T18:12:31.284+0200 DEBUG indexed known snapshot IDs for repo matteo.local.repo in 776.485µs
backrest | 2024-04-28T18:12:31.289+0200 DEBUG indexed snapshots {"repo": "matteo.local.repo", "duration": "4.757505ms", "alreadyIndexed": 0, "newlyAdded": 1, "markedForgotten": 0}
backrest | 2024-04-28T18:12:31.289+0200 INFO task finished {"task": "index snapshots for plan \"matteo.local.repo\"", "duration": "787.750105ms"}

Hopefully this helps, and thank you for your time and the good work ;)

@mattiarainieri
Author

I am continuing to investigate... This time I used an S3 bucket, so instead of going through rclone I used the built-in restic support, but the operation got stuck anyway. I'd like to emphasize that the hang occurs only for long-running operations, for example an initial backup to new storage over a not-too-fast internet connection such as mine (~24h).
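
For context, the repository points at the bucket via restic's native S3 backend, configured roughly like this (endpoint, bucket, and credentials are placeholders):

# restic's S3 backend reads credentials from the environment
export AWS_ACCESS_KEY_ID=<access key>
export AWS_SECRET_ACCESS_KEY=<secret key>
restic -r s3:https://<endpoint>/<bucket> init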
Nothing relevant in the logs; here is the final snippet:

2024-05-01T09:12:16.024Z	INFO	task finished	{"task": "index snapshots for plan \"cubbit-test\"", "duration": "3.191874876s"}
2024-05-01T09:12:17.830Z	INFO	running task	{"task": "collect garbage"}
2024-05-01T09:12:17.832Z	INFO	collecting garbage	{"forgotten_snapshots": 0, "operations_removed": 0}
2024-05-01T09:12:17.832Z	INFO	task finished	{"task": "collect garbage", "duration": "2.047841ms"}
2024-05-01T09:12:17.832Z	INFO	scheduling task	{"task": "collect garbage", "runAt": "2024-05-02T09:12:17Z"}
2024-05-01T09:14:21.901Z	INFO	scheduling default tasks, waiting for task queue reset.
2024-05-01T09:14:21.901Z	DEBUG	queued task cancelled due to config change	{"task": "collect garbage"}
2024-05-01T09:14:21.902Z	DEBUG	queued task cancelled due to config change	{"task": "backup for plan \"raid-backup\""}
2024-05-01T09:14:21.902Z	INFO	reset task queue, scheduling new task set.
2024-05-01T09:14:21.902Z	INFO	scheduling task	{"task": "collect garbage", "runAt": "2024-05-01T09:14:26Z"}
2024-05-01T09:14:21.903Z	INFO	scheduling task	{"task": "backup for plan \"cubbit-backup\"", "runAt": "2024-05-06T00:00:00Z"}
2024-05-01T09:14:21.905Z	INFO	scheduling task	{"task": "backup for plan \"raid-backup\"", "runAt": "2024-05-06T01:00:00Z"}
2024-05-01T09:14:24.708Z	INFO	scheduling task	{"task": "onetime backup for plan \"cubbit-backup\"", "runAt": "2024-05-01T09:14:24Z"}
2024-05-01T09:14:24.708Z	INFO	running task	{"task": "onetime backup for plan \"cubbit-backup\""}
2024-05-01T09:14:24.709Z	INFO	Starting backup	{"plan": "cubbit-backup", "opId": 1797841081815007299}
2024-05-01T09:14:24.710Z	DEBUG	repo orchestrator starting backup	{"repo": "cubbit-test"}
2024-05-01T09:14:28.856Z	DEBUG	got snapshots for plan	{"repo": "cubbit-test", "repo": "cubbit-test", "count": 0, "plan": "cubbit-backup", "tag": "plan:cubbit-backup"}

@garethgeorge
Owner

Hey, mind trying to kill backrest (or possibly its process group, if restic is what's hung) with signal 7? The syntax should be something along the lines of

ps -aux | grep backrest

to find the PID,

kill -7 <pid>

or, to kill the process group, pass the negative PID, e.g.

kill -7 -<pid>

This should dump stack traces (which will appear in the systemd logs). I suspect restic itself is hanging; I'm curious to cross-post this issue on the restic forum and get some input from the devs over there on how this should be handled, e.g. how Backrest can tell that restic is hung.
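
If it's easier, the two steps can be combined into a single command; this is just a sketch, assuming pgrep is available and that backrest leads its own process group:

# pgrep -o returns the oldest matching PID (assumed here to be the
# process-group leader); "--" plus the negative value makes kill
# signal the entire process group.
kill -7 -- -"$(pgrep -o backrest)"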

@garethgeorge
Owner

Closing this for lack of activity; feel free to ping if you're still hitting issues.
