sftp backend does not reconnect #353
Sorry, forgot to add, I'm running latest git:
on a Debian wheezy amd64 system, using the following command line:
Thanks for your report. The backend infrastructure does not yet handle connection errors; for sftp, just one persistent connection is made to the backup server, and it does not reconnect. That's the underlying problem. I suspect that after the error is received, the connection is closed and restic does not store anything at all after that. However, you should be able to resume the backup by restarting restic. Do you know what server the Hetzner backup uses?
Ah that explains it. :) Thanks for your fast reply! Please let me know if I could be of any help with testing, would love to use restic also for the Hetzner use case. :) |
Side note: It seems unlikely to me that Hetzner's ssh backend is broken. However, the following observation might be valuable for you, too. A few times I observed stalling TCP-over-IPv6 connections that were related to faulty routers on the way to the target host. What happened: some router on the way had Sequence Number Randomization enabled. This should be transparent to both server and client, but they did not do it properly, as they did not rewrite the SACK (selective acknowledgement) TCP header option. Once the first packet loss occurred, there was a high chance for the SACK option to contain an illegal (out of expected range) value, causing netfilter/conntrack to drop the response packet. IPv4 could theoretically be affected, too, but I never observed this myself. You can log packets flagged invalid by the conntrack module. SACK is only used when both client and server support it; it can be disabled per host using a netfilter rule (requires netfilter + TCPOPTSTRIP kernel support), or you could (temporarily) disable SACK for all connections (IPv4+IPv6). See the sketch below.
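The exact commands from that comment are not preserved in this thread; a rough equivalent, assuming a Linux host with iptables and the conntrack and TCPOPTSTRIP extensions (the hostname is a placeholder), could look like this:

```sh
# Log packets that conntrack flags as invalid (e.g. out-of-range SACK values)
iptables  -A INPUT -m conntrack --ctstate INVALID -j LOG --log-prefix "conntrack invalid: "
ip6tables -A INPUT -m conntrack --ctstate INVALID -j LOG --log-prefix "conntrack invalid: "

# Strip the SACK-permitted option on outgoing SYNs to a single host so SACK
# is never negotiated (uXXX.your-backup.de is a placeholder)
iptables -t mangle -A OUTPUT -p tcp --syn -d uXXX.your-backup.de \
  -j TCPOPTSTRIP --strip-options sack-permitted

# Or disable SACK globally (the sysctl lives under ipv4 but applies to IPv6 TCP as well)
sysctl -w net.ipv4.tcp_sack=0
```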
Interesting hints, thanks. No matter whether I'm running with I got runtime panics though, full log available in http://michael-prokop.at/tmp/restic_panic.txt jftr. |
Thanks for the log, the panic was expected. Error handling and reporting back to the user deserves a lot more work ;) |
I'd like to note that I receive the same error message when backing up from my linux (debian jessie) systems to hetzner backup space. Is there a workaround yet? |
Not yet, but you can just run the backup again and it will resume. |
Hm, I find that rather unexpected. Restic starts, reads files, splits them into chunks, bundles the chunks together to pack files and uploads the pack files to the backend. The pack files are never the same (neither size nor content), so the server always sees different data. @debe Did you create a debug log yet? Maybe that contains another hint... |
ssh loglevel DEBUG3. https://0bin.link/paste/66DtwHnN#NEDc92vKhiQANv2ESS5WJTYOcWNGe0jQe+yHpWu1Buh |
FYI: It was just reported that Hetzner is still running ProFTPD 1.3.5b. The fix should be in 1.3.6, which was just released a few weeks ago on 9 April.
A user just reported that Hetzner seems to have installed a newer version of the ftp service, so it works much better now. Can anybody else confirm this? |
@fd0 - jepp, did some tests today (using v0.7.1) and it looks much better \o/ |
Well, maybe I was too fast with my reply, just had the following failure with a 103GB directory (hetzner storagebox as repository target via sftp, restic v0.7.1 locally):
It seems different from the problem I initially reported here back then though. |
Same problem again, this time after ~37 minutes:
Running |
Yes, |
FTR, retried with latest restic from git (v0.7.1-183-g0f25ef94) with debug enabled, last lines from debug log:
(Running it from a Debian jessie system towards Hetzner's your-storagebox.de, IP 88.99.48.239 in my case) |
And another update of mine :) With latest restic from git and running it under |
Is restic prune safe to use over unreliable connections? |
It is "safe" in terms of safety: It won't break the repo. But you'll have to start from the beginning and the repo will keep growing until you manage to finish one prune run. That's a safety feature (restic will only remove data once it is sure that the new copies of still referenced data are safely saved). |
It seems to me at least that it might be more reliable to use the rclone backend and let rclone handle the sftp until reconnect support is implemented, particularly if uploading over residential broadband. |
For your information: We've recently figured out that some sftp servers disconnect clients after a while without activity. You can prevent that with restic by adding keep-alive settings to the client's ~/.ssh/config file, as sketched below.
See: https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#sftp
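For reference, the relevant settings are the SSH keep-alive options; a minimal example for the client's ~/.ssh/config (the host is a placeholder, and the values are taken from the restic documentation rather than from this comment):

```
Host uXXX.your-storagebox.de
    ServerAliveInterval 60
    ServerAliveCountMax 240
```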
Hello. Is this still being worked on? I have implemented a repo check job, and from time to time it won't finish, seemingly because of this problem:

read group #20 of 7086 data packs (out of total 197033 packs in 28 groups)

It seems it will continue spilling those error messages forever; the mentioned blob references are repeatedly shown. The problem is

So I would at first only notice this when the next backup job is scheduled, which notifies me that the repo is still in use, and I have to manually terminate the job and remove the lock. It would already be a great improvement if restic at least terminated on those errors (so I get the error notice immediately).

Regards
This is not actively being worked on as far as I know. Somebody would need to step up and implement reconnection/restarting code in the sftp backend.
I assume this issue is still open? I (not infrequently) get failed prunes like this from my home network to hetzner. Previously I spun up an AWS instance to run prune and it finished without difficulties.
Hello there! Still having this problem with hetzner... my backups are failing after some minutes (13 mins, 16 mins... it varies).
No combination of ssh |
For me this sounds like the following SFTP issue discussed in the documentation (https://restic.readthedocs.io/en/stable/030_preparing_a_new_repo.html#sftp):
Thanks Michael - I actually have those values already configured in my .ssh/config file |
@johnflan Oh, that response was mainly intended for @SimoneLazzaris. In your case I wonder whether this might be a result of an internet reconnection? With a daily reconnect, as is common for DSL, it wouldn't be too unexpected if that interrupted a command running for 17 hours.
@MichaelEischer I've tried that, and also disabling/enabling TCPKeepAlive, but without success. I suspect that something is actually wrong on Hetzner's side, because plain ssh connections are also sometimes dropped. I managed to make it work using rclone as an intermediate interface: I configured rclone with a backend on Hetzner and then told restic to use it. It's been uploading for 24 hours now; sometimes I get errors, but the connection is immediately re-established and the backup can go on. Note that before, with direct sftp, my connection was dropped every 10-20 minutes. See the sketch below for the setup.
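A minimal sketch of that kind of setup, assuming an rclone remote named hetzner pointing at the storage box (remote name, host, and paths are made up for illustration):

```sh
# ~/.config/rclone/rclone.conf -- sftp remote for the storage box
# [hetzner]
# type = sftp
# host = uXXX.your-storagebox.de
# user = uXXX
# key_file = ~/.ssh/id_rsa

# let restic reach the repository through rclone instead of its own sftp code;
# rclone retries and reconnects on its own when the connection drops
restic -r rclone:hetzner:restic-repo backup /home
```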
No worries @MichaelEischer - I had considered that, but I don't ever have a problem performing the initial backup, just the prune. On a superficial glance it would seem that in the case of a network drop restic should attempt to re-establish the sftp stream to the remote target.
@johnflan I also had tons of troubles with restic and Hetzner, what was also relevant for me was https://docs.hetzner.com/robot/dedicated-server/troubleshooting/performance-intel-i218-nic - if you're running a dedicated server at Hetzner, consider giving this a try |
Took a first stab at this in a draft PR, approach is to separate out the ssh connection code from the backend and then establish a new connection when it exits. Not ready for maintainers to review yet, but if anyone wants to give feedback on the approach I'd welcome it. |
What is the state of this feature request? |
My pr kind of worked, but needed cleanup + more extensive manual testing before it’d be merge ready. Personally didn’t need this functionality that much because I made my wrapper script that calls restic just kill the process on disconnect and retry later. |
I'm having the same problem with Hetzner.
@ibash, could you share that script? |
Unfortunately my backups code is kind of messy / bespoke. I have a branch where I'm working on a cleanup; at some point later I'll publish that. But the relevant bit is in here: in the code below, the stderr of the restic child process is watched for the connection-lost message:

```js
child.stderr.on('data', (data) => {
  // unfortunately restic never recovers from this, so end early s.t. the
  // backup can be retried again sooner
  //
  // ref: https://github.com/restic/restic/issues/353
  if (data.includes('returned error, retrying after') && data.includes('connection lost')) {
    isConnectionLost = true
    reject(new Error('connection lost'))
  }
})
```
Similar problems here, apparently also a Hetzner problem (using a 1TB storage box over there) but it was working until 07Mar2024, then it broke. Some minutes into the backup (photos, ~350GB) I get the usual:
The command line is unspectacular:

Did someone already ask over in the Hetzner forum or their tech support? Can't be a restic issue if it was working without any changes before, right?

Edit: Changed the backend to rclone and it works now, even with the whole 380GB. No idea what broke it, and I only asked in Hetzner's forum; did anyone ask tech support yet?
I'm trying to back up data to Hetzner's backup space (http://wiki.hetzner.de/index.php/Backup/en). It starts nicely with ~15-20MiB/sec, but as soon as I'm receiving a

```
Write failed: Broken pipe
```

or

```
Connection to uXXX.your-backup.de closed by remote host.
```

from Hetzner's backup space, restic permanently decreases its performance to <1MiB/s and doesn't seem to be able to recover from it. While this clearly seems to be a problem with Hetzner's backend storage, I can't find an ssh configuration (TCPKeepAlive, ServerAliveInterval, ...) which would work around this problem. It would be nice if restic could recover from this situation itself.

Thanks for restic! :)