
Slow SFTP transfer rate since 0.12.1 #4209

Closed
DeadWalkingDeath opened this issue Feb 15, 2023 · 22 comments · Fixed by #4782
Labels
backend: sftp, category: optimization, help: good first issue, help: minor complexity, help: wanted, type: feature enhancement

Comments

@DeadWalkingDeath commented Feb 15, 2023

I'm trying to use restic to back up ~3 TB of data to another system using SFTP over a gigabit network connection.

I've set the RESTIC_REPOSITORY and RESTIC_PASSWORD environment variables and used the following commands:
restic init
restic backup /mnt/disk

When I tried the latest release (0.15.1), I immediately noticed the slow transfer speed of around 10 MB/s. The CPU was about 60% idle most of the time.
First I tried a lot of different options:

  • Disabling backup progress estimation (--no-scan)
  • Increasing the number of backend connections (-o sftp.connections=10)
  • Disabling compression (--compression off)
  • Increasing file read concurrency (--read-concurrency 8)
  • Increasing the pack size (--pack-size 64)

None of these parameters changed anything.

Next I tried an older version from the Debian repository (0.11.0).
Just using this version maxed out my CPU, and I reached a transfer rate of about 60-70 MB/s.

Then I tried every release from 0.11.0 to 0.15.1: starting with 0.12.1, every version shows this issue.
I guess it must have something to do with a change introduced in 0.12.1.

It only happens with SFTP repositories; a backup to a local repository is not affected.
I've also tried initializing the repository with 0.12.0 (repository format version 1) and then running the backup with the latest release. The transfer was just as slow.

Any ideas?

@DeadWalkingDeath changed the title from "Slow sftp transfer rate since 0.12.1" to "Slow SFTP transfer rate since 0.12.1" on Feb 15, 2023
@greatroar (Contributor) commented Feb 17, 2023

The main change in 0.12.0 regarding SFTP performance is #3211. It could be that it interferes with something in your setup.

Sorry, 0.12.1. That release has #3331, a new version of the SFTP library that we use, and #3345, a stability improvement. Maybe the former is the issue, but I can't really be sure.

@DeadWalkingDeath (Author)

I can confirm that the issue is #3331.
I did a build from revision e678aca, where the new SFTP library was added. With this build I get the reduced speed.
Then I built the revision before that (c9b4fad). With that build the transfer rate is a lot higher.

@MichaelEischer (Member)

Judging from the release notes of pkg/sftp version 1.13.0, this change is the culprit:

[https://github.com/pkg/sftp/pull/397] Refactoring and performance improvements, implement WriterAt interface. Concurrent writes are no longer enabled by default for safety reasons. You can use the option UseConcurrentWrites if it's needed for your use case.

The description of UseConcurrentWrites sounds like it should be safe to use, as we completely retry the upload if there is an error while writing a file. Anyone up for a PR?

@MichaelEischer added the labels type: feature enhancement, help: wanted, help: minor complexity, category: optimization, backend: sftp, and help: good first issue on Feb 19, 2023
@DeadWalkingDeath (Author)

I've given the following change a try, but it doesn't have any impact on the transfer speed.
Maybe I'm missing something?

diff --git a/internal/backend/sftp/sftp.go b/internal/backend/sftp/sftp.go
index 514dd58da..d61afc5cb 100644
--- a/internal/backend/sftp/sftp.go
+++ b/internal/backend/sftp/sftp.go
@@ -97,8 +97,12 @@ func startClient(cfg Config) (*SFTP, error) {
                }
        }()

+       opts := []sftp.ClientOption{
+               sftp.UseConcurrentWrites(true),
+       }
+
        // open the SFTP session
-       client, err := sftp.NewClientPipe(rd, wr)
+       client, err := sftp.NewClientPipe(rd, wr, opts...)
        if err != nil {
                return nil, errors.Errorf("unable to start the sftp session, error: %v", err)
        }

@greatroar (Contributor)

@DeadWalkingDeath I'd spell it

client, err := sftp.NewClientPipe(rd, wr, sftp.UseConcurrentWrites(true))

but yes, that should be it. UseConcurrentWrites actually only takes effect when using ReadFrom instead of Write, but it looks like the restic SFTP backend is already doing that...
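
In case anyone wants to experiment with this outside of restic, here's a minimal standalone sketch (hypothetical example code, not the restic backend) of the distinction: the option only pipelines requests when the upload goes through the file's ReadFrom method, while a plain Write loop still sends one request at a time.

// Hypothetical standalone sketch, not restic code: UseConcurrentWrites
// only takes effect when the upload goes through (*sftp.File).ReadFrom.
package main

import (
	"os"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

func upload(sshConn *ssh.Client, localPath, remotePath string) error {
	// Enable overlapping write requests for this client.
	client, err := sftp.NewClient(sshConn, sftp.UseConcurrentWrites(true))
	if err != nil {
		return err
	}
	defer client.Close()

	src, err := os.Open(localPath)
	if err != nil {
		return err
	}
	defer src.Close()

	dst, err := client.Create(remotePath)
	if err != nil {
		return err
	}
	defer dst.Close()

	// ReadFrom issues many outstanding write requests; a manual loop of
	// dst.Write(buf) calls would still send one request at a time.
	_, err = dst.ReadFrom(src)
	return err
}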

@AlBundy33

Does this maybe help?

git clone https://github.com/pkg/sftp.git
cd sftp

Create test.sh with this content:

start=`date +%s` && go run examples/buffered-write-benchmark/main.go -user test -pass 123 && t=$((`date +%s` - $start)) && test $t -lt 7

Remove the host, user, and pass flags to use localhost, and maybe change the 7 to a number of seconds that works on your system. Since git bisect run treats a zero exit status as "good" and a non-zero status as "bad", the final test $t -lt 7 is what drives the bisection. Now let bisect do the job:

root@91b2acc00772:/go/sftp# git bisect start HEAD v1.10.0
Bisecting: 220 revisions left to test after this (roughly 8 steps)
[a66e205b29f898d57b85b6f9e3a6e3ac5ad58cd4] chan is already available via closure
root@91b2acc00772:/go/sftp# git bisect run ./test.sh
running ./test.sh
2023/03/21 22:38:44 writing 1e+09 bytes
2023/03/21 22:38:48 wrote 1e+09 bytes in 4.540004105s
Bisecting: 109 revisions left to test after this (roughly 7 steps)
[ba123434b87007b835fef17bb0e95ebc28d6a21a] Merge pull request #437 from drakkan/realpath
running ./test.sh
2023/03/21 22:38:50 writing 1e+09 bytes
2023/03/21 22:39:00 wrote 1e+09 bytes in 10.06397129s
Bisecting: 55 revisions left to test after this (roughly 6 steps)
[16ca7e303229c17f462e5c1aaaf6734c8aaf9761] Merge pull request #428 from drakkan/readfrom
running ./test.sh
2023/03/21 22:39:02 writing 1e+09 bytes
2023/03/21 22:39:12 wrote 1e+09 bytes in 10.033959944s
Bisecting: 26 revisions left to test after this (roughly 5 steps)
[e164c76a1254f2d3e747327c7c7a061a1f7539f2] Merge pull request #413 from greatroar/mkdirall-slash
running ./test.sh
2023/03/21 22:39:13 writing 1e+09 bytes
2023/03/21 22:39:23 wrote 1e+09 bytes in 10.026752012s
Bisecting: 13 revisions left to test after this (roughly 4 steps)
[b8102da57e756b46f2da530d9d16c868dd6c0e9d] Merge pull request #397 from pkg/more-optimization
running ./test.sh
2023/03/21 22:39:24 writing 1e+09 bytes
2023/03/21 22:39:34 wrote 1e+09 bytes in 9.892910505s
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[29c556e3a6f049a6932bdaf6c9a084dc4b0f08c4] WriteTo benchmarks
running ./test.sh
2023/03/21 22:39:35 writing 1e+09 bytes
2023/03/21 22:39:39 wrote 1e+09 bytes in 4.604390606s
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[fc156996912168806d33b55794e2f5436fae2c3d] fixed concurrent writes, mid-polish
running ./test.sh
2023/03/21 22:39:41 writing 1e+09 bytes
2023/03/21 22:39:51 wrote 1e+09 bytes in 9.780662301s
Bisecting: 0 revisions left to test after this (roughly 1 step)
[0be6950c0e91d5cb73a4d690270d3a5010ac9808] fix comments and variable names to reflect Write instead of Read
running ./test.sh
2023/03/21 22:39:52 writing 1e+09 bytes
2023/03/21 22:39:57 wrote 1e+09 bytes in 4.609922288s
fc156996912168806d33b55794e2f5436fae2c3d is the first bad commit
commit fc156996912168806d33b55794e2f5436fae2c3d
Author: Cassondra Foesch <puellanivis@gmail.com>
Date:   Sun Feb 21 20:32:09 2021 +0000

    fixed concurrent writes, mid-polish

 client.go | 614 ++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 373 insertions(+), 241 deletions(-)
bisect run success

I tried this in a Docker container, and it seems that from this commit onward writes are slower.

@AlBundy33

Also have a look at pkg/sftp#426, where the commit is mentioned as well.
According to that issue something was fixed in 1.13.1, but restic already uses 1.13.5 🤔

@crimsonhawk47

Any workaround for this? Restic is currently taking 10x longer than borgbackup for me.

@DeadWalkingDeath (Author)

I'm now using the REST server as a backend, which performs well.
If you're stuck with SFTP, the workaround is to use a restic version < 0.12.1.

@SimJoSt (Contributor) commented Dec 13, 2023

To improve speed, I switched to using rclone with an SFTP backend. It gave me greater control over the options and felt more optimized.
Have you tried whether that resolves the performance issue?

@SigHunter commented Jan 21, 2024

Hi, I was diagnosing my slow backup speeds, even though I have a 10G network and the transfer goes from local NVMe to a NAS with SSDs.

# restic options
sftp.connections    set a limit for the number of concurrent connections (default: 5)

The default is 5, which I interpret as 5 SSH connections to the target server; these usually show up as separate processes.

However, I only see one SSH connection on the target. Is this what this bug is about, and what MaxConcurrentRequestsPerFile will probably fix?

root@nas 14:34:56 ~ # ps aux | grep sshd | grep restic
root     1526753  0.0  0.0  11832  9120 ?        Ss   11:50   0:00 sshd: resticbackup [priv]
resticb+ 1526765 34.8  0.0  12256  6456 ?        S    11:50  57:25 sshd: resticbackup@notty

At the time of writing, I'm using the latest restic, 0.16.3.

(I'm asking this directly, which may sound stupid, because the wording of "MaxConcurrentRequestsPerFile" suggests multiple concurrent transfers to the same file, whereas what is actually needed is multiple SFTP connections for multiple files.)

@MichaelEischer (Member)

The names here are somewhat misleading. sftp.connections currently only means that restic uploads up to 5 files in parallel; the SFTP backend multiplexes all requests onto a single connection.

With SFTP, files are transferred in 32 KB chunks, which is why multiple concurrent requests per file are needed to get decent performance. The issue here is likely related to that.
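
To put rough, illustrative numbers on that (a back-of-the-envelope sketch, not measured data; the 10 ms round-trip time is an assumption):

// Illustrative calculation only: with a fixed 32 KiB request size,
// throughput is capped at roughly
//   (in-flight requests x request size) / round-trip time.
package main

import "fmt"

func main() {
	const chunk = 32 * 1024 // bytes per SFTP request
	const rtt = 0.010       // assumed 10 ms round trip, for illustration

	for _, inflight := range []int{1, 8, 64} {
		mbps := float64(inflight) * chunk / rtt / 1e6
		fmt.Printf("%2d in-flight requests -> ~%.0f MB/s\n", inflight, mbps)
	}
	// Output: 1 -> ~3 MB/s, 8 -> ~26 MB/s, 64 -> ~210 MB/s
}

This is also why a low-latency LAN hides the problem: with sub-millisecond round trips, even a single outstanding request can keep a fast link reasonably busy.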

@SigHunter

> The names here are somewhat misleading. sftp.connections currently only means that restic uploads up to 5 files in parallel; the SFTP backend multiplexes all requests onto a single connection.
>
> With SFTP, files are transferred in 32 KB chunks, which is why multiple concurrent requests per file are needed to get decent performance. The issue here is likely related to that.

Thanks for clearing that up, @MichaelEischer.

@MichaelEischer (Member)

There's now #4782, which fixes the upload performance. Please give it a try.

@SigHunter commented Apr 30, 2024

I tried to do before-and-after comparisons (0.16.4 and 0.16.4 + patch), but somehow my performance was already adequate without the patch on vanilla 0.16.4 (around 250 MB/s, SSD to SSD over 10 GbE, with read-concurrency 10 and sftp.connections 10, on Gentoo Linux). That differs from what I witnessed some months ago. I need to test some more and structure the tests better; I just quickly compiled it in and looked at the network graph. I hope somebody else can test the PR too.
On the target there was always just one sftp and one sshd process. Is this still expected with the patch?
In any case, thanks for implementing this!

@MichaelEischer (Member)

The performance problems in this issue only show up when connecting to a remote server. In a LAN setting there was never much of a problem.

> On the target there was always just one sftp and one sshd process. Is this still expected with the patch?

That is a different question from the one discussed in this issue, so my PR doesn't change anything in that regard.

@yverry commented May 6, 2024

I've tested this patch on my side; it changes nothing for a single file of 1 GB. The file transfer is very slow.

@MichaelEischer (Member)

@yverry Can you provide a few more details on your test setup? What does "very slow" mean? How fast are you able to upload to that server using, for example, scp? How fast is the upload using restic 0.12.0?

@yverry commented May 7, 2024

scp to the QNAP is slow, which is not new for me; I'm stuck at 20 Mb/s.
With restic it's more like 1 Mb/s.

I need more time to test with 0.12.0 because it's not compatible with the version 2 repository format.

@MichaelEischer (Member)

@yverry What exactly did you test? Which command, etc.? What is the output of restic version with and without my bugfix PR? How high is the latency between the host running restic and the NAS?

@MichaelEischer (Member)

I'm pretty sure that my bugfix PR fixes the issue introduced in restic 0.12.1; at least it did during my tests. If you still see very slow uploads or downloads, please open a separate issue.

@yverry commented May 12, 2024

Hi,

My test was to upload a single 512 MB file via SFTP.
The first 115 MB went well; after that the transfer stalled at practically zero for a few seconds, continued for another 20 MB, stalled again, and so on.

Your version, freshly compiled:

 ./restic-patch version
restic 0.16.4-dev (compiled manually) compiled with go1.22.2 on linux/amd64

The backup command:

time ./restic-patch backup -r sftp:restic@myserver:path --tag slowsftp ./512M -v
open repository
repository 09103b42 opened (version 2, compression level auto)
no parent snapshot found, will read all files
load index files
[0:01] 100.00%  82 / 82 index files loaded
start scan on [512M]
start backup on [512M]
scan finished in 1.141s: 1 files, 512.000 MiB

Files:           1 new,     0 changed,     0 unmodified
Dirs:            3 new,     0 changed,     0 unmodified
Data Blobs:    350 new
Tree Blobs:      4 new
Added to the repository: 512.024 MiB (512.065 MiB stored)

processed 1 files, 512.000 MiB in 8:00
snapshot de831b42 saved

real	8m2.470s
user	1m12.533s
sys	0m7.744s

A plain sftp upload of this one file went well (35 s).
With an S3 (MinIO) backend everything went well too (31 s):

time ./restic-patch backup -r s3:https://mys3/mysuperbucket ./512M --tag slowsftp
repository 42 opened (version 2, compression level auto)
no parent snapshot found, will read all files
[0:01] 100.00%  8 / 8 index files loaded

Files:           1 new,     0 changed,     0 unmodified
Dirs:            3 new,     0 changed,     0 unmodified
Added to the repository: 512.024 MiB (512.054 MiB stored)

processed 1 files, 512.000 MiB in 0:31
snapshot 4242 saved

real	0m32.507s
user	1m17.382s
sys	0m5.115s

So I switched to S3 without understanding where the bottleneck was.
I hope this is more precise now.
