
Performance file transfer #485

Open
Holger-Benz opened this issue Apr 16, 2024 · 12 comments

Comments

@Holger-Benz

Version

2.12.1

Bug description

Dear apache support team,

We are switching our communication software from the JSCHED SFTP library to the Apache MINA library.

We noticed that the Apache MINA library does not match the performance of the JSCHED library.

I have written a small program that sends a 700 MB file using the SFTP protocol.

This file transfer is about 6 times slower than the same transfer with the JSCHED library.

How can we increase the transfer speed?

Are we using the Apache MINA library incorrectly?

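import java.io.File;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

import org.apache.sshd.client.SshClient;
import org.apache.sshd.client.session.ClientSession;
import org.apache.sshd.sftp.client.SftpClient;
import org.apache.sshd.sftp.client.SftpClientFactory;

public static void sendFile() throws IOException {
    SshClient client = SshClient.setUpDefaultClient();
    client.start();
    try (ClientSession session = client.connect("user", "host", 1022).verify().getClientSession()) {
        session.addPasswordIdentity("password");
        session.auth().verify();
        String largeFile = "c:/temp/largeFile";
        long length = new File(largeFile).length();
        try (SftpClient sftpClient = SftpClientFactory.instance().createSftpClient(session);
             FileChannel writeableChannel = sftpClient.openRemoteFileChannel("largeFile",
                     SftpClient.OpenMode.Create, SftpClient.OpenMode.Truncate, SftpClient.OpenMode.Write);
             FileChannel readableChannel = FileChannel.open(new File(largeFile).toPath(),
                     StandardOpenOption.READ)) {
            // The local channel drives the copy, pushing its buffers into the remote channel
            readableChannel.transferTo(0, length, writeableChannel);
        }
    } finally {
        client.stop();
    }
}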

Actual behavior

The Apache MINA library does not match the performance of other SFTP libraries.

Expected behavior

Is it possible to increase the performance?

Relevant log output

No response

Other information

No response

@tomaswolf
Member

Thank you for this test case. It appears that there is indeed something wrong with the FileChannels. In my tests, the following is much faster (and on par with OpenSSH or Jsch):

SftpClient sftpClient = SftpClientFactory.instance().createSftpClient(session);
try (OutputStream out = sftpClient.write("largeFile")) {
    Files.copy(new File(largeFile).toPath(), out);
}

or also

try (SftpFileSystem fs = SftpClientFactory.instance().createSftpFileSystem(session)) {
  Path remoteFile = fs.getPath("largeFile");
  Files.copy(new File(largeFile).toPath(), remoteFile, StandardCopyOption.REPLACE_EXISTING);
}

With the channels and transferTo, I see uploads (to localhost, so no network latency) that are about 4 times slower, and downloads that are about 25% slower. We'll have to investigate what's going on there...

What is the JSCHED library?

@tomaswolf added the bug label on Apr 16, 2024
@tomaswolf
Member

Interesting: if you change in your code

readableChannel.transferTo(0, length, writeableChannel);

to

writeableChannel.transferFrom(readableChannel, 0, length);

it will also run much faster (but still 25% slower than the two versions with Files.copy() I posted).

Off-topic note: you should probably also check the return value of transferTo/transferFrom and execute them in a loop until everything is transferred.
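A minimal sketch of that loop for the upload direction, written against plain java.nio FileChannels so it runs without an SFTP server. The class and method names (ChannelCopy, uploadFully) are hypothetical. In the setup from the original report, target would be the channel returned by sftpClient.openRemoteFileChannel(...) and source the local file channel, so the remote channel drives the copy.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;

public final class ChannelCopy {

    private ChannelCopy() {
    }

    /**
     * Copies up to {@code length} bytes from {@code source} into {@code target},
     * letting the target channel drive the transfer. transferFrom() may move fewer
     * bytes than requested, so it is called repeatedly until everything is copied.
     * Returns the number of bytes actually copied.
     */
    public static long uploadFully(FileChannel source, FileChannel target, long length) throws IOException {
        long position = 0;
        while (position < length) {
            // Reads from source's current position (advancing it) and writes at 'position' in target
            long n = target.transferFrom(source, position, length - position);
            if (n <= 0) {
                break; // no progress: the source ran out of data earlier than expected
            }
            position += n;
        }
        return position;
    }
}
```

Comparing the returned count with the expected length tells the caller whether the upload was complete.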

@tomaswolf
Member

After some analysis, here's what's going on:

transferTo/transferFrom, as well as the FileChannel.write() operations, are positional operations. readableChannel.transferTo(0, length, writeableChannel) will essentially read 8kB ByteBuffers from the file and then call writeableChannel.write() for each buffer.
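To make the mechanism concrete, here is a simplified model of a source-driven transferTo() as described above: the source reads one 8kB buffer at a time and hands each buffer to target.write(). This is an illustrative sketch under that assumption, not the JDK's actual implementation; the names ChunkedPush and pushInChunks are hypothetical. It returns the number of write() calls so the chunking is visible.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public final class ChunkedPush {

    // Chunk size mirroring the 8kB buffers described above
    static final int CHUNK_SIZE = 8 * 1024;

    private ChunkedPush() {
    }

    /**
     * Source-driven copy: the source reads small buffers and pushes each one to
     * target.write(). The target only sees isolated write() calls and cannot tell
     * that a sequential copy is in progress. Returns the number of write() calls.
     */
    public static long pushInChunks(FileChannel source, long position, long count,
                                    WritableByteChannel target) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(CHUNK_SIZE);
        long done = 0;
        long writeCalls = 0;
        while (done < count) {
            buf.clear();
            buf.limit((int) Math.min(CHUNK_SIZE, count - done));
            // Positional read: the source channel's own position is not changed
            int read = source.read(buf, position + done);
            if (read <= 0) {
                break; // end of file
            }
            buf.flip();
            while (buf.hasRemaining()) {
                target.write(buf); // one call per buffer (more if the target writes partially)
                writeCalls++;
            }
            done += read;
        }
        return writeCalls;
    }
}
```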

However, SftpRemotePathChannel.write() doesn't know that it is being called essentially for a sequential copy operation, and so it doesn't employ a number of optimizations. The result is the slow transfer.

If you change the logic and use writeableChannel.transferFrom(), then the SftpRemotePathChannel drives the operation, and it knows that it is going to sequentially read buffers. Hence it can employ these optimizations.

When you use OutputStream/InputStream as in my Files.copy() examples, then it is known that a sequential data transfer occurs, and the SFTP implementation can employ its optimizations unconditionally.

Finally, transferTo/transferFrom by default copy data in 8kB chunks. With streams, the chunks are about 32kB. This difference causes the 25% slowdown.
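Some back-of-the-envelope arithmetic for the 700 MB file from the original report, assuming each chunk turns into one SFTP write request (an assumption; the actual request framing may differ). Four times fewer requests means four times less per-request overhead, although this alone does not predict the measured 25% difference, which also depends on bandwidth and latency. The class name RequestCount is hypothetical.

```java
public final class RequestCount {

    private RequestCount() {
    }

    /** Number of chunks (assumed to equal SFTP write requests) needed for fileSize bytes. */
    public static long requests(long fileSize, int chunkSize) {
        return (fileSize + chunkSize - 1) / chunkSize; // ceiling division
    }

    public static void main(String[] args) {
        long fileSize = 700L * 1024 * 1024; // the 700 MB file from the original report
        System.out.println("8 kB chunks:  " + requests(fileSize, 8 * 1024));  // 89600
        System.out.println("32 kB chunks: " + requests(fileSize, 32 * 1024)); // 22400
    }
}
```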

Hence:

  • In general, using streams is the simplest for downloading and uploading files, and gives good performance.
  • If you want to use FileChannels:
    • Always let the remote channel drive the operation. Use transferFrom for uploading and transferTo for downloading.
    • Execute transferTo/From in a loop until all data has been transferred.
    • Increase the transfer buffer size via SftpModuleProperties.COPY_BUF_SIZE.set(session, 32 * 1024);
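A minimal sketch of the download direction with the remote channel driving the copy, again using plain java.nio FileChannels so it runs stand-alone. ChannelDownload and downloadFully are hypothetical names. With Apache MINA SSHD, remote would be the channel from sftpClient.openRemoteFileChannel(...) opened with SftpClient.OpenMode.Read, and the buffer size could be raised beforehand with the SftpModuleProperties.COPY_BUF_SIZE call from the list.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;

public final class ChannelDownload {

    private ChannelDownload() {
    }

    /**
     * Copies up to {@code length} bytes from {@code remote} into {@code local} by calling
     * transferTo() on the remote channel in a loop until everything has been transferred.
     * Returns the number of bytes actually copied.
     */
    public static long downloadFully(FileChannel remote, FileChannel local, long length) throws IOException {
        long position = 0;
        while (position < length) {
            // Reads 'remote' starting at 'position' (its own position is unchanged)
            // and writes at the current position of 'local', advancing it
            long n = remote.transferTo(position, length - position, local);
            if (n <= 0) {
                break; // the remote file is shorter than expected
            }
            position += n;
        }
        return position;
    }
}
```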

It might be possible to improve our implementation to handle the case you stumbled upon better, but I'm not sure yet.

@tomaswolf removed the bug label on Apr 16, 2024
@kvlnkarthik

kvlnkarthik commented Apr 17, 2024

We see the same issue in our tests as well. We are using version 2.12.1.

We ran a file-transfer test case using the Files.copy() approach to transfer a file of about 167 MB to a remote server, and it took around 30 seconds.

If we transfer the same file from the same system to the same remote server, but through an sftp session created with the commands below, the transfer takes around 6 minutes. Performance is severely degraded in this scenario.

" sftp -P 2022 @localhost"
put file /tmp/

We run the SSHD server with the SFTP subsystem and a custom FileSystemFactory that creates a remote SFTP file system. The remote SFTP file system is created using the code below.

URI sftpUri = SftpFileSystemProvider.createFileSystemURI(sshConnectionDetails.getHostname(), sshConnectionDetails.getSshPort(), sshConnectionDetails.getUsername(), sshConnectionDetails.getPassword());

Our Apache MINA code runs on localhost port 2022 but creates a remote file system. So when we execute "put file /tmp/", the file is transferred from our local system to the remote server, i.e., client -> Apache MINA server -> remote server.
We acknowledge that there is an additional hop here (the file must be transferred to our server and then on to the remote server), but the transfer rate is still far too slow.

We see SftpRemotePathChannel.write method invocations during this mode of transfer in the thread dump. Based on our tests and your explanation in previous comments, this mode of transfer seems to be very slow.

Stack trace:

"sshd-SftpSubsystem-47114-thread-1" #35 daemon prio=5 os_prio=0 tid=0x00007f7cf4070800 nid=0x18e88 in Object.wait() [0x00007f7d30ffc000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:460)
at org.apache.sshd.sftp.client.impl.DefaultSftpClient.receive(DefaultSftpClient.java:351)
- locked <0x000000071c95faa0> (a java.util.HashMap)
at org.apache.sshd.sftp.client.impl.DefaultSftpClient.receive(DefaultSftpClient.java:325)
at org.apache.sshd.sftp.client.impl.AbstractSftpClient.response(AbstractSftpClient.java:181)
at org.apache.sshd.sftp.client.impl.AbstractSftpClient.rpc(AbstractSftpClient.java:169)
at org.apache.sshd.sftp.client.impl.AbstractSftpClient.checkCommandStatus(AbstractSftpClient.java:233)
at org.apache.sshd.sftp.client.impl.AbstractSftpClient.write(AbstractSftpClient.java:783)
at org.apache.sshd.sftp.client.fs.SftpFileSystem$Wrapper.write(SftpFileSystem.java:422)
at org.apache.sshd.sftp.client.impl.SftpRemotePathChannel.doWrite(SftpRemotePathChannel.java:266)
- locked <0x000000071cd40f08> (a java.lang.Object)
at org.apache.sshd.sftp.client.impl.SftpRemotePathChannel.write(SftpRemotePathChannel.java:202)
at org.apache.sshd.sftp.server.FileHandle.write(FileHandle.java:161)
at org.apache.sshd.sftp.server.SftpSubsystem.doWrite(SftpSubsystem.java:884)
at org.apache.sshd.sftp.server.AbstractSftpSubsystemHelper.doWrite(AbstractSftpSubsystemHelper.java:605)
at org.apache.sshd.sftp.server.AbstractSftpSubsystemHelper.doProcess(AbstractSftpSubsystemHelper.java:362)
at org.apache.sshd.sftp.server.SftpSubsystem.doProcess(SftpSubsystem.java:355)
at org.apache.sshd.sftp.server.AbstractSftpSubsystemHelper.process(AbstractSftpSubsystemHelper.java:344)
at org.apache.sshd.sftp.server.SftpSubsystem.run(SftpSubsystem.java:331)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
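One way to read this trace: the proxy's SftpSubsystem thread is blocked in DefaultSftpClient.receive(), called from AbstractSftpClient.write() via checkCommandStatus(), so each proxied write request waits for the upstream server's status reply before the next request is handled on that thread. If that is the case, latency alone puts a floor on the transfer time. The sketch below computes that floor; the class name SyncWriteModel is hypothetical, the 32 kB request size and 50 ms round-trip time are made-up inputs rather than measurements from this setup, and the model ignores bandwidth and processing time.

```java
public final class SyncWriteModel {

    private SyncWriteModel() {
    }

    /**
     * Lower bound on transfer time if every write request waits one full
     * round trip for its acknowledgement before the next one is sent.
     */
    public static double lowerBoundSeconds(long fileSize, int requestSize, double rttMillis) {
        long requests = (fileSize + requestSize - 1) / requestSize; // ceiling division
        return requests * rttMillis / 1000.0;
    }

    public static void main(String[] args) {
        long fileSize = 167L * 1024 * 1024; // the ~167 MB file mentioned in this comment
        // Hypothetical inputs: 32 kB per write request, 50 ms round trip upstream
        System.out.println(lowerBoundSeconds(fileSize, 32 * 1024, 50.0) + " s");
    }
}
```

Clients that keep several requests in flight at once avoid paying a full round trip per request, which is why a sequential copy can be much faster than the same data sent as isolated acknowledged writes.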

Is there any way to force or override file uploads/downloads in SFTP sessions (via put/get commands) to use the Files.copy() approach for better performance, or to tune the buffers in SftpRemotePathChannel? Could you please let us know.

We also observed that if we use the OpenSSH sftp client to connect to the remote server directly, without going through our Apache MINA SFTP server code, it takes only 3 seconds to transfer the same file.

@Holger-Benz
Author

Holger-Benz commented Apr 18, 2024 via email

@kvlnkarthik

@tomaswolf,
Any thoughts on my comment above, especially on:
"""
Is there any way to force/override the file upload/download in sftp sessions through put/get commands to use Files.copy() way in order to see better performance or tune buffers in SftpRemotePathChannel. Could you please let me know.
"""
Thanks,
Karthik

@Holger-Benz
Author

The SSHD-server is also integrated in our communication software.

After updating the server from version 2.4.0 to 2.12.1, the server's communication has become significantly slower (by a factor of 2).

Is there any way to improve the performance?

@tomaswolf
Member

"The SSHD-server is also integrated in our communication software."

The original report was about the client side. Whatever this may be, it would be a new, separate issue. But unless you have more information, we can't do anything anyway. The best way to track it down might be to run with debug logging, once against the old version and once against the new version. Maybe that gives some hints. Also monitor resource consumption (memory, etc.) on the server side in both cases, and look for differences.

@tomaswolf
Member

"Any thoughts on my above comment especially on: Is there any way to force/override the file upload/download in sftp sessions through put/get commands to use Files.copy() way in order to see better performance or tune buffers in SftpRemotePathChannel. Could you please let me know."

I don't think so. If I understand it correctly, your problem is in a server acting as a kind of SFTP proxy. That intermediary server does not see put/get commands; it only sees positional write/read requests.

@Holger-Benz
Author

I'm sorry, you're right. We will open a new issue when we have the relevant debug data.

@benz-ppi

Even with the changes you suggested, transferring files with the Apache SFTP client is significantly slower (by more than a factor of 3) than transferring files with jsched or WinSCP.

Do you intend to improve the performance of the SFTP Apache client software?

@tomaswolf
Member

So far I don't have enough information to do anything. I have run my own speed tests, and I see no performance problem. Before I can do anything, I need to be able to reproduce the problem that you observe.

I would need detailed information about your setup: your client-side code, your test setup, what authentication mechanisms and ciphers are used, what's the size of the files, what Java version do you use, what hardware is your client running on, what server are you testing against and on what hardware or virtual machine or container is it running, what is the network latency, what buffer sizes are used, which of the I/O back-ends in Apache MINA SSHD are you using (NIO2, MINA, Netty?), and what is that "jsched" client that you keep mentioning? I have never heard of that.
