LFTP Mirror only copies 98 files per folder for folders with >98 files #722

JakeSurman opened this issue Jan 18, 2024 · 0 comments

We are using the latest stable release of lftp to download some folders containing lots of files from an SFTP server. At most 98 files per folder are mirrored; in folders with more files than this, the rest are silently skipped. I raised this with the SFTP server vendor (Arcitecta; the server software is Mediaflux), and they tested it and provided the information below on what seems to be happening:

According to the lftp documentation (https://lftp.yar.ru/lftp-man.html), lftp defaults to version 6 of the SSH File Transfer Protocol for SFTP:

sftp:protocol-version (number)
The protocol number to negotiate. Default is 6. The actual protocol version used
depends on the server.

The Mediaflux SFTP server only supports version 3, so version 3 should have been negotiated, and the Mediaflux SFTP log shows that it was:

[326 SFTP Packet Processor],sftp,16-Jan-2024 16:50:54.156,INFO,"SFTP SEND { Command: SSH_FXP_VERSION { negotiated-version=3 }...
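To make the negotiation concrete, here is a minimal Python sketch of the SSH_FXP_INIT / SSH_FXP_VERSION exchange, assuming the version-3 rule that the server answers with the lower of its own and the client's version. The helpers `frame`, `build_init`, `server_reply_version` and `parse_version` are hypothetical, extension pairs are omitted, and this is not lftp's or Mediaflux's actual code.

```python
import struct

SSH_FXP_INIT = 1
SSH_FXP_VERSION = 2


def frame(body: bytes) -> bytes:
    """Prefix an SFTP packet body with its uint32 big-endian length."""
    return struct.pack(">I", len(body)) + body


def build_init(client_version: int) -> bytes:
    # SSH_FXP_INIT: byte type, uint32 version (no request id; extensions omitted).
    return frame(struct.pack(">BI", SSH_FXP_INIT, client_version))


def server_reply_version(init_packet: bytes, server_max_version: int) -> bytes:
    # Hypothetical server: answers with the lower of the two versions.
    _length, ptype, client_version = struct.unpack(">IBI", init_packet[:9])
    if ptype != SSH_FXP_INIT:
        raise ValueError(f"expected SSH_FXP_INIT, got type {ptype}")
    negotiated = min(client_version, server_max_version)
    return frame(struct.pack(">BI", SSH_FXP_VERSION, negotiated))


def parse_version(packet: bytes) -> int:
    # SSH_FXP_VERSION: byte type, uint32 version (extensions omitted).
    _length, ptype, version = struct.unpack(">IBI", packet[:9])
    if ptype != SSH_FXP_VERSION:
        raise ValueError(f"expected SSH_FXP_VERSION, got type {ptype}")
    return version


# lftp's default (sftp:protocol-version 6) against a server that only speaks version 3.
negotiated = parse_version(server_reply_version(build_init(6), server_max_version=3))
print(negotiated)  # 3
```

Either way the client ends up speaking version 3, which matches the `negotiated-version=3` entry in the log.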

The specification for that version (an IETF Internet-Draft rather than a published RFC) can be found here: https://datatracker.ietf.org/doc/html/draft-ietf-secsh-filexfer-02

It states:

6.7 Scanning Directories

The files in a directory can be listed using the SSH_FXP_OPENDIR and
SSH_FXP_READDIR requests. Each SSH_FXP_READDIR request returns one
or more file names with full file attributes for each file. The
client should call SSH_FXP_READDIR repeatedly until it has found the
file it is looking for or until the server responds with a
SSH_FXP_STATUS message indicating an error (normally SSH_FX_EOF if
there are no more files in the directory). The client should then
close the handle using the SSH_FXP_CLOSE request.

The SSH_FXP_OPENDIR opens a directory for reading. It has the
following format:

    uint32     id 
    string     path 

where `id' is the request identifier and `path' is the path name of
the directory to be listed (without any trailing slash). See Section
``File Names'' for more information on file names. This will return
an error if the path does not specify a directory or if the directory
is not readable. The server will respond to this request with either
a SSH_FXP_HANDLE or a SSH_FXP_STATUS message.

Once the directory has been successfully opened, files (and
directories) contained in it can be listed using SSH_FXP_READDIR
requests. These are of the format

    uint32     id 
    string     handle 

where `id' is the request identifier, and `handle' is a handle
returned by SSH_FXP_OPENDIR. (It is a protocol error to attempt to
use an ordinary file handle returned by SSH_FXP_OPEN.)

The server responds to this request with either a SSH_FXP_NAME or a
SSH_FXP_STATUS message. One or more names may be returned at a time.
Full status information is returned for each name in order to speed
up typical directory listings.

When the client no longer wishes to read more names from the
directory, it SHOULD call SSH_FXP_CLOSE for the handle. The handle
should be closed regardless of whether an error has occurred or not.
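The listing procedure the draft describes (open the directory, repeat SSH_FXP_READDIR until an SSH_FXP_STATUS, normally SSH_FX_EOF, arrives, then close the handle) can be sketched in Python as below. `FakeSftpServer` is a hypothetical in-memory stand-in that, like the Mediaflux server in the log, returns at most 100 names per SSH_FXP_NAME reply; it is not a real SFTP library, and `Status` only models the status code.

```python
from dataclasses import dataclass

SSH_FX_OK = 0
SSH_FX_EOF = 1


@dataclass
class Status:
    """Models an SSH_FXP_STATUS reply (status code only)."""
    code: int


class FakeSftpServer:
    """Hypothetical in-memory server returning at most `batch` names per READDIR."""

    def __init__(self, entries: list[str], batch: int = 100):
        self.entries = list(entries)
        self.batch = batch
        self.positions: dict[bytes, int] = {}  # open handle -> next entry index
        self.log: list[str] = []

    def opendir(self, path: str) -> bytes:
        self.log.append("OPENDIR")
        handle = str(len(self.log)).encode()  # opaque handle value
        self.positions[handle] = 0
        return handle

    def readdir(self, handle: bytes) -> "list[str] | Status":
        self.log.append("READDIR")
        pos = self.positions[handle]
        if pos >= len(self.entries):
            return Status(SSH_FX_EOF)  # no more names in the directory
        chunk = self.entries[pos:pos + self.batch]  # one SSH_FXP_NAME reply
        self.positions[handle] = pos + len(chunk)
        return chunk

    def close(self, handle: bytes) -> Status:
        self.log.append("CLOSE")
        del self.positions[handle]
        return Status(SSH_FX_OK)


def list_directory(server: FakeSftpServer, path: str) -> list[str]:
    """Read every name in `path`, following the READDIR-until-EOF rule."""
    handle = server.opendir(path)
    names: list[str] = []
    try:
        while True:
            reply = server.readdir(handle)
            if isinstance(reply, Status):
                if reply.code == SSH_FX_EOF:
                    break  # normal end of the listing
                raise OSError(f"READDIR failed with status {reply.code}")
            names.extend(reply)  # one NAME reply may hold many entries
    finally:
        server.close(handle)  # close the handle whether or not an error occurred
    return [n for n in names if n not in (".", "..")]


entries = [".", ".."] + [f"file{i:03d}" for i in range(167)]
server = FakeSftpServer(entries)
files = list_directory(server, "/Volumes/personal/2012/Lakes District")
print(len(files), server.log)  # 167 ['OPENDIR', 'READDIR', 'READDIR', 'READDIR', 'CLOSE']

# Stopping after the first NAME reply, as in the Mediaflux log, keeps only 98 files.
first_batch_files = [n for n in entries[:100] if n not in (".", "..")]
print(len(first_batch_files))  # 98
```

A client following the draft needs three READDIR round trips for this directory: two NAME replies (100 and 69 entries) and a final EOF status.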

When we re-ran the mirror for a single directory containing 167 files, the Mediaflux SFTP log showed the following sequence of requests and responses for those operations:

[320 SFTP Packet Processor],sftp,16-Jan-2024 16:50:52.842,INFO,"SFTP RECV { Command: SSH_FXP_OPENDIR { request-id=3 path=/Volumes/personal/2012/Lakes District }...
...
[320 SFTP Packet Processor],sftp,16-Jan-2024 16:50:52.855,INFO,"SFTP SEND { Command: SSH_FXP_HANDLE { request-id=3 ,handle=62296 }...
[320 SFTP Packet Processor],sftp,16-Jan-2024 16:50:52.858,INFO,"SFTP RECV { Command: SSH_FXP_READDIR { request-id=4 handle=62296 }...
[320 SFTP Packet Processor],sftp,16-Jan-2024 16:50:52.868,INFO,"SFTP SEND { Command: SSH_FXP_NAME { request-id=4, count=100, filename=/Volumes/personal/2012/Lakes District/. attr=SSH_FILEXFER_TYPE_DIRECTORY Attributes { size=169 uid=0 gid=0 permissions=drwxrwxrwx create-time=1705369014 modify-time=1705369014 entry-type=SSH_FILEXFER_TYPE_DIRECTORY entry-type=SSH_FILEXFER_TYPE_DIRECTORY},...
[320 SFTP Packet Processor],sftp,16-Jan-2024 16:50:52.871,INFO,"SFTP RECV { Command: SSH_FXP_CLOSE { request-id=5 handle=62296 }...
[320 SFTP Packet Processor],sftp,16-Jan-2024 16:50:52.876,INFO,"SFTP SEND { Command: SSH_FXP_STATUS { request-id=5 return-code=SSH_FX_OK error-message=null language-tag=en }...

As you can see:

lftp issued an SSH_FXP_OPENDIR and Mediaflux responded with an SSH_FXP_HANDLE which specified handle 62296.
lftp then issued an SSH_FXP_READDIR request using that handle, and Mediaflux responded with an SSH_FXP_NAME containing the first 100 entries in the directory (the "." (current directory) and ".." (parent directory) entries plus 98 of the files).
lftp then issued an SSH_FXP_CLOSE for the handle instead of another SSH_FXP_READDIR to get the remaining entries. Mediaflux responded with an SSH_FXP_STATUS indicating SSH_FX_OK.

So instead of requesting the next set of entries for that folder, lftp appears to treat the listing as complete, and the mirror moves on to the next location.
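The pattern in the log (an SSH_FXP_CLOSE sent while the last SSH_FXP_READDIR reply was an SSH_FXP_NAME rather than an SSH_FXP_STATUS) can be flagged mechanically. The sketch below is hypothetical: `find_premature_closes` works on a hand-simplified transcription of the Mediaflux log lines as (direction, command, handle) tuples, with each reply attributed to the handle of the request it answers, rather than parsing the real log format.

```python
# Hand-simplified transcription of the Mediaflux log: (direction, command, handle).
# RECV = request from lftp, SEND = reply from Mediaflux. Replies carry the handle of
# the request they answer (an assumption; the real log keys replies by request-id).
EVENTS = [
    ("RECV", "SSH_FXP_OPENDIR", None),
    ("SEND", "SSH_FXP_HANDLE", 62296),
    ("RECV", "SSH_FXP_READDIR", 62296),
    ("SEND", "SSH_FXP_NAME", 62296),
    ("RECV", "SSH_FXP_CLOSE", 62296),
    ("SEND", "SSH_FXP_STATUS", 62296),
]


def find_premature_closes(events):
    """Return handles closed while the last READDIR reply was NAME (i.e. before EOF)."""
    awaiting_reply = set()   # handles with an outstanding READDIR
    last_readdir_reply = {}  # handle -> command of the most recent READDIR reply
    flagged = []
    for direction, command, handle in events:
        if direction == "RECV" and command == "SSH_FXP_READDIR":
            awaiting_reply.add(handle)
        elif direction == "SEND" and handle in awaiting_reply:
            last_readdir_reply[handle] = command  # NAME (more may follow) or STATUS
            awaiting_reply.discard(handle)
        elif direction == "RECV" and command == "SSH_FXP_CLOSE":
            if last_readdir_reply.get(handle) == "SSH_FXP_NAME":
                flagged.append(handle)
    return flagged


print(find_premature_closes(EVENTS))  # [62296]

# Entry arithmetic for the 167-file directory in the log.
files_in_directory = 167
files_returned = 100 - 2  # the NAME reply held 100 entries, including "." and ".."
print(files_in_directory - files_returned)  # 69 files never requested
```

A listing that ends with an SSH_FXP_STATUS reply before the close would not be flagged.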

Is this the expected behaviour for mirror? I'm not the SFTP server vendor, so I'm not sure how we can get this working.
