[BUG] Geo-rep transfers large files multiple times while using Tar+ssh #4321
@aravindavk Do you see any issue in the suggested fix? We are also going to upstream the changes he mentioned for tar; we achieved a 15X speed-up with these changes. I also verified that the issue is present in the latest release.
Removing duplicate data entries looks good, but it will not solve the duplication issue completely. Geo-rep processes the changelog files in a loop. If the changes are recorded in different batches, there is a chance that the same file gets picked up again in the next batch. Still, I think it is a good patch, since it reduces the chances.
Correct. This is only an optimisation.
We will send a PR for this as well.
Our application doesn't produce sparse files. The complete fix requires an upgrade, so we went with this approach until we can upgrade.
Description of problem:
I am using GlusterFS version 6.6 with geo-replication enabled. Geo-rep uses tar+ssh to transfer files from the primary site to the secondary site.
When large files are uploaded to the primary site, they are sent multiple times.
For example, when a 92 GB file was uploaded, it took more than 20 hours to complete. On investigation, it was found that the same file was recorded in multiple changelog files, 127 changelogs to be exact. This caused the file to be sent 127 times to the secondary site, which effectively increased the data transfer to around 11 TB (92 GB x 127 = 11684 GB ~ 11 TB).
What if we make the change in the master file (geo-replication/syncdaemon/master.py), something like the sketch below:
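A minimal, self-contained sketch of the idea, assuming hypothetical names (`dedupe_data_entries`, the GFID strings); it is not the actual master.py internals, which parse changelogs and drive the tar+ssh syncer:

```python
# Hedged sketch of the proposed de-duplication, not the actual patch:
# collapse the DATA GFIDs recorded across the changelogs of one batch
# before handing them to the tar+ssh syncer.

def dedupe_data_entries(changelog_batches):
    """changelog_batches: list of batches, where each batch is a list of
    changelogs and each changelog is a list of GFID strings recorded as
    DATA entries. Returns one de-duplicated GFID set per batch."""
    deduped = []
    for batch in changelog_batches:
        batch_datas = set()
        for changelog in batch:
            # set.update() silently drops duplicates, so a 92 GB file
            # recorded in 127 changelogs of one batch is queued once.
            batch_datas.update(changelog)
        deduped.append(batch_datas)
    return deduped

# One batch of three changelogs, all recording the same large file:
batches = [[["gfid-big"], ["gfid-big"], ["gfid-big", "gfid-other"]]]
print(dedupe_data_entries(batches))  # [{'gfid-big', 'gfid-other'}]
```

As noted in the comments above, this only de-duplicates within a batch; a file whose changes land in a later batch is still re-sent, so it is an optimisation rather than a complete fix.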
The exact command to reproduce the issue:
Create a file using the below command on the primary site:
dd if=/dev/zero of=/path/to/file bs=32k count=2949120
This writes about 90 GiB of zeros (32 KiB x 2,949,120 blocks), comparable to the 92 GB file in the example above. The file takes a while to create, so its writes get recorded across multiple changelogs.
Additional info:
Commands used in geo-replication (snippet from the geo-replication/syncdaemon/resource.py file), at the primary and secondary side respectively, sketched below:
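A hedged sketch of the typical tar+ssh pipeline; the exact flags, port, host and path (`secondary-host`, `/secondary/aux-mount`) are placeholders and vary by version and session:

```sh
# Primary side: tar archives the changed files (file list fed on stdin
# via --files-from -) and streams the archive over ssh.
tar --sparse -cf - --files-from - | \
  ssh -p 22 geo-rep-user@secondary-host \
      tar --overwrite -xf - -C /secondary/aux-mount
# Secondary side: the remote tar above unpacks the stream into the
# secondary volume's auxiliary mount.
```

The --sparse flag makes tar scan every file for holes before archiving, which is costly for large non-sparse files; presumably this is the tar-side change alluded to in the comments above (an assumption, given the remark that our files are not sparse).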
Other info:
Volume Name: xxxx
Type: Distributed-Replicate
Volume ID: xxxxx
Status: Started
Snapshot Count: 0
Number of Bricks: 30 x 3 = 90
Transport-type: tcp
For geo-replication we are using a cert-based unprivileged user (geo-rep-user)
sync_method: tarssh
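For reference, the sync method of a session can be checked or changed through the geo-replication config interface (the volume and secondary names below are placeholders matching the redacted output above):

```sh
# Show the current value of sync_method for the session
gluster volume geo-replication xxxx geo-rep-user@secondary-host::xxxx config sync_method
# Switch the session to tar+ssh
gluster volume geo-replication xxxx geo-rep-user@secondary-host::xxxx config sync_method tarssh
```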
- The operating system / glusterfs version:
OS: Ubuntu Focal
GlusterFS version: 6.6