AzCopy sync does not provide an 'include-path' parameter. #2594
Comments
Hi @leelax22! This is on our radar and we will update this thread accordingly!
I found a workaround for my situation using the 'include-regex' parameter. I created 2 jobs with this parameter. However, when I checked the job logs, I found that each job still scanned something across all of the folders (0~999).
The log for "azcopy copy --include-path" shows no scanning of folders that are not in "include-path". So at first I thought the 'exclude' parameters work by scanning all folders first and then excluding whatever does not satisfy the condition. But after testing some more jobs, I now guess the difference is between 'sync' and 'copy', not between 'exclude' and 'include'. So I wonder whether the problem in my situation can be solved. Each job's file count is less than 10 million. I don't know whether it can be solved easily.
This is the log from [azcopy sync --include-regex "^[0-4]?[0-9]?[0-9]/."]:
2024/02/29 02:08:13 ==> REQUEST/RESPONSE (Try=1/45.956ms, OpTime=86.4368ms) -- RESPONSE SUCCESSFULLY RECEIVED
LocalDisk01/500 is not matched by the regex, but some API requests were still made for it according to [f24fd023-bf16-1a4f-4a36-8de234e5e7e5-scanning.log]. Even if I run the jobs separately, I am concerned that scanning will take too long if the entire folder tree is scanned twice.
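To double-check which top-level folders that regex is meant to cover, here is a minimal sketch that tests it against a few made-up relative paths. It assumes the pattern is matched against the path relative to the sync root, and it uses `grep -E` as a stand-in for AzCopy's own regex engine, which may differ in edge cases. `classify_path` and the sample paths are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Pattern copied from the --include-regex value in the comment above.
pattern='^[0-4]?[0-9]?[0-9]/.'

# Hypothetical helper: prints "included" or "excluded" for one relative path.
# grep -E is only an approximation of how AzCopy evaluates the regex.
classify_path() {
  if printf '%s\n' "$1" | grep -Eq "$pattern"; then
    echo included
  else
    echo excluded
  fi
}

# Made-up sample paths under LocalDisk01 (folders are named 0 to 999).
for path in 0/a.zip 42/a.zip 499/a.zip 500/file2.zip 999/a.zip; do
  printf '%s\t%s\n' "$path" "$(classify_path "$path")"
done
```

If the pattern behaves the same way inside AzCopy, folders 0 to 499 are selected and 500 to 999 are not, so any requests logged against LocalDisk01/500 would come from enumeration rather than from transfer.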
Which version of the AzCopy was used?
azcopy version 10.23.0
Which platform are you using? (ex: Windows, Mac, Linux)
Linux
What command did you run?
azcopy sync --exclude-path
azcopy copy --include-path
What problem was encountered?
I used azcopy to sync Azure Files (region1) to Azure Files (region2).
Azure Files supports GRS, but it has no test-failover function.
So, for failover training, we use azcopy to sync Azure Files.
The problem is that there are too many files across many folders.
AzCopy recommends keeping each job under 10 million files.
So I decided to divide the folders into jobs.
For example, the Azure Files source has folders 1, 2, 3, and 4, and each folder has 10 million files.
First, I ran these commands:
[azcopy sync "src" "dst" --exclude-path 3;4;
azcopy sync "src" "dst" --exclude-path 1;2;]
But it did not run as I expected. AzCopy scans all of the folders (1, 2, 3, 4) and only after that, I guess, does the sync run.
There are so many files that scanning folders 3 and 4 is a waste of time.
I found that the "azcopy copy" command has an --include-path parameter.
[azcopy copy "src" "dst" --include-path 1;2;
azcopy copy "src" "dst" --include-path 3;4;]
Unlike azcopy sync, azcopy copy --include-path does not scan all of the folders.
I wonder why azcopy sync does not skip scanning folders even when they are listed in exclude-path.
I hope azcopy sync gets an '--include-path' parameter too, or that '--exclude-path' skips processing of the excluded folders entirely.
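For reference, the per-job split I am using with azcopy copy can be scripted roughly as follows. This is only a sketch: `join_paths` is a hypothetical helper, "src" and "dst" are the same placeholders as in the commands above, the two batches are illustrative, and the `--recursive` flag is my assumption about how the jobs would be run. The script only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins folder names with ';', the separator
# that --include-path expects between multiple paths.
join_paths() {
  local IFS=';'
  echo "$*"
}

# Placeholder endpoints, same as in the commands above.
src="src"
dst="dst"

# Two illustrative batches, one per job.
batch1=$(join_paths 1 2)
batch2=$(join_paths 3 4)

for batch in "$batch1" "$batch2"; do
  # echo only prints the command; remove it to actually start the job.
  echo azcopy copy "\"$src\"" "\"$dst\"" --recursive --include-path "\"$batch\""
done
```

Note that azcopy copy does not remove files at the destination the way azcopy sync with --delete-destination=true does, so this is not a full replacement for sync.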
Here is a test example.
PS C:\Users\Zenuser\Desktop\azcopy> azcopy sync "https://newjeans.file.core.windows.net/newjeans/edms/LocalDisk01/?sv" "https://newjeans2.file.core.windows.net/newjeans2/edms/LocalDisk01/?sv" --delete-destination=true --exclude-path="500;501;502"
The exclude-path parameter was applied correctly.
2024/02/28 08:08:44 ==> REQUEST/RESPONSE (Try=1/26.4232ms, OpTime=56.933ms) -- RESPONSE SUCCESSFULLY RECEIVED
HEAD https://newjeans.file.core.windows.net/newjeans/edms%2FLocalDisk01/500/file2.zip?se=2024-04-17T16%3A02%3A01Z&sig=-REDACTED-&sp=rwdlc&spr=https&srt=sco&ss=f&st=2024-02-28T08%3A02%3A01Z&sv=2022-11-02
X-Ms-Request-Id: [406692d6-f01a-0014-6f1d-6ac308000000]
2024/02/28 08:08:44 ==> REQUEST/RESPONSE (Try=1/25.647ms, OpTime=40.164ms) -- RESPONSE SUCCESSFULLY RECEIVED
HEAD https://newjeans.file.core.windows.net/newjeans/edms%2FLocalDisk01/500/file4.zip?se=2024-04-17T16%3A02%3A01Z&sig=-REDACTED-&sp=rwdlc&spr=https&srt=sco&ss=f&st=2024-02-28T08%3A02%3A01Z&sv=2022-11-02
But the log file contains entries that look like scanning of the exclude-path folders.
Thank you for reading. AzCopy is working well for me, and it would be even better if this point were improved. Or, if there is something I missed, please let me know. I would really appreciate it.