AzCopy downloads with --from-to BlobPipe fails when downloading multiple files #2575
Comments
When you download multiple files or a directory to a pipe, what is your expectation? AzCopy downloads files in parallel, so writing the contents of multiple files to a single pipe does not make sense.
Streaming multiple files through a single pipe can absolutely make sense. My expectation is output similar to what I would get if I downloaded all the files and then concatenated them. The use case is streaming through compressed CSV data that is larger than the disk or memory available on the machine, and calculating metrics/stats:
azcopy cp "https://ACCOUNT.blob.core.windows.net/CONTAINER/*?SAS" --include-pattern="*.csv.gz" . && \
cat *.csv.gz | \
mlr --gzin --csv cut -f "Fields,I,want" | ... # Do some fancy metrics calculation
This will not work because AzCopy does not download files in any particular order; all files are downloaded in parallel, in small chunks (blocks). As each block arrives, it is written to the output (the pipe, in your case). This means the data of the files being downloaded would be intermixed, with no guarantee that one file arrives in full before another begins. Serializing the output goes against AzCopy's logic of downloading blocks in parallel and hence cannot be honored.
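To illustrate the interleaving described above, here is a minimal shell sketch. It does not use AzCopy at all: `emit_blocks` is a hypothetical stand-in that prints three "blocks" for a named file, and the two demo functions only contrast parallel writers with sequential writers sharing one pipe.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a download: prints three "blocks" of one file.
emit_blocks() {
  for i in 1 2 3; do
    echo "$1 block $i"
    sleep 0.1
  done
}

# Two writers run in parallel and share one pipe: their lines typically
# arrive intermixed, and the order is not guaranteed.
interleaved_demo() {
  { emit_blocks A & emit_blocks B & wait; } | cat
}

# Writers run one after another: all of A arrives before B begins.
sequential_demo() {
  { emit_blocks A; emit_blocks B; } | cat
}

interleaved_demo
sequential_demo
```

Both demos write the same six lines; only the sequential one guarantees that file A is complete before file B starts, which is the property a consumer such as a gzip decoder needs.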
AzCopy does the right thing with just one file: it pipes the file's data in order, before the whole file has been downloaded. Below is a workaround that might be useful for someone else if BlobPipe for multiple files is not implemented in AzCopy:
export AZ_BASE_URL=https://ACCOUNT.blob.core.windows.net/CONTAINER/PATH/
export AZ_SAS='...'
# List the blobs, keep only the names ending in .csv.gz, then stream them one at a time
azcopy list "${AZ_BASE_URL}?${AZ_SAS}" | grep -oP 'INFO: \K[^;]+' | grep '\.csv\.gz$' | while read -r f
do azcopy cp "${AZ_BASE_URL}${f}?${AZ_SAS}" --from-to BlobPipe
done | pv | # Do streaming processing of multiple files here...
This probably breaks for blobs with ';' in the name though, bit surprised that
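A possible variation on the workaround that tolerates ';' in blob names is sketched below. It assumes each `azcopy list` line looks like `INFO: <name>;  Content Length: <size>`; the `INFO: ` prefix and the ';' delimiter come from the workaround above, while the `Content Length:` suffix is an assumption that should be checked against the output of your AzCopy version. The function names `list_names`, `only_csv_gz` and `stream_blobs` are hypothetical.

```shell
#!/usr/bin/env bash
# Extract blob names from list output. ASSUMPTION: lines have the form
# "INFO: <name>;  Content Length: <size>". The greedy match keeps any ';'
# inside the name and cuts only at the last "; Content Length:".
list_names() {
  sed -n 's/^INFO: \(.*\);  *Content Length:.*$/\1/p'
}

# Keep only names that end in .csv.gz (dots escaped, anchored at the end).
only_csv_gz() {
  grep '\.csv\.gz$'
}

# Stream each blob to stdout, one at a time, in list order.
# Relies on AZ_BASE_URL and AZ_SAS being exported, as in the workaround.
stream_blobs() {
  while IFS= read -r name; do
    # stdin is redirected so azcopy cannot consume the list of names.
    azcopy cp "${AZ_BASE_URL}${name}?${AZ_SAS}" --from-to BlobPipe < /dev/null || return 1
  done
}

# Usage (not run here):
#   azcopy list "${AZ_BASE_URL}?${AZ_SAS}" | list_names | only_csv_gz | stream_blobs | pv | ...
```

Names containing characters that are not valid in a URL would still need encoding before being appended to the base URL; that is left out of this sketch.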
Which version of the AzCopy was used?
azcopy version 10.23.0
Which platform are you using? (ex: Windows, Mac, Linux)
Linux x86-64
What command did you run?
What problem was encountered?
I expect azcopy to be able to download multiple files to a BlobPipe, but it does not work.
How can we reproduce the problem in the simplest way?
Try to download multiple files from a storage account with --from-to BlobPipe.
Have you found a mitigation/solution?
Not yet. It is probably possible to do it in multiple steps: list the files first and then start multiple azcopy processes.