panic: cannot allocate memory on job with many files #2632
Comments
Ah. So this doesn't sound like actual "memory" exhaustion causing the crash per se; the crash occurred while attempting to map the job plan file into memory. I notice this is a massive job. It's entirely possible that the job plan file's memory mapping simply eats through allocatable space. This is a known AzCopy issue (and something I'd like to address, but we don't really encounter transfers of this scale very often). We usually mitigate it by breaking jobs down into smaller, more manageable chunks. If your files are separated into folders, or there is some consistent naming scheme that could be filtered against, AzCopy has pattern/path filters. If there's no way to filter on names, breaking the job down by LMT with --include-before and --include-after may be another strategy; a sketch of both approaches follows below.
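For illustration only, a rough sketch of both mitigations; the account names, container paths, dates, and SAS placeholders here are hypothetical:

```bash
# Split by path: one AzCopy job per top-level folder instead of one giant job.
azcopy copy "https://src.blob.core.windows.net/data/folder-01?<SAS>" \
            "https://dst.blob.core.windows.net/data/folder-01?<SAS>" \
            --recursive

# Split by last-modified time (LMT): only blobs modified inside a time window.
azcopy copy "https://src.blob.core.windows.net/data?<SAS>" \
            "https://dst.blob.core.windows.net/data?<SAS>" \
            --recursive \
            --include-after  "2024-01-01T00:00:00Z" \
            --include-before "2024-02-01T00:00:00Z"
```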
Thanks for the prompt reply @adreed-msft. As a test I tried adding an include-regex filter. The next level in the directory structure has about 1e4 directories per parent. My next approach will be to write a script that runs azcopy on each of those 10k subdirectories sequentially (see the sketch below): the jobs would be well-sized, but my concern is that the overhead of starting each job would slow things down. I could maybe include a few subdirectories in each azcopy invocation. I guess there's no way to recover the plan files at this point, and I'll have to run azcopy syncs to avoid copying the data twice? (It's a cross-region transfer.)
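Roughly what I have in mind, as a sketch only; the URLs, SAS token, and subdirectory list are placeholders:

```bash
#!/usr/bin/env bash
# Rough sketch: one AzCopy job per subdirectory so each job plan stays small.
# SRC, DST, the SAS token, and subdirs.txt are placeholders.
set -u
SRC="https://srcaccount.blob.core.windows.net/container"
DST="https://dstaccount.blob.core.windows.net/container"
SAS="<SAS>"

while read -r dir; do
    azcopy copy "${SRC}/${dir}?${SAS}" "${DST}/${dir}?${SAS}" --recursive \
        || echo "${dir}" >> failed_dirs.txt   # note failures so they can be retried
done < subdirs.txt
```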
Try --include-pattern instead of --include-regex here if you'd like to use a single wildcard character. Keep in mind that a regexp uses different wildcard syntax (a quick comparison is sketched below).
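Roughly, the same filter expressed both ways (the file names here are made up):

```bash
# --include-pattern matches file names using shell-style wildcards.
azcopy copy "<src>" "<dst>" --recursive --include-pattern "batch_1*.csv"

# --include-regex uses regular-expression syntax against relative paths,
# so the wildcard is .* and a literal dot must be escaped.
azcopy copy "<src>" "<dst>" --recursive --include-regex "batch_1.*\.csv"
```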
Does include-pattern only work for filenames though? https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-files#use-wildcard-characters
Include-pattern operates on the file name, yes.
👋 Hey @alexpersin! I just finished an S3->Azure copy a few weeks ago of about half the size (~370 TB over 225 million files) and didn't run into any OOM issues, though without modifying the default concurrency values; it finished in about 54 hours. Have you had any luck running AzCopy without setting a custom concurrency value? I came across your issue while looking into #2642 🙂
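For reference, the knobs I mean are environment variables; the values below are purely illustrative, not recommendations:

```bash
# Number of concurrent requests; AzCopy derives a default from the CPU count when unset.
export AZCOPY_CONCURRENCY_VALUE=64
# Optional cap (in GB) on the memory AzCopy uses for buffering.
export AZCOPY_BUFFER_GB=16

azcopy copy "<src>" "<dst>" --recursive
```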
Which version of AzCopy was used?
AzCopy 10.24.0
Which platform are you using? (ex: Windows, Mac, Linux)
Linux, Ubuntu 20.04
What command did you run?
What problem was encountered?
The job is copying ~1e9 blobs averaging 600 KB in size between two Azure storage accounts, and it failed after about 2 days with `panic: cannot allocate memory`. `azcopy jobs resume` then fails after less than a minute with the same panic. The VM had a lot of available memory at the time of the crash and used no more than 8 GB of memory before crashing when attempting to resume the job. The plan files total 239 GB, and the VM is a Standard D96d v5 (96 vCPUs, 384 GiB memory).
Five other similar jobs were running at the same time on other VMs on other directories with the same setup, and all crashed after similar amounts of time.
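For reference, the commands below show the on-disk state and the resume attempt; the job ID is a placeholder, and ~/.azcopy is assumed to be the default plan-file location on Linux:

```bash
du -sh ~/.azcopy              # plan and log files on disk (location overridable via AZCOPY_JOB_PLAN_LOCATION)
azcopy jobs list              # enumerate known jobs and their states
azcopy jobs resume <job-id>   # the resume attempt that panics within a minute
```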
How can we reproduce the problem in the simplest way?
Run a similarly sized job?
Have you found a mitigation/solution?
No, I am unable to resume the job.