Soft deleted directory isn't being calculated while calculating capacity #35646

Closed
GaneshMSFT opened this issue May 15, 2024 · 2 comments
Labels
  • Client: This issue points to a problem in the data-plane of the library.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • needs-team-attention: This issue needs attention from Azure service team or SDK team
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that
  • Service Attention: This issue is responsible by Azure service team.
  • Storage: Storage Service (Queues, Blobs, Files)

Comments

GaneshMSFT commented May 15, 2024

  • Package Name: azure-storage-blob
  • Package Version: 12.20.0
  • Operating System: Windows 11
  • Python Version: 3.10.11

Describe the bug
When calculating container capacity for an ADLS Gen2 account, the SDK does not count the capacity held by soft-deleted directories.

For example, suppose you save a 100G file at sa/container/folder1/folder2/100Gfile.txt.
If you then delete 100Gfile.txt itself,
the SDK with include=deleted is able to calculate the deleted size.
But if you delete folder2 instead, the SDK, Blob Inventory, and Azure Storage Explorer are not able to report the deleted 100G.

To Reproduce
Steps to reproduce the behavior:

  1. Run this method

def dir_scan_deleted(writer, client) -> int:
    """
    Recursively scan an ADLS container.

    Args:
        writer (csv.DictWriter): csv file writer
        client (ContainerClient): blob container client

    Returns:
        total_size: size of the ADLS container in bytes.
    """
    # include="deleted" asks List Blobs to also return soft-deleted blobs.
    blob_list = client.list_blobs(include="deleted")
    total_size = 0
    for blob in blob_list:
        total_size += blob.size
        # Leave zero-length soft-deleted entries out of the CSV report.
        if blob.deleted and blob.size == 0:
            continue
        writer.writerow(
            {
                "path": blob.name,
                "size_in_bytes": blob.size,
                "deleted": bool(blob.deleted),
                "deleted_time": blob.deleted_time,
            }
        )
    return total_size

Expected behavior
I expected the total to include the deleted blobs inside the deleted directory, but it does not. If I delete an individual blob instead, my code counts it correctly.
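The two cases can be illustrated offline with a minimal sketch. `FakeBlob` and `FakeContainerClient` are hypothetical stand-ins, not SDK types; they expose only the attributes the method above reads. The listings are assumptions that mirror the behavior described in this report (nothing beneath a soft-deleted directory is returned), and "100G" is taken to mean 100 GiB for the arithmetic.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FakeBlob:
    """Hypothetical stand-in for the listing entries (only the fields used here)."""
    name: str
    size: int
    deleted: bool = False
    deleted_time: Optional[datetime] = None


class FakeContainerClient:
    """Hypothetical stand-in for ContainerClient that returns a fixed listing."""

    def __init__(self, blobs):
        self._blobs = list(blobs)

    def list_blobs(self, include=None):
        # Soft-deleted entries are returned only when "deleted" is requested.
        want_deleted = include is not None and "deleted" in include
        return [b for b in self._blobs if want_deleted or not b.deleted]


def scan(client) -> int:
    """Same summation as dir_scan_deleted, without the CSV output."""
    return sum(blob.size for blob in client.list_blobs(include="deleted"))


GB = 1024 ** 3
when = datetime(2024, 5, 15, tzinfo=timezone.utc)

# Case 1: only the file was deleted, so the listing reports it with its size.
file_deleted = FakeContainerClient([
    FakeBlob("folder1", 0),
    FakeBlob("folder1/folder2", 0),
    FakeBlob("folder1/folder2/100Gfile.txt", 100 * GB, deleted=True, deleted_time=when),
])

# Case 2: folder2 was deleted; per this report, nothing beneath it is listed.
folder_deleted = FakeContainerClient([
    FakeBlob("folder1", 0),
])

print(scan(file_deleted))    # 107374182400
print(scan(folder_deleted))  # 0
```

In the second case the 100G is still retained by soft delete, but the summation has no entry to count.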

Screenshots
None

Additional context
Azure Blob Inventory and Azure Storage Explorer exhibit the same behavior. For Blob Inventory, this is documented here: https://learn.microsoft.com/en-us/azure/storage/blobs/blob-inventory#reports-might-exclude-soft-deleted-blobs-in-accounts-that-have-a-hierarchical-namespace

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files) labels May 15, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jalauzon-msft @vincenttran-msft.

@jalauzon-msft
Member

Hi @GaneshMSFT, this was discussed internally as well, but I'm updating this thread for completeness. This is unfortunately a known limitation with the Storage service. Currently, soft-deleted directories are not properly returned in List Blobs. The service team is aware of this limitation and has a fix planned for later this year. Once the fix is in place and rolled out, it will promptly be added to the SDK (and likely other products like Blob Inventory).

Since this is a service issue, I'm going to close this issue. Thanks.
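Editorial sketch, not part of the maintainer's reply: until the service fix ships, it may be possible to at least detect that a capacity total is incomplete. The sketch assumes the `list_deleted_paths` method on `FileSystemClient` in the separate azure-storage-file-datalake package, which should be verified against the installed version. The helper name is hypothetical, and the clients in the demo are stand-ins so the block runs without an Azure account. It reports only names, not sizes, so it can flag the gap but cannot fill it.

```python
from types import SimpleNamespace
from typing import List


def deleted_paths_missing_from_blob_listing(container_client, file_system_client) -> List[str]:
    """Return soft-deleted path names that the blob listing did not report.

    container_client is expected to behave like a ContainerClient and
    file_system_client like a Data Lake FileSystemClient (assumption: it
    offers list_deleted_paths(), whose items have a .name attribute).
    """
    blob_names = {blob.name for blob in container_client.list_blobs(include="deleted")}
    return sorted(
        path.name
        for path in file_system_client.list_deleted_paths()
        if path.name not in blob_names
    )


# Hypothetical stand-ins for the two clients, for illustration only.
container = SimpleNamespace(
    list_blobs=lambda include=None: [SimpleNamespace(name="folder1/other.txt")]
)
file_system = SimpleNamespace(
    list_deleted_paths=lambda: [
        SimpleNamespace(name="folder1/other.txt"),
        SimpleNamespace(name="folder1/folder2"),
    ]
)

missing = deleted_paths_missing_from_blob_listing(container, file_system)
print(missing)  # ['folder1/folder2']
```

A non-empty result suggests the total from `dir_scan_deleted` undercounts the container.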
