[GC performance] The performance of v2 manifest deletion is not good in S3 environment #12948
Comments
Hey @wy65701436, following our chat in Slack I'd like to share a similar performance issue we're experiencing with a GCS storage backend. We're running Harbor v2.1.1 and we replicated a GCR registry's content to Harbor; however, we forgot to exclude a repo that had, at the time, >60,000 tags. After replication completed we deleted the repo in Harbor and ran GC, but the job keeps failing due to a timeout while deleting a manifest:
Looking at the registry logs, we see that it takes over an hour to delete a manifest:
We enabled debug logging in the registry and saw that it was spending most of the time iterating through the tags with
for example:
I can share the full GC job and registry debug logs if needed, also happy to provide more information. |
Experiencing exactly the same issue as @dkulchinsky. We're unable to finish a GC run; it always ends with a context deadline exceeded. We have more than 1 TB to clean (~130k objects). We can't resume, so we have to start from scratch again. |
Hello friends, I'd like to ask that the priority of this issue be raised. We are running several instances of Harbor (we use a GCS backend, but I think the root cause here is the same) and we're rapidly growing our usage. We are starting to reach capacities that the GC simply cannot handle: in repositories with more than a few thousand tags, it takes ~2 minutes to delete a single manifest during GC. GC now takes 10-14 hours, and the problem gets worse every day since we're adding more tags than we are deleting. On our test/certification Harbor instance we've reached over 20,000 tags on some repositories, and GC just times out on the first manifest since the lookup takes >20 minutes.

We are concerned about increasing storage costs since we can't clean it up, as well as other potential issues that may arise from having all these blobs/manifests lingering with no ability to properly remove them. This issue was tagged as a candidate for v2.2.0, and we're already seeing v2.4.0 going out the door. I'm happy to provide additional context, information, and logs, but I just hope we can get some attention on this issue, since I think it will impact any user that needs Harbor to work at scale. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I believe this is still an active issue being tracked, so probably shouldn't get closed yet. |
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days. |
This is still an issue. |
@wy65701436, any hints on when the team may find some time to look at this? This seems like an issue that desperately needs attention; however, it hasn't seen any traction in over 2 years now. |
Just to further explain the crux of this issue: Harbor uses docker distribution for its registry component (harbor-registry). The Harbor GC calls into the docker registry to delete the manifest, which in turn does a lookup for all tags that reference the deleted manifest. To find the tag references, the docker registry iterates over every tag in the repository and reads its link file to check whether it matches the deleted manifest (i.e. to see whether it uses the same sha256 digest). So, the more tags you have in your repository, the worse the performance will be (as there will be more S3 API calls occurring for the tag directory lookups and tag file reads). |
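The O(N) lookup described above can be sketched with a small in-memory model. This is illustrative only, not the actual distribution code: `tagStore`, `findReferencing`, and the read counter are stand-ins for the registry's tag store, its digest comparison, and the per-tag backend GET it issues.

```go
package main

import "fmt"

// In-memory stand-in for the registry's tag store: each tag has a
// link file containing the sha256 digest of the manifest it points to.
type tagStore struct {
	links map[string]string // tag name -> manifest digest
	reads int               // counts simulated backend GET calls
}

// findReferencing mirrors the linear scan the registry performs on
// manifest delete: every tag's link file is read and compared, even
// when only a handful of tags actually reference the digest.
func (s *tagStore) findReferencing(digest string) []string {
	var refs []string
	for tag, d := range s.links {
		s.reads++ // one backend read per tag, match or not
		if d == digest {
			refs = append(refs, tag)
		}
	}
	return refs
}

func main() {
	s := &tagStore{links: map[string]string{}}
	// 60,000 tags spread over 1,000 manifests, echoing the repo
	// size reported above.
	for i := 0; i < 60000; i++ {
		s.links[fmt.Sprintf("v%d", i)] = fmt.Sprintf("sha256:%d", i%1000)
	}
	refs := s.findReferencing("sha256:42")
	fmt.Printf("matched %d tags, issued %d reads\n", len(refs), s.reads)
	// prints: matched 60 tags, issued 60000 reads
}
```

Every one of those simulated reads corresponds to a real S3/GCS request in production, which is why deletion time scales with total tag count rather than with the number of tags actually being removed.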
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days. |
not stale. |
@Antiarchitect, wanted to share some progress
I was able to build distribution with your PR, distribution/distribution#3702 (update GCS drive to latest) and the Redis sentinel patch. I also figured out the issue around the retry loop when a 404 is returned during manifest delete and have a fix for Harbor jobservice here: #18386 I'm running a test now with the above in our sandbox Harbor and although it's not breaking any speed records, it is looking much better, will update once I have more concrete numbers. |
When using S3 for storage, during manifest delete an S3 lookup using S3 list and get calls is performed to check all the tags referencing this manifest. For a large repository with hundreds of thousands of images, all of these tags are read just to delete 1 single image. Instead of relying on distribution to delete the tag folder, since Harbor already maintains the tags referencing a manifest, we can introduce a new API in distribution to delete the tag directory altogether when deleting the artifact and skip the tag lookup during the garbage collection step. I was able to test this out and brought the registry size down from 360TB to less than 30TB by configuring a 30-day retention period. Check out these PRs: Have done a couple of optimizations like:
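The shortcut proposed above can be sketched as a prefix delete over the registry's path layout. This is a hypothetical model, not the actual API from those PRs: `store` and `deleteByPrefix` are illustrative names, and the path scheme loosely follows the registry's `_manifests/tags/<tag>/current/link` convention.

```go
package main

import (
	"fmt"
	"strings"
)

// Flat blob store keyed by object path, standing in for S3.
type store struct {
	objects map[string]string
	calls   int
}

// deleteByPrefix models the proposed shortcut: since Harbor already
// knows which tags reference the manifest, the whole tag directory can
// be removed with a single batched prefix delete instead of reading
// every tag's link file first.
func (s *store) deleteByPrefix(prefix string) int {
	deleted := 0
	for path := range s.objects {
		if strings.HasPrefix(path, prefix) {
			delete(s.objects, path)
			deleted++
		}
	}
	s.calls++ // one batched request, versus N reads + N deletes
	return deleted
}

func main() {
	s := &store{objects: map[string]string{}}
	for i := 0; i < 5; i++ {
		path := fmt.Sprintf("repo/_manifests/tags/v%d/current/link", i)
		s.objects[path] = "sha256:abc"
	}
	n := s.deleteByPrefix("repo/_manifests/tags/")
	fmt.Printf("deleted %d tag links in %d call(s)\n", n, s.calls)
	// prints: deleted 5 tag links in 1 call(s)
}
```

The design trade-off is that correctness now depends on Harbor's database being the source of truth for tag-to-manifest references, rather than the storage backend.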
I made the changes against the version 2.7.0 code of Harbor. Now that this is working, I will try to raise a PR in the main Harbor repo in the coming weeks. Please let me know if this helps you. |
Hey @hemanth132, just a quick shout out that this looks very promising and thanks for your effort. Would love to see these changes make it into Harbor so we can finally run GC 😅 /cc @Vad1mo, this is related to our conversation in Slack earlier. |
@wy65701436, take a look #12948 (comment) |
Any update? |
Has anyone tested @hemanth132's solution? Or are there any updates from the Harbor team regarding the API or GC? @Vad1mo @chlins Bump, because this is still an important issue for every S3 backend. |
Any update? Still causing huge pain: GC runs slower than data is added, resulting in having to constantly extend disks. |
Harbor is using distribution for its registry component (harbor-registry). The Harbor GC will call into the registry to delete the manifest, which in turn does a lookup for all tags that reference the deleted manifest. To find the tag references, the registry will iterate every tag in the repository and read its link file to check if it matches the deleted manifest (i.e. to see if it uses the same sha256 digest). So, the more tags in a repository, the worse the performance will be (as there will be more S3 API calls occurring for the tag directory lookups and tag file reads). Therefore, we can use concurrent lookup and untag to optimize performance as described in goharbor/harbor#12948. This optimization was originally contributed by @Antiarchitect; now I would like to take it over. Thanks for @Antiarchitect's efforts with PR distribution#3890. Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
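The concurrent-lookup idea in that commit message can be sketched with a worker pool. This is a simplified illustration under assumed names (`lookupConcurrent`, the in-memory `links` map standing in for backend reads), not the code from distribution#3890: every tag is still inspected, but the per-tag reads overlap, so wall-clock time drops roughly by the worker count.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupConcurrent fans tag names out to a pool of workers, each of
// which checks whether the tag's link "file" matches the digest.
func lookupConcurrent(links map[string]string, digest string, workers int) []string {
	tags := make(chan string)
	var (
		mu   sync.Mutex
		refs []string
		wg   sync.WaitGroup
	)

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for tag := range tags {
				// Stands in for a backend GET of the tag's link file.
				if links[tag] == digest {
					mu.Lock()
					refs = append(refs, tag)
					mu.Unlock()
				}
			}
		}()
	}
	for tag := range links {
		tags <- tag
	}
	close(tags)
	wg.Wait()
	return refs
}

func main() {
	links := map[string]string{}
	for i := 0; i < 10000; i++ {
		links[fmt.Sprintf("v%d", i)] = fmt.Sprintf("sha256:%d", i%100)
	}
	refs := lookupConcurrent(links, "sha256:7", 16)
	fmt.Println(len(refs), "tags reference the manifest")
	// prints: 100 tags reference the manifest
}
```

Note this reduces latency but not the total number of backend requests; the per-request S3 cost is unchanged, which is why the tag-directory-delete approach above is complementary rather than redundant.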
hi @karmicdude, I have taken over @Antiarchitect's efforts on concurrent lookup and untag in PR distribution/distribution#4329. You can try it and check whether it improves things, thanks. |
hi @karmicdude, @sebglon, @jwojnarowicz, @sidewinder12s, distribution/distribution#4329 has already been merged; could you please try it and check whether it meets expectations? |
Nice, I'll definitely check it out |
@wy65701436 @Vad1mo Can we have this change in v2.11.1, as it will improve GC efficiency? |
In S3 backend environment, we found that it took about 39 seconds to delete a manifest via v2 API.
[why still use v2 to handle manifest deletion]
As Harbor cannot know which tags belong to the manifest in storage, the GC job needs to leverage the v2 API to clean them. But the v2 API will look up all of the tags and remove them one by one, which may cause a performance issue.
[what we can do next]
1. Investigate how many requests are sent to S3 storage during a v2 manifest deletion.
2. Investigate the possibility of not storing the first tag in the backend, so the GC job can skip this step.
Log