Concurrent tag lookup #3890
Conversation
Codecov Report

@@           Coverage Diff            @@
##             main    #3890    +/-  ##
==========================================
+ Coverage   56.63%   56.78%   +0.15%
==========================================
  Files         106      106
  Lines       10674    10701      +27
==========================================
+ Hits         6045     6077      +32
+ Misses       3955     3951       -4
+ Partials      674      673       -1
Force-pushed from 5d7ffa9 to 68edff0
I cannot wrap the allTags iteration into a goroutine because errgroup.Wait() seems to end prematurely in that case, resulting in a send-on-closed-channel panic.
group.Wait() returns exactly when it is supposed to: after every goroutine spawned by group.Go() has returned. If you call group.Wait() from one goroutine concurrently with another goroutine calling group.Go(), you're going to have a bad time. You need to arrange to only wait for completion of the group after you have finished spawning all the tasks in the group.

Aside from the deadlock, you are using errgroup correctly.
For your convenience, I have copied all my review comments for tagstore_test.go
from the previous PR.
registry/storage/tagstore_test.go (Outdated)

@@ -121,20 +171,20 @@ func TestTagStoreUnTag(t *testing.T) {

 func TestTagStoreAll(t *testing.T) {
 	env := testTagStore(t)
-	tagStore := env.ts
+	TagStore := env.ts
Local variables should start with a lowercase letter.
	/// Should handle error looking up tag
	env.mockDriver.GetContentError = errors.New("Lookup failure")

	for i := 2; i < 15; i++ {
		err = TagStore.Tag(ctx, strconv.Itoa(i), desc0)
		if err != nil {
			t.Fatal(err)
		}
	}

	tags, err = TagStore.Lookup(ctx, desc0)
	if err == nil {
		t.Fatal("Expected error but none retrieved")
	}
	if len(tags) > 0 {
		t.Errorf("Expected 0 tags on an error but got %d tags", len(tags))
	}

	// Should not error for a path not found
	env.mockDriver.GetContentError = storagedriver.PathNotFoundError{}

	tags, err = TagStore.Lookup(ctx, desc0)
	if err != nil {
		t.Fatal(err)
	}
	if len(tags) > 0 {
		t.Errorf("Expected 0 tags on path not found but got %d tags", len(tags))
	}
Use subtests, not /// comments.
I need to look up some good examples first. The tests were borrowed; I don't know much about Go tests at all. Will investigate.
/// is a Rust doc comment; Go is way simpler 😄
	tags, err = TagStore.Lookup(ctx, desc0)
	if err == nil {
		t.Fatal("Expected error but none retrieved")
Force-pushed from 68edff0 to 18af738
Fair enough. First let's decide where in the config structure that value should go, as that will have an impact on where and how to add the concurrency factor to the distribution configuration. The concurrency factor, as currently implemented, sets an upper bound on the number of storage-backend operations which may be in flight concurrently to service a given

If we do decide that a separate per-call limit is desirable, I think it should be defined as the upper bound on the number of storage-backend operations which may be in flight concurrently in service of handling any one Distribution API client request or maintenance operation, so that the same config value can also be applied elsewhere. It makes sense to me for that config value to be grouped under

Thoughts @milosgajdos @thaJeztah @squizzi?
Force-pushed from 993ea04 to 4ff0c0e
The thing is

Ideally, we would combine both of them, whilst remaining runtime / operationally safe. I am not quite sure at the moment what that'd look like - we could go with adding a per-call counter but that'd complicate the code 🤔 I need to have a think about this. Maybe @deleteriousEffect has some ideas around this, too.

As for the config; if we do go for the latter option, i.e. a symbiosis b/w the global and per-call limit, I'd chuck it into:

storage:
  tag:
    concurrency_limit: N # or maybe worker_limit?
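Whatever the config key ends up being called, the per-call limit itself is mechanically simple; a minimal stdlib sketch follows, using a buffered channel as a counting semaphore sized by the hypothetical concurrency_limit value (errgroup's SetLimit gives the same behaviour with error propagation). The function and parameter names here are illustrative, not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lookupWithLimit launches one goroutine per tag but allows at most
// `limit` simulated storage-backend reads in flight at once.
// It returns the peak observed concurrency so the cap can be checked.
func lookupWithLimit(numTags, limit int) int {
	sem := make(chan struct{}, limit) // counting semaphore sized by the config value
	var mu sync.Mutex
	inFlight, peak := 0, 0
	var wg sync.WaitGroup
	for i := 0; i < numTags; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{} // blocks once `limit` reads are in flight
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()
			time.Sleep(5 * time.Millisecond) // stand-in for a storage-backend read
			mu.Lock()
			inFlight--
			mu.Unlock()
			<-sem // release the slot
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	peak := lookupWithLimit(50, 8)
	fmt.Println("limit respected:", peak <= 8)
}
```

A global limit would share one semaphore across all requests, while the per-call variant discussed here allocates one per Lookup; combining them is where the complexity the comment mentions comes in.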
@Cohere @milosgajdos Let's summarize. What's left to do? Improve the tests and make the concurrency factor configurable in a general manner? I would appreciate any help with this.
@@ -134,44 +153,67 @@ func (ts *tagStore) linkedBlobStore(ctx context.Context, tag string) *linkedBlob

 // Lookup recovers a list of tags which refer to this digest. When a manifest is deleted by
 // digest, tag entries which point to it need to be recovered to avoid dangling tags.
-func (ts *tagStore) Lookup(ctx context.Context, desc distribution.Descriptor) ([]string, error) {
+func (ts *TagStore) Lookup(ctx context.Context, desc distribution.Descriptor) ([]string, error) {
 	allTags, err := ts.All(ctx)
 	switch err.(type) {
Only switch on the error type when it is non-nil:

	if err != nil {
		switch err.(type) {
		....
		}
	}

Let's define a new struct like:

	type tagRes struct {
Has there been any progress on this PR? Is there a plan for merging it? @Antiarchitect @corhere
It's been a while since I last worked on this MR. If I remember correctly there was something wrong with the tests. I don't mind if someone more experienced in Go takes over this initiative.
Force-pushed from 4ff0c0e to 31bbac8
Hi @Antiarchitect, thanks for your contribution. I will take it over if you don't have enough bandwidth.
@microyahoo That would be nice. Thank you.
Force-pushed from 31bbac8 to b5bf809
Force-pushed from 1a284c2 to 363a5bc
Based on distribution#3589

Signed-off-by: Andrey Voronkov <voronkovaa@gmail.com>
Force-pushed from 363a5bc to 8059345
Harbor is using the distribution for its (harbor-registry) registry component. The harbor GC will call into the registry to delete the manifest, which in turn then does a lookup for all tags that reference the deleted manifest. To find the tag references, the registry will iterate every tag in the repository and read its link file to check if it matches the deleted manifest (i.e. to see if it uses the same sha256 digest). So, the more tags in the repository, the worse the performance will be (as there will be more s3 API calls occurring for the tag directory lookups and tag file reads). Therefore, we can use concurrent lookup and untag to optimize performance as described in goharbor/harbor#12948.

P.S. This optimization was originally contributed by @Antiarchitect; now I would like to take it over. Thanks for @Antiarchitect's efforts with PR distribution#3890.

Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
Closing in favor of #4329 by @microyahoo
I no longer have access to the Enapter/distribution repo, so I've opened a new request.
@corhere Please check if I used errgroup correctly in the last commit. I cannot wrap the allTags iteration into a goroutine because errgroup.Wait() seems to end prematurely in that case, resulting in a send-on-closed-channel panic.

P.S. It seems I need help moving the concurrency factor setup into the common configuration module, but it's too complicated for me to understand how it works and what I should do.