Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cluster): Add support for request proxying for scale out #2385

Merged
merged 2 commits into from
May 20, 2024

Conversation

vrajashkr
Copy link
Contributor

@vrajashkr vrajashkr commented Apr 12, 2024

What type of PR is this?
feature

Which issue does this PR fix:
Towards #125
Carry forward of #2041
Initial fork PR rchincha#160

What does this PR do / Why do we need it:
This PR adds support for proxying requests to other members of a zot instance cluster for scale out.
This PR only has support for shared storage cluster scale out.

Testing done on this change:

  • Functional sanity tests are integrated into BATS and run as part of CI.
  • Scale tests are integrated into BATS and can be triggered via manual workflow trigger, release build, and main branch merge.

Automation added to e2e:
N/A - no functionality change from an application point of view. The proxy is transparent to clients.

Will this break upgrades or downgrades?
No. Scale-out is new and there wouldn't be existing setups using this config yet.

Does this PR introduce any user-facing change?:
yes

Users can now specify settings for a "scale-out" mode where multiple instances of zot can run simultaneously using the same shared reliable storage for improved scale and performance in large deployments.

TODO:

  • Unit Tests for new code
  • Fix for localstack bottleneck for high scale
  • Manual testing on personal fork for the high scale workflow trigger

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

pkg/api/proxy.go Outdated Show resolved Hide resolved
pkg/api/proxy.go Outdated Show resolved Hide resolved
@rchincha
Copy link
Contributor

This turned out to be a fairly simple change.
Let's focus on making all state into "single-writer" model next.

@vrajashkr
Copy link
Contributor Author

This turned out to be a fairly simple change. Let's focus on making all state into "single-writer" model next.

Thanks for the base commit @rchincha. It set a good foundation for me to build on :)

examples/config-cluster-member0.json Outdated Show resolved Hide resolved
pkg/api/routes.go Show resolved Hide resolved
pkg/api/proxy.go Outdated Show resolved Hide resolved
pkg/api/proxy.go Fixed Show resolved Hide resolved
@luisdavim
Copy link

latter wold it be possible to leverage this to scale out in situations where a shared storage is not available by also setting up syncs between the members?

Copy link

codecov bot commented Apr 27, 2024

Codecov Report

Attention: Patch coverage is 92.30769% with 18 lines in your changes are missing coverage. Please review.

Project coverage is 92.85%. Comparing base (be5ad66) to head (645fad2).

Files Patch % Lines
pkg/api/proxy.go 92.03% 6 Missing and 3 partials ⚠️
pkg/common/common.go 86.04% 3 Missing and 3 partials ⚠️
pkg/api/controller.go 88.88% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2385      +/-   ##
==========================================
- Coverage   92.86%   92.85%   -0.02%     
==========================================
  Files         167      168       +1     
  Lines       22077    22289     +212     
==========================================
+ Hits        20502    20696     +194     
- Misses        982      993      +11     
- Partials      593      600       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vrajashkr
Copy link
Contributor Author

latter wold it be possible to leverage this to scale out in situations where a shared storage is not available by also setting up syncs between the members?

Hi @luisdavim, thank you for the question! This PR focuses on shared storage, but support for a situation without shared storage is definitely something that is being looked into.

@vrajashkr vrajashkr marked this pull request as ready for review April 27, 2024 18:11
pkg/api/config/config.go Outdated Show resolved Hide resolved
pkg/api/proxy.go Outdated Show resolved Hide resolved
@rchincha
Copy link
Contributor

Add detailed comments in your commit msg. This is a significant feature enhancement.

@vrajashkr vrajashkr force-pushed the feat/cluster-proxy branch 7 times, most recently from ea5ea86 to a47e785 Compare May 12, 2024 20:30
@rchincha
Copy link
Contributor

resp, err := httpClient.Do(fwdRequest)

resp, err := httpClient.Do(fwdRequest)

^ make sure the regex check is done on this "url" before forwarding
https://github.com/project-zot/zot/blob/main/pkg/regexp/regexp.go <- find uses of this.

@rchincha
Copy link
Contributor

"Nightly jobs / s3+dynamodb scale-out performance"
^ drop "performance"

pkg/api/controller.go Outdated Show resolved Hide resolved
pkg/api/proxy.go Outdated Show resolved Hide resolved
pkg/api/controller.go Outdated Show resolved Hide resolved
Copy link
Contributor

@andaaron andaaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to compute the sockets on every request? Is there a performance impact?

pkg/api/proxy.go Show resolved Hide resolved
@rchincha
Copy link
Contributor

Do we want to compute the sockets on every request? Is there a performance impact?

Yes, we shouldn't.

@vrajashkr
Copy link
Contributor Author

Do we want to compute the sockets on every request? Is there a performance impact?

Yes, we shouldn't.

I agree, this a performance hit especially since it happens for every request. I've changed to logic to perform the socket logic at the controller startup (because we need the member data for logging). This happens only once.

When proxying, we do a lookup on the index alone (since all the zots have an identical member config) which is a quicker comparison than string compare too.

This change introduces support for shared storage backed
zot cluster scale out.

New feature
Multiple stateless zot instances can run using the same shared
storage backend where each instance looks at a specific set
of repositories based on a siphash of the repository name to improve
scale as the load is distributed across multiple instances.
For a given config, there will only be one instance that can perform
dist-spec read/write on a given repository.

What's changed?
- introduced a transparent request proxy for dist-spec endpoints based on
siphash of repository name.
- new config for scale out cluster that specifies list of
cluster members.

Signed-off-by: Vishwas Rajashekar <vrajashe@cisco.com>
@vrajashkr
Copy link
Contributor Author

resp, err := httpClient.Do(fwdRequest)

resp, err := httpClient.Do(fwdRequest)
^ make sure the regex check is done on this "url" before forwarding https://github.com/project-zot/zot/blob/main/pkg/regexp/regexp.go <- find uses of this.

TODO: checking if any extras after the specified path are also routed to the same handler.

Left a comment above on the fact that the regex is already done before routing. Doing it once more would add a penalty to every proxied request. If there is a chance that other paths are routed here, a regex would be needed.

@rchincha
Copy link
Contributor

resp, err := httpClient.Do(fwdRequest)

resp, err := httpClient.Do(fwdRequest)
^ make sure the regex check is done on this "url" before forwarding https://github.com/project-zot/zot/blob/main/pkg/regexp/regexp.go <- find uses of this.

TODO: checking if any extras after the specified path are also routed to the same handler.

Left a comment above on the fact that the regex is already done before routing. Doing it once more would add a penalty to every proxied request. If there is a chance that other paths are routed here, a regex would be needed.

If the regex is already done, I suppose since clusterProxy is a wrapper around the routes, no need to do it again.

@vrajashkr
Copy link
Contributor Author

Regarding the regex, I tried adding some query params to the API call, and it seems to have let it through:

curl -v http://localhost:9000/v2/alpine/tags/list?example=test
*   Trying 127.0.0.1:9000...
* Connected to localhost (127.0.0.1) port 9000 (#0)
> GET /v2/alpine/tags/list?example=test HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Access-Control-Allow-Headers: Authorization,content-type,X-ZOT-API-CLIENT
< Access-Control-Allow-Methods: GET,OPTIONS
< Access-Control-Allow-Origin: *
< Content-Length: 35
< Content-Type: application/json
< Date: Sat, 18 May 2024 14:07:28 GMT
< 
* Connection #0 to host localhost left intact
{"name":"alpine","tags":["latest"]}

Logs:

{"level":"info","clusterMember":"127.0.0.1:9001","clusterMemberIndex":"1","module":"http","component":"session","clientIP":"127.0.0.1:51370","method":"GET","path":"/v2/alpine/tags/list?example=test","statusCode":200,"latency":"0s","bodySize":35,"headers":{"Accept":["*/*"],"Accept-Encoding":["gzip"],"User-Agent":["curl/7.81.0"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":27,"caller":"zotregistry.dev/zot/pkg/api/session.go:132","time":"2024-05-18T14:07:28.799382217Z","message":"HTTP API"}

This is no different from the usual behaviour of the application though.

Any extra path params will fail the query, as expected:

curl -v http://localhost:9000/v2/alpine/tags/list/example
*   Trying 127.0.0.1:9000...
* Connected to localhost (127.0.0.1) port 9000 (#0)
> GET /v2/alpine/tags/list/example HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sat, 18 May 2024 14:38:38 GMT
< Content-Length: 19
< 
404 page not found
* Connection #0 to host localhost left intact

Copy link
Contributor

@rchincha rchincha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rchincha rchincha merged commit 5ae7a02 into project-zot:main May 20, 2024
33 of 36 checks passed
@vrajashkr vrajashkr deleted the feat/cluster-proxy branch May 20, 2024 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants