S3 publishing speedup #1171

Open
wants to merge 2 commits into
base: master

Conversation

cavedon
Contributor

@cavedon cavedon commented Apr 24, 2023

Description of the Change

When publishing to an S3 bucket, two prefixes can be used: a prefix in the storage configuration (in aptly.conf) and a prefix in the publish configuration (in the aptly DB).
When updating a published repository in an S3 bucket:

  • when using a prefix in the publish configuration, aptly needlessly lists the contents of the whole bucket (or everything under the storage prefix, if one is configured)
  • if the number of files published in the bucket is very large, listing them may take a significant amount of time, because it is done sequentially in batches of 1000 keys.

This pull request addresses both issues.

Checklist

  • unit-test added (if change is algorithm)
  • functional test added/updated (if change is functional)
  • man page updated (if applicable)
  • bash completion updated (if applicable)
  • documentation updated
  • author name in AUTHORS

Add setting `parallelListingRequests` to an S3 publish endpoint. If set
to a value greater than 1, aptly will use up to that number of parallel
HTTP requests to list the S3 bucket.

Fix the test S3 server: in case of multiple CommonPrefixes being
returned, S3 specs require multiple CommonPrefix XML tags (and not a
single CommonPrefix with multiple Prefix tags).

When a published repository uses a publish prefix, instead of listing
the contents of the whole bucket under the storage prefix, only list the
contents of the bucket under the storage prefix and publish prefix, and
cache it by publish prefix.
This speeds up publish operations under a prefix.
@randombenj
Member

Seems the CI is broken: #1173

@neolynx neolynx self-assigned this Jan 14, 2024
@neolynx
Member

neolynx commented Apr 21, 2024

Good idea. Unfortunately, aptly has moved to github.com/aws/aws-sdk-go-v2/service/s3, which makes rebasing this PR a bit difficult.

Would you mind rebasing it?

@neolynx neolynx added the needs rebase The PR needs to be rebased on master label Apr 21, 2024
3 participants