Added /prune/ endpoint to remove "old" RPMs from a Repository. #3536
base: main
Conversation
Force-pushed from 4d8ad7c to edf4989
Draft until I figure out what is up w/ permissions - can't reproduce locally?!?
Some notes on performance:

Some unscientific performance results follow. The repositories are a selection of "real" repos, with small (zoo), medium (baseos-9), and large (appstream-9, epel) amounts of content. All runs used concurrency==1, oci-env, on my workstation (Intel® Core™ i7-6700K × 8), info-logging only.

Repository list:

- name: zoo
- name: rhel_9_x86_64_appstream
- name: rhel_9_x86_64_baseos
- name: epel9

How "prune" was run
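A minimal sketch of how such a run can be dispatched against the new endpoint. The endpoint path, parameter names, port, and credentials here are assumptions based on this PR's discussion, not a confirmed invocation:

```python
import requests

BASE = "http://localhost:5001"  # assumption: oci-env's default API port
AUTH = ("admin", "password")    # assumption: default dev credentials

# Hypothetical request body; parameter names follow the discussion below.
body = {
    "repo_hrefs": ["/pulp/api/v3/repositories/rpm/rpm/<repo-uuid>/"],  # placeholder href
    "keep_days": 14,
    "dry_run": False,
}
response = requests.post(f"{BASE}/pulp/api/v3/rpm/prune/", json=body, auth=AUTH)
response.raise_for_status()
print(response.json())  # typically {"task": "..."} for a dispatched Pulp task
```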
How timings were evaluated
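One way to measure this is from the dispatched task's own record; a minimal sketch using the standard pulpcore `started_at`/`finished_at` task fields (whether these runs were timed exactly this way is an assumption):

```python
from datetime import datetime

import requests


def task_duration_seconds(base_url, task_href, auth):
    """Wall-clock duration of a finished Pulp task, from its own timestamps."""
    task = requests.get(f"{base_url}{task_href}", auth=auth).json()
    started = datetime.fromisoformat(task["started_at"].replace("Z", "+00:00"))
    finished = datetime.fromisoformat(task["finished_at"].replace("Z", "+00:00"))
    return (finished - started).total_seconds()
```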
Performance Runs
pulp_rpm/app/tasks/prune.py (Outdated)

```python
log = getLogger(__name__)


def prune_repo_nevras(repo_pk, keep_days, dry_run):
```
Suggestion: rename to `prune_repo_packages`.
Should we make both n-to-keep and the time threshold configurable?
I'd probably start with the simplest: always n=1 to keep. We could make it configurable later if we get requests for it.
We already have `retain_package_versions` if you want "how many nevras to keep", and `retain_repo_versions` to prune old RepositoryVersions. This is "get rid of nevras older than X" regardless of those settings. Do we really want to make this even more complicated? Even if someone somewhere thinks it would be nifty, I think we're already at the edge of "too many choices".
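For concreteness, the distinction matters because `retain_package_versions` is a count while prune is a time threshold; the heart of "older than X" is roughly a query like this (a sketch only, not this PR's actual implementation):

```python
from datetime import datetime, timedelta, timezone

from pulpcore.plugin.models import RepositoryVersion
from pulp_rpm.app.models import Package


def stale_packages(version: RepositoryVersion, keep_days: int):
    """Sketch: Packages in `version` created more than `keep_days` ago.

    Assumption: the real task also protects the newest NEVRA per package
    name, so a repo never loses its only copy of a package.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=keep_days)
    return Package.objects.filter(
        pk__in=version.content.values_list("pk", flat=True),
        pulp_created__lt=cutoff,
    )
```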
staging_docs/user/guides/06-prune.md (Outdated)

The `keep_days` argument allows the user to specify the number of days to allow "old" content to remain in the repository. The default is 14 days.

The `repo_concurrency` argument allows the user to control how many `pulpcore-workers` can be operating on
I wonder if it is correct to expose this decision to the user of the repos; if they decide to set it to 50, it will do no good for the rest of the Pulp installation. Should we rather take the same approach as with workers in the import process? It would rather be an admin-set/controlled setting, and maybe expressing it as a percentage again is better.
There are already a ton of ways for a single user to stuff up the dispatch process. I can, right now, create 1000 repos all pointing to the same remote, and then sync all of them in an endless loop.
I considered (and even at one point implemented) a similar percentage-setting. The problem with a setting is that it requires restarting all the workers to pick it up if you decide that you want something different, and it can't be changed for just a specific run.
I also toyed with making this an admin-only command. That would answer the immediate COPR need. But our other future-looking service needs "feel" like this should be accessible to end-users.
I think "a single user can consume All The Workers" is a very valid issue - but it needs to be fixed at a more foundational level than in individual calls. I'd rather see that addressed as its own feature for the entire system.
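For what it's worth, a per-call knob can still bound fan-out without any worker restart: dispatch one task per repo, each holding an exclusive lock on one of `repo_concurrency` synthetic resource names. A sketch of that pattern (the resource-name string and the `prune_repo_packages` task name are illustrative, not necessarily what this PR ships):

```python
from pulpcore.plugin.tasking import dispatch

from pulp_rpm.app.tasks.prune import prune_repo_packages  # hypothetical task name


def dispatch_prune_tasks(repos, keep_days, dry_run, repo_concurrency):
    """Dispatch one prune task per repo; at most repo_concurrency run at once."""
    for i, repo in enumerate(repos):
        dispatch(
            prune_repo_packages,
            # The synthetic "...worker-N" resource caps parallelism at
            # repo_concurrency, since exclusive resources serialize tasks.
            exclusive_resources=[repo, f"rpm-prune-worker-{i % repo_concurrency}"],
            kwargs={"repo_pk": repo.pk, "keep_days": keep_days, "dry_run": dry_run},
        )
```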
```python
# Validate that they are for RpmRepositories.
hrefs_to_return = []
for href in value:
    hrefs_to_return.append(NamedModelViewSet.get_resource(href, RpmRepository))
```
If a non-RPM href is provided, `get_resource` will fail. Should we catch and ignore it, or rather fail the validation?
Um, hm. I have to refresh my memory on how `get_resource()` behaves if it can't find the resource. In any event, my take is to fail the call. But it would be "more polite" to check all the provided HREFs, determine which ones aren't RPM repos, and fail with a message specifying all the incorrect repos, so the user can clean up their usage. WDYT?
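That "more polite" variant could collect every bad href before failing; a rough sketch (the serializer class and field names are hypothetical, and this assumes `get_resource` raises a DRF `ValidationError` for non-matching hrefs):

```python
from rest_framework import serializers

from pulpcore.plugin.viewsets import NamedModelViewSet
from pulp_rpm.app.models import RpmRepository


class PruneSerializer(serializers.Serializer):  # hypothetical name
    repo_hrefs = serializers.ListField(child=serializers.CharField())

    def validate_repo_hrefs(self, value):
        """Resolve every href, reporting all non-RPM repos in one error."""
        repos, bad_hrefs = [], []
        for href in value:
            try:
                repos.append(NamedModelViewSet.get_resource(href, RpmRepository))
            except serializers.ValidationError:
                bad_hrefs.append(href)
        if bad_hrefs:
            raise serializers.ValidationError(
                f"These hrefs are not RPM repositories: {bad_hrefs}"
            )
        return repos
```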
closes #2909.