As an administrator I want to set prune policy based on datetime #2909

Open
Tracked by #2533
xsuchy opened this issue Dec 5, 2022 · 12 comments · May be fixed by #3536

Labels
COPR (Features desired by the COPR Team), Feature

Comments

@xsuchy

xsuchy commented Dec 5, 2022

I, as an administrator, want to specify that if a repository has multiple builds of the same package name, versions older than X days are deleted.

The Copr team is considering using Pulp as a backend repository. This is one of the few missing features in Pulp.

In Copr (Fedora's build system), we produce 500-1000 GB of builds daily, which can quickly consume all available storage. Therefore, we implemented a pruning policy:
https://docs.pagure.org/copr.copr/user_documentation.html#how-long-do-you-keep-the-builds

We keep one build for each package in one project indefinitely. All other builds (old packages, failed builds) are deleted after 14 days.

Note that we keep the build with the greatest EPOCH:NAME-VERSION-RELEASE, even though that build might not be the newest one. Also, if there are two builds of the same package version, it is undefined which one is going to be kept.
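The retention rule above (keep the build with the greatest EPOCH:NAME-VERSION-RELEASE, which is not necessarily the newest build) depends on RPM-style version comparison. Below is a minimal sketch, assuming a simplified form of RPM's rpmvercmp rules (the `~` and `^` semantics are not implemented). The helper names `vercmp`, `evr_cmp`, and `builds_to_keep` are hypothetical, and this is not prunerepo's actual code.

```python
import re


def _segments(s: str) -> list[str]:
    # Split a version/release string into runs of digits or runs of letters;
    # any other character (".", "-", "_", ...) only acts as a separator.
    return re.findall(r"[0-9]+|[A-Za-z]+", s)


def vercmp(a: str, b: str) -> int:
    """Simplified rpmvercmp: 1 if a > b, -1 if a < b, 0 if equal.
    Assumption: '~' and '^' handling is NOT implemented."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            # Numeric segments compare as integers, so "10" > "9".
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x_num != y_num:
            # A numeric segment is considered newer than an alphabetic one.
            return 1 if x_num else -1
        elif x != y:
            # Alphabetic segments compare as plain strings.
            return 1 if x > y else -1
    # All shared segments are equal: the string with more segments is newer.
    if len(sa) != len(sb):
        return 1 if len(sa) > len(sb) else -1
    return 0


def evr_cmp(a: tuple[int, str, str], b: tuple[int, str, str]) -> int:
    """Compare (epoch, version, release) tuples; the epoch dominates."""
    if a[0] != b[0]:
        return 1 if a[0] > b[0] else -1
    return vercmp(a[1], b[1]) or vercmp(a[2], b[2])


def builds_to_keep(builds):
    """Map each package name to its build with the greatest EVR.
    builds: iterable of (name, epoch, version, release) tuples.
    On an exact EVR tie, the first build seen wins (Copr leaves this undefined)."""
    best = {}
    for build in builds:
        name, evr = build[0], build[1:]
        if name not in best or evr_cmp(evr, best[name][1:]) > 0:
            best[name] = build
    return best
```

For example, if `foo-1.9-3` is built after `foo-1.10-1`, `builds_to_keep` still keeps `1.10-1`, because "10" > "9" numerically. This is why the kept build is "the greatest" rather than "the newest".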

For this, we use prunerepo, a tool we created (https://pagure.io/prunerepo/tree/main), and additional logic in our backend:

It would be nice if the X-day period could be configured by the admin.

@ipanova
Member

ipanova commented Dec 6, 2022

this is rpm plugin specific, moving

@ggainey
Contributor

ggainey commented Dec 6, 2022

This actually belongs in pulpcore; that's where retain_versions lives now. Discussion from triage 6-DEC:

ggainey: retain_repo_versions is a base-repository-attr, feels like this should be as well?
ggainey: (COPR only cares about RPMs, but this feels like it could apply to anything)
dkliban: agreed
bmbouter: I think it could apply to anything
I'd like to see it be pulpcore
personally
gerrod: Am I missing the definition of prune? Does it mean repo-version pruning or reclaim space pruning
dkliban: gerrod: repo-version pruning
ggainey: gerrod: COPR prunes to reclaim space; to make this work w/ Pulp3, what they'd want is to be able to retain-versions-by-age instead of just a set number
dkliban: gerrod: the workflow this user is interested in relies on uploads of content
so you can't reclaim disk on those
ggainey: e.g., "keep versions newer than 14 days"
gerrod: Oh, so it technically is possible with automated scripts, but they want it as a feature of pulp?
ggainey: believe so, yes
dkliban: yes

@dralley
Contributor

dralley commented Dec 6, 2022

@ggainey I think it belongs in the rpm plugin more so, if the aim is to have a workflow similar to what they currently have. The ask here is effectively to remove packages based on date and have those packages cleaned up to reclaim their space. If you just set the repository version limit to 1 (latest only), then you get something more or less equivalent to using createrepo tools on disk, but the pruning functionality would still be missing, and that aspect probably has to be implemented in the plugin.

Moreover, we also need to consider how this would overlap with the existing retain package count functionality.

Keeping versions by date is also a potentially useful feature independently, but I'm not sure it's necessary or directly relevant to what COPR is asking for.

@ggainey
Contributor

ggainey commented Dec 6, 2022

Comments from matrix where @dralley unconfused me:

ggainey: I have no idea - I thought I knew what was being asked for here, but now I think I don't
is it "I have a repo-version with foo-1.1.1, -1.1.2, and -1.1.3 until -1.1.1 is > 14 days old, at that point I want a new repo-version that contains only -1.1.2 and -1.1.3" ?
so repos get new versions that slowly lose content w/ the same name, until there's only a most-recent-nevra left?

dralley: that's what I thought the ask was, but I could be wrong

ggainey: can you add this discussion to the issue then? because it (I think) makes things clearer
(at least it does for me, maybe I was the only person confused tho :) )
so this, then, would be an extension not of retain_repo_versions, but retain_package_versions for RpmRepository
(and by "extension" I mean we would implement a "retain_package_by_age" that would work like retain_package_versions, except driven by "timestamp added to repo")
hm, no, not quite - because this can't be decided "at sync", it has to be more like orphan-cleanup that runs periodically and cleans up repos that are just quietly sitting there

TL;DR: this is a pulp_rpm feature.

@dralley
Contributor

dralley commented Dec 6, 2022

@xsuchy @ipanova Can you confirm or clarify this?

From the COPR docs:

We keep one build for each package in one project indefinitely. All other builds (old packages, failed builds) are deleted after 14 days.

I read this as "we keep multiple versions of RPMs in the repo (but only one build per version), and after 14 days those older versions are pruned by regenerating the repo and deleting the old RPMs, so dnf downgrade will work up until the point they are pruned."

Or is it the case that only one version of any RPM is kept in the repo at any given time, so dnf downgrade cannot see older versions, but the package files remain available for direct download?

@xsuchy
Author

xsuchy commented Dec 6, 2022

ggainey: I have no idea - I thought I knew what was being asked for here, but now I think I don't
is it "I have a repo-version with foo-1.1.1, -1.1.2, and -1.1.3 until -1.1.1 is > 14 days old, at that point I want a new repo-version that contains only -1.1.2 and -1.1.3" ?
so repos get new versions that slowly lose content w/ the same name, until there's only a most-recent-nevra left?

Correct.

I read this as "we keep multiple versions of RPMs in the repo (but only one build per version), and after 14 days those older versions are pruned by regenerating the repo and deleting the old RPMs, so dnf downgrade will work up until the point they are pruned."

This is the logic we have implemented now, but we can easily live with the former one. Users of Copr are mostly developers and CI systems; I believe they do not actually need the ability to downgrade. But they do want to be able to download builds and build artifacts (like logs) and compare them with other version-releases.

@ggainey ggainey transferred this issue from pulp/pulpcore Dec 6, 2022
@dralley dralley added the COPR label (Features desired by the COPR Team) Feb 10, 2023
@Conan-Kudo

For general use of Pulp, particularly in non-COPR deployments, giving an admin the option to allow downgrades is useful. @dralley's interpretation would be useful in a generic sense for Pulp.

@praiskup

The way @dralley interprets this is exactly how Copr pruner works now, and it is definitely a better option.

@ipanova
Member

ipanova commented Oct 26, 2023

Got some clarification from @praiskup today:

  1. The goal is to remove everything except the latest NVR, while keeping anything that is not yet older than N days. The age does not need to be based on build time; it is how long the content has been in Pulp. We will use the pulp_created field for this.
  2. There will be a global endpoint, similar to the reclaim_space one, that will accept a list of repos to prune and a configurable keep_days option:
    POST /pulp/api/v3/repositories/rpm/rpm/prune/ keep_days=14 repo_list=[a,b,c] or POST /pulp/api/v3/repositories/rpm/rpm/prune/ keep_days=14 repo_list=[*].
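Point 1 can be expressed as a small selection function over one repository's content. This is a minimal sketch under stated assumptions: `Package` and `select_for_pruning` are hypothetical names, packages are grouped by name only, and versions are compared as plain integer tuples as a stand-in for a full EVR comparison. Only `pulp_created` and `keep_days` come from the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Package:
    name: str
    version: tuple[int, ...]  # hypothetical stand-in for a real EVR comparison
    pulp_created: datetime    # when the content was added to Pulp


def select_for_pruning(packages: list[Package], keep_days: int, now: datetime) -> list[Package]:
    """Return the packages a prune run would remove from one repository."""
    cutoff = now - timedelta(days=keep_days)
    # Find the latest version of each package name; it is always kept.
    latest: dict[str, Package] = {}
    for pkg in packages:
        current = latest.get(pkg.name)
        if current is None or pkg.version > current.version:
            latest[pkg.name] = pkg
    # Every other package is removed once it is strictly older than keep_days.
    return [
        pkg for pkg in packages
        if pkg is not latest[pkg.name] and pkg.pulp_created < cutoff
    ]


# The foo-1.1.1 / -1.1.2 / -1.1.3 scenario from earlier in this thread:
now = datetime(2023, 10, 26, tzinfo=timezone.utc)
repo = [
    Package("foo", (1, 1, 1), now - timedelta(days=20)),
    Package("foo", (1, 1, 2), now - timedelta(days=10)),
    Package("foo", (1, 1, 3), now - timedelta(days=1)),
]
print([p.version for p in select_for_pruning(repo, keep_days=14, now=now)])  # [(1, 1, 1)]
```

The new repository version would then contain only -1.1.2 and -1.1.3; once -1.1.2 also passes the 14-day mark, only the latest NVR remains.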

It was decided against a per-repo setting (similar to the retain_package_versions we already have), because that hook would be called in finalize_new_version, meaning it would run every time a new package is uploaded.
Edit: the per-repo setting could still exist, in case there is a use case for specific keep_days values on each repo. The prune task could then read this setting instead of the keep_days value provided to it directly.
It is better to have a separate global endpoint (to avoid needing one API call per repo) that can be called as frequently as desired via a cron job or some other scheduler.
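On the caller side, the global endpoint is meant to be hit from a cron job or another scheduler rather than from a per-upload hook. Below is a minimal sketch using only Python's standard library. It assumes the endpoint path and the `keep_days`/`repo_list` fields exactly as written in the proposal; the final API may differ, and authentication is omitted. `build_prune_request` and `effective_keep_days` are hypothetical helpers. The second one illustrates the "Edit" note, where a per-repo setting, when present, overrides the value passed to the task.

```python
import json
import urllib.request
from typing import Optional

# Path as proposed in this thread (assumption: the final API may differ).
PRUNE_PATH = "/pulp/api/v3/repositories/rpm/rpm/prune/"


def build_prune_request(base_url: str, keep_days: int, repo_list: list[str]) -> urllib.request.Request:
    """Build (but do not send) a single POST covering many repositories."""
    body = json.dumps({"keep_days": keep_days, "repo_list": repo_list}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + PRUNE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def effective_keep_days(repo_keep_days: Optional[int], task_keep_days: int) -> int:
    """Task side: a per-repo keep_days (if set) wins over the value passed to the task."""
    return repo_keep_days if repo_keep_days is not None else task_keep_days
```

A daily scheduled script could then call `urllib.request.urlopen(build_prune_request("https://pulp.example.com", 14, ["*"]))` once per day, where the host is a placeholder. This issues one API call per run instead of one per repository or per upload.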

@ipanova ipanova self-assigned this Oct 27, 2023
@ggainey
Contributor

ggainey commented Nov 30, 2023

See discussion notes from the 2023-10-26 rpm team mtg here: https://hackmd.io/@pulp/rpm_meeting#October-26

@ggainey ggainey assigned ggainey and unassigned ipanova Nov 30, 2023
@xsuchy
Author

xsuchy commented Dec 1, 2023

q1: compatibility with retain_package_versions. retain_package_versions will be called after the task is complete, as the last step in finalize_new_version.

Do I read it correctly that you want to run prune tasks as a trigger after each task? That would be a performance problem. In Copr, we have a project that produces thousands of builds per day, occasionally ten thousand, multiplied by the number of architectures and platforms. Calling the prune task thousands of times per day for a single repository would be a huge waste of energy. It is sufficient if the prune task runs once per day.

@ggainey
Contributor

ggainey commented Dec 1, 2023

q1: compatibility with retain_package_versions. retain_package_versions will be called after the task is complete, as the last step in finalize_new_version.

Do I read it correctly that you want to run prune tasks as a trigger after each task?

No, although that was mentioned in an early part of the discussion. @ipanova's summary in #2909 (comment) is the current proposal, which matches yours precisely.

The hackmd link was added just to keep all the notes/discussion easily find-able.

ggainey added 14 commits to ggainey/pulp_rpm that referenced this issue (Apr 30; May 1, 3, 7, 14, 15; Jun 3, 2024)
Projects
Status: In Progress

7 participants