Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sysroot auto cleanup #2511

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

dbnicholson
Copy link
Member

This is my proof of concept to initiate cleanup on the next boot of a staged deployment. In #2510 we discussed whether it was a good idea to add another systemd unit when the cleanup may be handled better by a management daemon like rpm-ostree.

I think this is a nice general purpose way to address the issue, but I definitely concede it's not the only way to handle it. I think even without the systemd unit the rest of the commits are useful. It would be possible to stash some state indicating cleanup from outside ostree, but really ostree is by far best positioned since it handles writing out the staged deployment via ostree admin finalize-staged.

Fixes: #2510

This is a wrapper around `ostree_sysroot_cleanup` that only performs the
cleanup when the `.cleanup` file exists. When the cleanup has completed,
the file is deleted. The purpose of this API is to allow other routines
to indicate that cleanup is needed when they're not in a good position
to initiate the cleanup themselves. This API can then be run at a more
opportune time without actually initiating the expensive cleanup
operations unless they were needed.
This provides a convenient method to run `ostree_sysroot_auto_cleanup`
rather than `ostree_sysroot_cleanup`.
Currently finalizing a staged deployment skips the pruning that
typically happens when writing out a deployment since it would block the
shutdown path. In order to leave an indication that a cleanup should be
run at a more opportune time, touch the `.cleanup` file after the staged
deployment is written out. The `ostree_sysroot_auto_cleanup` API can
then be used to perform the pruning at a better time without fear that
it will initiate an unnecessary prune.
This unit makes use of the `--auto` option of `ostree admin cleanup` to
cleanup the sysroot when the `/sysroot/.cleanup` file is left behind
when finalizing a staged deployment. The purpose of this service is to
restore pruning of deleted deployments when a staged deployment is
written out during shutdown of the previous boot. The unit is optional
as the cleanup can be run at another time or not at all.

Cleaning the sysroot will prune the repo, and this can be a slow and IO
intensive operation. To keep the system from blocking, the default
`Before` dependency on `multi-user.target` has been removed. Since the
service will then run in the background, the IO scheduling class has
been lowered to `idle` to keep the system's primary tasks more
responsive.
Comment on lines +35 to +36
ProtectSystem=strict
ReadWritePaths=/sysroot /boot
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I like this. But don't we also have to touch /var and/or /etc for the .updated files? IIRC I hit that last time trying to restrict things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that happens when finalizing a deployment, not when cleaning up. See the comments in ostree-finalize-staged.service. I tested this in a downstream way and it worked fine.

Over in endlessm/eos-updater#298 @wjt pointed out that having /sysroot writable means essentially the whole system is writable. One thing that could be nicer would be to limit this whole exercise to just pruning since finalizing also does ostree_sysroot_prepare_cleanup. Then you could limit the writable paths to just /sysroot/repo.

@@ -20,12 +20,17 @@
- uncomment the include in Makefile-libostree.am
*/

LIBOSTREE_2022.2 {
global:
ostree_sysroot_auto_cleanup;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New public API feels overkill for this if the expectation is that it will be invoked by a separate systemd unit. Related below:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this was with an eye toward having the systemd unit disabled and letting the management daemon handle it itself. But I did do this in endlessm/eos-updater#298 without needing the API. I do think it's cleaner to have the semantics contained within libostree, though.

GError **error)
{
struct stat stbuf;
if (!glnx_fstatat_allow_noent (self->sysroot_fd, _OSTREE_SYSROOT_AUTOCLEANUP, &stbuf, AT_SYMLINK_NOFOLLOW, error))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the systemd unit is already doing this, do we need a new API at all versus simply ExecStart=ostree admin cleanup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not. I think you could do this all in the CLI or via systemd unit settings. It felt cleaner to encapsulate it in libostree, but it's not necessary. I could go either way.

@openshift-ci
Copy link

openshift-ci bot commented Mar 1, 2022

@dbnicholson: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

[Unit]
Description=OSTree Automatic Sysroot Cleanup
Documentation=man:ostree-admin-cleanup(1)
ConditionPathExists=/sysroot/.cleanup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, we do something similar, but with the condition inverted:

# This is deleted by deploy.sh:
ConditionPathExists=!/sysroot/.ostree-cleaned

This made migration easier. If you're upgrading ostree with the deploy then the old version of ostree won't know to create this file, so when booting into the new deployment cleaning won't run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a nice property. However, doesn't that mean you're running a prune on every boot, even if you haven't made a new deployment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we delete .ostree-cleaned only just before ostree admin deploy.

@cgwalters cgwalters added difficulty/medium medium complexity/difficutly issue reward/medium Fixing this will be notably useful labels May 2, 2023
@openshift-ci
Copy link

openshift-ci bot commented Jun 29, 2023

@dbnicholson: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/sanity 951d5d7 link true /test sanity
ci/prow/fcos-e2e 951d5d7 link true /test fcos-e2e
ci/prow/images 951d5d7 link true /test images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty/medium medium complexity/difficutly issue needs-rebase reward/medium Fixing this will be notably useful
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cleanup from staged deployments
3 participants