Ephemeral invocations #166

Open
davidchisnall opened this issue Aug 3, 2021 · 18 comments
@davidchisnall

Is your feature request related to a problem? Please describe.
I want to run jobs in a throw-away jail that is reset to a previous state on exit. In most container systems, this is accomplished with an ephemeral layer over the top of a container image.

Describe potential alternatives or workaround you've considered (if any)

I currently wrap the pot invocation in a loop that rolls back to the previous snapshot each time. This isn't great, for three reasons:

  • I want to be able to upgrade the immutable base image periodically and I can't do that while the jail is running.
  • Rolling back is a synchronous operation and so I can't restart the jail until it's finished, whereas destroying a clone can happen concurrently with taking a new clone of the same base FS.
  • I have to be really careful to make sure I do the rollback in all possible failure modes of the script.
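For reference, the current workaround might look roughly like this (a sketch only; the pot name "runner" and the use of pot's revert subcommand are illustrative assumptions, not the actual script):

```shell
#!/bin/sh
# Sketch of the rollback loop described above; "runner" is a hypothetical pot name.
pot snapshot -p runner            # baseline snapshot, taken once up front
while true; do
    pot start runner              # run one throw-away job to completion
    pot revert -p runner          # roll back to the baseline (synchronous, so the
                                  # next iteration must wait for it to finish)
done
```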

Describe the feature you'd like to have

  • [ ] An ephemeral variant of pot start that clones the filesystem, runs from the clone, and destroys it at the end. If these clones live in a fixed part of the zpool namespace then pot can clean them up easily at the end.
  • [ ] A pot rename command so that I can atomically replace the immutable base image when I upgrade.
  • [ ] A mechanism to specify the zfs quota property for the ephemeral filesystem.
@grembo (Collaborator) commented Aug 3, 2021

While it isn't packaged up as a single feature, you can already do something like this, which should improve the situation:

# create base pot
pot create -p immutable -t single -b 13.0

# snapshot and clone
pot snapshot -p immutable
pot clone -P immutable -p mutable

# start derived pot
pot start mutable

# change stuff in immutable
echo "Changed some things" >/opt/pot/jails/immutable/m/blabla

# resnapshot and create new clone
pot snapshot -p immutable
pot clone -P immutable -p mutable_new

# stop old clone and move new clone into place
pot stop -p mutable
pot rename -p mutable -n mutable_old
pot rename -p mutable_new -n mutable
pot start mutable

# destroy old clone
pot destroy -p mutable_old

@davidchisnall (Author)

Thanks, that sounds like it's enough for what I need. I missed the clone and rename commands.

@davidchisnall (Author)

I've now done this. It would be nice to have an atomic destructive rename that pot protected from concurrent clones, but I can work around this by wrapping the rename in a script that I run while holding the same lock file that I hold while doing the clone.
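A sketch of that wrapper, using FreeBSD's lockf(1) to hold the same lock file around both the clone and the destructive rename (pot names and the lock path are hypothetical):

```shell
# Both the clone step and the rename step run under the same lock file,
# so the destructive rename can never interleave with a concurrent clone.
LOCK=/var/run/pot-immutable.lock   # hypothetical lock path

# Elsewhere, the clone is taken under the same lock:
#   lockf -k "$LOCK" pot clone -P immutable -p mutable_new

lockf -k "$LOCK" sh -c '
    pot stop -p mutable
    pot rename -p mutable -n mutable_old
    pot rename -p mutable_new -n mutable
'
```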

@davidchisnall (Author)

This actually doesn't do quite what I need, because the cloned invocation is linked to the original and so I can't replace the base one without stopping the running ones (which I don't want to do, I want them to gracefully exit).

A pot promote would do what I need, I believe. I can hack around this by doing the promote myself, but now I'm hard-coding pot's use of ZFS.
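Doing the promote by hand means reaching underneath pot, something like the following (the dataset path is an assumption about pot's ZFS layout, which is exactly the implementation detail being hard-coded):

```shell
# Promote the clone so it is no longer a child of the base dataset.
# zroot/pot/jails/mutable/m is a hypothetical dataset path.
zfs promote zroot/pot/jails/mutable/m

# The base is now free of dependents and can be destroyed without
# waiting for the running clone to stop.
pot destroy -p immutable
```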

davidchisnall reopened this Aug 5, 2021
@grembo (Collaborator) commented Aug 5, 2021

This actually doesn't do quite what I need, because the cloned invocation is linked to the original and so I can't replace the base one without stopping the running ones (which I don't want to do, I want them to gracefully exit).

Is this a way of saying "I want to be able to rename a running pot"?

@grembo (Collaborator) commented Aug 5, 2021

p.s. Can't you simply use clone + unique jail names (e.g., using uuids)? That's what the nomad plugin does when invoking "pot prepare".

@davidchisnall (Author)

Is this a way of saying "I want to be able to rename a running pot"?

That might work.

p.s. Can't you simply use clone + unique jail names (e.g., using uuids)? That's what the nomad plugin does when invoking "pot prepare".

I was using UUIDs, but now it's some extra metadata I need to communicate and I can't use well-known names of the pots to check if they're running, send them signals, and so on.

The UUID doesn't actually help here though. If I clone pot A to pot A-{UUID}, then I can't destroy pot A because the cloned dataset of A-{UUID} is dependent on A. I can fix that with an explicit zfs promote, but now I'm manipulating pot-owned ZFS datasets underneath pot, which doesn't sound like a good idea.

@grembo (Collaborator) commented Aug 5, 2021

But why would you want to destroy pot A? You can simply change/update/whatever in it and then do a new snapshot you can clone a new pot from (while keeping the old snapshot and running clones in place). Managing metadata is an extra burden for sure (but also not that hard). It’s all a bit theoretical without knowing more about what you’re actually trying to achieve.

@davidchisnall (Author)

But why would you want to destroy pot A?

Because it's no longer required. To make things more concrete:

  1. I create a pot containing a configured GitHub Actions runner and all of the dependencies for the tested code.
  2. I create an ephemeral clone of this runner
  3. It runs a single action, leaving it in a state where it's full of junk I want to throw away.
  4. I delete the ephemeral clone and loop from step 2.

At the same time, I create a new base pot containing updated versions of compilers and things, and an updated base system with security vulnerabilities fixed. I want this to be picked up by the ephemeral pot as soon as it finishes running one job (I also prod it to exit if it's in the long-poll state and not currently running anything).

As soon as the new base image is ready and the runner has finished, the base dataset is no longer required and should be deleted. If the ephemeral pot's dataset is promoted, this is trivial (ZFS handles the reference counting of any blocks that are still referenced by both).
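The loop in steps 1-4 could be sketched like this (pot names are hypothetical, and step 3's "wait for the job to finish" is elided to a comment):

```shell
# Step 1 (done once): a configured base pot named "actions-base".
while true; do
    name="runner-$(uuidgen)"
    pot clone -P actions-base -p "$name"   # step 2: ephemeral clone of the base
    pot start "$name"                      # step 3: runs a single action
    # ... wait for the runner inside the pot to exit ...
    pot destroy -p "$name" &               # step 4: destroy can proceed in the
done                                       #         background while we re-clone
```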

@grembo (Collaborator) commented Aug 5, 2021

I would simply run a prune script for that :), but maybe @pizzamig has more inspiration/ideas?

@pizzamig (Collaborator)

Hi everyone, sorry, I'm a bit late.

If I understood it correctly, we have:

  • a base pot, used as the base to create/run ephemeral pots
  • an ephemeral pot, used as the runner instance

Do you have one ephemeral per base or multiple ephemeral per base?

My observations:

  • you don't need to re-create the base pot from scratch every time; you can just run a local upgrade and take a new snapshot
  • the ephemeral pot is created by cloning a snapshot. When a new base pot is available, you can simply create a new ephemeral pot by cloning the new snapshot of the base pot
  • if you only have one instance of the ephemeral clone, you can take a snapshot before running it for the first time and roll back, instead of destroying the pot

pot snapshot uses the UNIX epoch as the snapshot name.
pot purge-snapshots can help to remove old snapshots.
I can add a feature to clone (pot clone -s latest) to use the oldest/latest available snapshot, to avoid external management of snapshot tags.
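If that feature were added, the external snapshot-tag bookkeeping would collapse to something like this (a sketch: the -s latest flag is only proposed above and does not exist yet, and the purge-snapshots invocation is assumed):

```shell
pot snapshot -p immutable                    # snapshot named with the UNIX epoch
pot clone -P immutable -p mutable -s latest  # proposed flag: always clone the
                                             # newest available snapshot
pot purge-snapshots -p immutable             # drop older, now-unused snapshots
```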

@davidchisnall (Author)

Do you have one ephemeral per base or multiple ephemeral per base?

I have a single ephemeral one; other use cases would want multiple ones.

you don't need to re-create the base pot from scratch all the times, you can just run local upgrade and take a new snapshot

That's definitely what I'd have done 10-20 years ago, but it's not recommended practice for modern operations. Container deployments are supposed to be deterministically created from a declarative recipe, not continually evolving.

the ephemeral pot is created cloning a snapshot. When a new base pot is available, you can simply create a new ephemeral pot cloning the new snapshot of the base pot

Yup, that's what I'm doing now, but I need to run a zfs promote on the underlying dataset, which means I need to rely on implementation details of pot.

if you only have one instance of the ephemeral clone, you can take a snapshot before running it for the first time and roll back, instead of destroying the pot

That's what I was doing but rollback is a synchronous operation whereas destroying a clone can happen in the background.

@pizzamig (Collaborator)

I don't understand the need to run a zfs promote (I guess you want to promote the origin to the new snapshot).
Why is re-cloning the ephemeral pot from the new snapshot not enough? What am I missing?

@pizzamig (Collaborator)

I've just installed and successfully started a runner using your scripts.
Now I understand your use case for upgrades:

  • deterministically create a new base (with suffix -tmp)
  • zfs promote the ephemeral pot to remove its dependency on the old base snapshot
  • destroy the old base
  • rename the new base to drop the -tmp suffix
  • gracefully shut down the ephemeral pot; the run-actions-runner will automatically recreate it

In other words, you would need a way to recreate the base (with the same name), without shutting down the ephemeral pot.

The zfs promote solution, however, could get complicated with multiple ephemeral pots (you would need to destroy the promoted ephemeral pot last).
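That promote-based sequence, spelled out as commands (pot names are hypothetical and the zfs line assumes pot's default dataset layout):

```shell
pot create -p base-tmp -t single -b 13.0   # deterministically create the new base
zfs promote zroot/pot/jails/runner/m       # detach the ephemeral pot from the
                                           # old base snapshot (hypothetical path)
pot destroy -p base                        # old base is no longer pinned
pot rename -p base-tmp -n base             # new base takes the well-known name
pot stop -p runner                         # graceful shutdown; the runner service
                                           # recreates the pot from the new base
```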

@pizzamig (Collaborator)

It seems that you can rename the base pot while the ephemeral pot is running (the zfs origin is updated accordingly).

So the upgrade process could be:

  • rename the base with suffix -old
  • deterministically create a new base (no suffix)
  • gracefully shut down the ephemeral pot (and let run-actions-runner recreate it using the new base)
  • destroy the -old base

I will test the entire process later this week (maybe submitting a PR to your project), but pot rename -p base -n base-old seemed to work.
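The rename-based process above, as a sketch (pot names "base" and "runner" are hypothetical):

```shell
pot rename -p base -n base-old            # works while the ephemeral pot is
                                          # running; its zfs origin follows
pot create -p base -t single -b 13.0      # deterministically recreate the base
pot stop -p runner                        # graceful shutdown; run-actions-runner
                                          # recreates it from the new base
pot destroy -p base-old                   # old base and its snapshots go away
```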

@davidchisnall (Author)

Thanks. The zfs promote step prevents the cloned dataset from being marked as a child of the original, which allows the original to be deleted without needing to synchronise with the running invocation. I'd rather avoid any serialisation here: CI jobs can run for up to 6 hours under the standard GitHub policy, and having the rename operation block for 6 hours would not be great.

@grembo (Collaborator) commented Dec 15, 2022

Hi @davidchisnall, do you think it would make sense to revisit this requirement? (We made quite some structural progress this year, so we might be in a better position to implement the feature now.)

@davidchisnall (Author)

Now that there's support for OCI containers on FreeBSD, I plan on moving my things over to that, so feel free to close this if no one else needs it.
