Add support for initoverlay #3066

Open
cgwalters opened this issue Oct 3, 2023 · 10 comments
Labels
area/prepare-root Issue relates to ostree-prepare-root area/sysroot Issues related to OstreeSysroot

Comments

@cgwalters
Member

Splitting this from #2867 (comment)
which was inspired by #2867 (comment)

Basically a major flaw with initramfs (whether baked into the kernel binary or separate) is that it's not lazy - the entire thing must be decompressed and parsed before it executes at all.

In this model, we aim to shrink the role of the initramfs to be very small - just enough to mount e.g. /boot or /efi which contains...a directory tree or an erofs blob say. The role of the initramfs is to load this image, verify its integrity, and then switch root to it, before finally switching to the real root.

@cgwalters cgwalters added area/sysroot Issues related to OstreeSysroot area/prepare-root Issue relates to ostree-prepare-root labels Oct 3, 2023
@cgwalters
Member Author

This would be a new thing that would need to live alongside the kernel data (like initramfs and devicetree).

@ericcurtin
Collaborator

ericcurtin commented Oct 3, 2023

In the UEFI case you end up with tuples of kernel, dtb, initramfs, initoverlayfs etc. in /boot :

/boot/initramfs-0-rescue-c974db5ab26042cbabdaf72bf9433dc7.img: regular file, no read permission
/boot/initoverlayfs-6.4.15-100.fc37.x86_64.img:                EROFS filesystem, compat: SB_CHKSUM MTIME, blocksize=12, exslots=0, uuid=D34116F7-7068-3D4A-9EDC-3FFE23733519
/boot/initramfs-6.4.15-100.fc37.x86_64.img:                     regular file, no read permission

@cgwalters
Member Author

That's a non-ostree image you're generating though right in that example? I'd expect to see /boot/ostree otherwise.

@cgwalters
Member Author

On the ostree side I guess I'd vote that we just grab /usr/lib/ostree-boot/initoverlay.img to start; we don't care what's in it, we just add it into the kernel-lifecycled data we already have and copy/reflink-if-possible it in to /boot/ostree.
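
To illustrate the "copy/reflink-if-possible" step, here is a minimal Python sketch. It is not ostree's actual implementation: the helper name `reflink_or_copy` is hypothetical, and the only real interface used is the Linux `FICLONE` ioctl, which shares extents between two files on filesystems that support it and fails otherwise.

```python
import fcntl
import shutil

# FICLONE request number from <linux/fs.h>: _IOW(0x94, 9, int)
FICLONE = 0x40049409


def reflink_or_copy(src: str, dst: str) -> str:
    """Clone src to dst via reflink when supported, else do a full copy.

    Hypothetical helper for illustration. Returns "reflink" or "copy".
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share extents instead of copying data.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # Unsupported filesystem, cross-device clone, non-Linux, etc.
            pass
    # Fall back to a plain byte-for-byte copy (overwrites the empty dst).
    shutil.copyfile(src, dst)
    return "copy"
```

Either way the caller gets a complete file at `dst`; the reflink path just avoids duplicating the data blocks.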

How are you thinking signatures on this would work? I think my strawman proposal here is that we aim to use composefs for this as well.

@cgwalters
Member Author

BTW one major value of this isn't just boot speed - it means that in the case of e.g. just a kernel security update (and the userspace didn't change) we don't need to ship a whole new initramfs image, which is a really not-small improvement.

@ericcurtin
Collaborator

ericcurtin commented Oct 3, 2023

> That's a non-ostree image you're generating though right in that example? I'd expect to see /boot/ostree otherwise.

Yes, it was. To start, the goal is to make this work for the following variants:

  • ostree UEFI/grub/etc.
  • non-ostree UEFI/grub/etc.
  • ostree aboot
  • non-ostree aboot

It was simpler to do a PoC without ostree to start: for example, you need to run dracut twice and extract one of the resulting initramfs's to an erofs file, and none of this has been coded in ostree yet.

But it does "just work" when I booted it in Fedora, systemd and all the related tools just behave like they would in a normal initramfs except they are running in an initoverlayfs. And that was "fat" Fedora workstation with all the UI stuff, etc.

> On the ostree side I guess I'd vote that we just grab /usr/lib/ostree-boot/initoverlay.img to start; we don't care what's in it, we just add it into the kernel-lifecycled data we already have and copy/reflink-if-possible it in to /boot/ostree.
>
> How are you thinking signatures on this would work? I think my strawman proposal here is that we aim to use composefs for this as well.

I did briefly discuss this with @alexlarsson. At one point I was even thinking of changing the name of the project to "initcomposefs" to take advantage of combined engineering efforts in the composefs community. But we came to the conclusion that composefs isn't worth it: all you have to do is fs-verity a single initoverlayfs file on boot, and we don't care about metadata and such, so simple fs-verity might make more sense (or dm-verity if you want to put the erofs in a partition, but files are nicer to work with).

> BTW one major value of this isn't just boot speed - it means that in the case of e.g. just a kernel security update (and the userspace didn't change) we don't need to ship a whole new initramfs image, which is a really not-small improvement.

Yup, there are a few advantages. From reading around over the last few days, I've noticed that having multiple initramfs's in a single boot isn't a new idea (it's present in some Android implementations to split GKI stuff from other stuff), but this implementation is, in my biased opinion, superior from a perf and scale perspective, because we use erofs + overlayfs instead. Those approaches also often have the problem that you have to choose a certain initramfs based on your needs. There's no choosing required here, because you can make initoverlayfs as small or large as you want and it performs the same.

@ericcurtin
Collaborator

ericcurtin commented Oct 3, 2023

One explicit goal I'd like to communicate, though, is that we start an init process called storage-init that exec's systemd. The goal is not to replace systemd. systemd, etc. should still do all the heavy lifting, including all the ostree stuff. storage-init's only role is to initialize storage, switch to initoverlayfs, and exec systemd.

@ericcurtin
Collaborator

ericcurtin commented Dec 7, 2023

Spoke to @dustymabe about the /boot partition size problem a week ago. Up to now I have been putting the initoverlayfs in the same directory as kernel + initramfs as it logically makes sense, as kernel + initramfs + initoverlayfs are all in the same tuple.

But note... initoverlayfs can be in any mountable filesystem, it doesn't necessarily have to be /boot and it can be configurable. But we have to take encryption scenarios, etc. into account and everything that is not a storage driver should probably be in initoverlayfs.

I also think the default for partitions like /boot should be 1G these days on new installs and configured down to something smaller for the use cases that need to save every megabyte, storage is cheap (but this is just my 2 cents, don't want to go down this rabbithole on this last paragraph on /boot partition size 😄 )

@ericcurtin ericcurtin self-assigned this Feb 11, 2024
@ericcurtin
Collaborator

Related MR coreos/rpm-ostree#4721

We currently run a script called initoverlayfs-install whenever dracut is called to build an initrd.

@cgwalters
Copy link
Member Author

We have threads on this in a few places, but taking this bit here: It seems like a direction initoverlayfs is taking is to effectively become a dracut wrapper. That approach broadly makes sense to me.

Today, ostree owns updating the kernel+initramfs (+devicetree), and it seems like initoverlayfs here would just become yet another file. Pretty straightforward.

The thing here, though (related to containers/initoverlayfs#69), is that in the ostree model the kernel/initramfs are in /usr/lib/modules/$kver in the container/commit, but at deployment time are copied into /boot/ostree. (However, if /boot is the same filesystem as /, then this is just a hardlink, so it's free.)
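
The hardlink-when-possible behaviour described above can be sketched roughly as follows. This is an illustration in Python, not ostree's code; `link_or_copy` is a hypothetical helper name. It relies on `os.link` failing (for example with `EXDEV`) when source and destination are on different filesystems.

```python
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Place src at dst: hardlink if the filesystem allows it, else copy.

    Hypothetical helper for illustration. Returns "hardlink" or "copy".
    """
    try:
        # Same filesystem: a second name for the same inode, no data copied.
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # Different filesystem (e.g. a separate /boot) or links unsupported.
        shutil.copy2(src, dst)
        return "copy"
```

With a separate /boot partition the sketch always takes the copy branch, which is where the extra space cost comes from.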

However, the ultimate filename includes a sha256 digest (to do deduplication). This is going to be tricky to deal with...
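
As a rough illustration of digest-based naming, here is a Python sketch. The exact set of inputs ostree hashes and the exact path layout are assumptions made for the example; `boot_checksum` and `boot_dir` are hypothetical names.

```python
import hashlib


def boot_checksum(*blobs: bytes) -> str:
    """SHA-256 over the concatenated boot artifacts (assumed input set)."""
    h = hashlib.sha256()
    for blob in blobs:
        h.update(blob)
    return h.hexdigest()


def boot_dir(osname: str, kernel: bytes, initramfs: bytes,
             initoverlay: bytes = b"") -> str:
    """Content-addressed directory name under /boot/ostree (assumed layout)."""
    digest = boot_checksum(kernel, initramfs, initoverlay)
    return f"/boot/ostree/{osname}-{digest}"
```

Identical inputs map to the same directory, which is what gives deduplication. The flip side is that the final path is only known once the digest has been computed, so nothing can refer to it by a fixed, predictable name.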
