Support reproducible builds #1112

Open
Foxboron opened this issue Aug 10, 2022 · 10 comments

@Foxboron
Contributor

I'm looking into supporting mkosi for https://system-transparency.org/, but we need support for reproducible builds.

mkosi currently supports --finalize-script for image cleanup, but no such scripts are distributed as part of the mkosi project. Should we consider shipping a set of cleanup scripts that help make images reproducible? I think it could be done by having --reproducible as an alias for --finalize-script=reproducible-builds.$distro, or something along those lines.

mmdebstrap has a little bit of code that simply removes files with embedded timestamps and similar nondeterministic data:
https://gitlab.mister-muffin.de/josch/mmdebstrap/src/branch/main/mmdebstrap#L2948

Related issues on reproducible builds: #700, #687

@behrmann
Contributor

I like the idea, but I think an additional step in build_image, maybe just after run_finalize_script:

run_finalize_script
+if args.reproducible:
+    make_reproducible(args, root, do_run_build_script, for_cache)

for some implementation of make_reproducible would be more in line with the rest (and I think right now we only support a single finalize script).
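
To make the suggestion concrete, here is a minimal sketch of what such a make_reproducible step could look like. It is not mkosi code: the signature is simplified to take only the image root (the proposal above passes args, root, do_run_build_script, for_cache), and the path list is an assumption copied from the Debian files that the finalize script removes in the log further down this thread. Other distributions would need their own lists.

```python
from pathlib import Path

# Assumption: Debian-specific paths, copied from the finalize-script log
# in this thread. Each of them embeds timestamps or per-build state.
NONDETERMINISTIC_FILES = [
    "var/log/dpkg.log",
    "var/log/bootstrap.log",
    "var/cache/apt/pkgcache.bin",
    "var/log/apt/history.log",
    "var/log/apt/term.log",
    "var/log/alternatives.log",
    "var/cache/ldconfig/aux-cache",
    "var/log/apt/eipp.log.xz",
    "var/lib/dbus/machine-id",
]


def make_reproducible(root: Path) -> list[Path]:
    """Remove known nondeterministic files below root.

    Hypothetical helper with a simplified signature. Returns the paths
    that were actually removed; missing files are skipped silently,
    mirroring `rm -f`.
    """
    removed = []
    for relative in NONDETERMINISTIC_FILES:
        path = root / relative
        if path.is_file() or path.is_symlink():
            path.unlink()
            removed.append(path)
    return removed
```

Keeping the list as data rather than as a shell script would make it straightforward to select a different list per distribution.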

@Foxboron
Contributor Author

Foxboron commented Aug 10, 2022

After a little bit of work I have gotten reproducible Debian images with mkosi. It involves a couple of hacks and some things that should be polished, but hopefully it inspires a bit!

I'll send a few pull-requests :)

(.venv) λ mkosi-test » sudo mkosi -d debian --finalize-script=reproducible --no-manifest -o debian.cpio.xz -t cpio --compress-output=xz build
[.....]
‣   Running finalize script…
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/dpkg.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/bootstrap.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/cache/apt/pkgcache.bin
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/apt/history.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/apt/term.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/alternatives.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/cache/ldconfig/aux-cache
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/apt/eipp.log.xz
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/lib/dbus/machine-id
‣  Unmounting image…
‣  Creating archive…
‣ Linking image file…
‣  Changing ownership of output file debian.cpio.xz to user fox (acquired from sudo)…
‣  Changed ownership of debian.cpio.xz
‣ Linked debian.cpio.xz
‣ Resulting image size is 47.5M, consumes 47.5M.
(.venv) λ mkosi-test » sha256sum debian*
fb6fce4c54780cf6c0f540bb7d5a732f4536122d4b2bd45b35ce622b5b64b975  debian2.cpio.xz
fb6fce4c54780cf6c0f540bb7d5a732f4536122d4b2bd45b35ce622b5b64b975  debian.cpio.xz
diff --git a/mkosi/__init__.py b/mkosi/__init__.py
index 3f79095..6fc795a 100644
--- a/mkosi/__init__.py
+++ b/mkosi/__init__.py
@@ -3559,6 +3559,10 @@ def make_cpio(

     root_dir = root / "usr" if args.usr_only else root

+
+    reset_timestamps = ["find", root_dir, "-mindepth", "1", "-execdir", "touch", "-hcd", "@0", "{}", "+"]
+    run(reset_timestamps)
+
     with complete_step("Creating archive…"):
         f: BinaryIO = cast(BinaryIO, tempfile.NamedTemporaryFile(dir=os.path.dirname(args.output), prefix=".mkosi-"))

@@ -3573,7 +3577,7 @@ def make_cpio(
             assert cpio.stdin is not None

             with spawn(compressor, stdin=cpio.stdout, stdout=f, delay_interrupt=False):
-                for file in files:
+                for file in sorted(files):
                     cpio.stdin.write(os.fspath(file).encode("utf8") + b"\0")
                 cpio.stdin.close()
         if cpio.wait() != 0:
@@ -5119,6 +5123,7 @@ def create_parser() -> ArgumentParserMkosi:
         type=cast(Callable[[str], ManifestFormat], ManifestFormat.parse_list),
         help="Manifest Format",
     )
+    group.add_argument("--manifest", default=True, action=argparse.BooleanOptionalAction)
     group.add_argument(
         "-o", "--output",
         help="Output image path",
@@ -7445,7 +7450,9 @@ def build_stuff(args: MkosiArgs) -> Manifest:
     workspace = setup_workspace(args)

     image = BuildOutput.empty()
-    manifest = Manifest(args)
+    manifest = None
+    if args.manifest:
+        manifest = Manifest(args)

     # Make sure tmpfiles' aging doesn't interfere with our workspace
     # while we are working on it.
@@ -8137,7 +8144,8 @@ def run_verb(raw: argparse.Namespace) -> None:
         if args.auto_bump:
             bump_image_version(args)

-        save_manifest(args, manifest)
+        if args.manifest:
+            save_manifest(args, manifest)

         print_output_size(args)

diff --git a/mkosi/backend.py b/mkosi/backend.py
index 07f285e..d550ab3 100644
--- a/mkosi/backend.py
+++ b/mkosi/backend.py
@@ -436,6 +436,7 @@ class MkosiArgs:
     architecture: str
     output_format: OutputFormat
     manifest_format: List[ManifestFormat]
+    manifest: bool
     output: Path
     output_dir: Optional[Path]
     bootable: bool
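
The two changes to make_cpio in the patch above (resetting timestamps and sorting the file list) can also be sketched in pure Python, without shelling out to find and touch. This is only an illustration of the technique, not the patch itself: iter_tree, reset_timestamps and sorted_file_list are hypothetical names, and follow_symlinks=False in os.utime is assumed to be supported by the platform (it is on Linux).

```python
import os
from pathlib import Path
from typing import Iterator


def iter_tree(root: Path) -> Iterator[Path]:
    """Yield every entry below root (not root itself), like `find -mindepth 1`."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield Path(dirpath) / name


def reset_timestamps(root: Path, epoch: int = 0) -> None:
    """Set atime and mtime of every entry below root to a fixed value.

    Symlinks themselves are touched rather than their targets,
    matching the -h flag of touch used in the patch.
    """
    for path in iter_tree(root):
        os.utime(path, (epoch, epoch), follow_symlinks=False)


def sorted_file_list(root: Path) -> list[Path]:
    """Return paths relative to root in a stable order.

    Directory iteration order depends on the filesystem, so the list is
    sorted by path string before it is fed to the archiver.
    """
    return sorted((p.relative_to(root) for p in iter_tree(root)), key=os.fspath)
```

Both steps matter for the archive checksum: the cpio format records the modification time of each entry, and the order of entries follows the order in which names are written to cpio's standard input.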

@Foxboron
Contributor Author

What should the UX for reproducing mkosi images be?

The current workflow I've implemented for the Arch images is this:

mkosi -d arch --reproducible -o arch.cpio.xz -t cpio --compress-output=xz build
mkosi -d arch --reproducible --manifest-file ./arch.cpio.xz.manifest -o arch.repro.cpio.xz -t cpio --compress-output=xz build

But would it make more sense to have a reproduce subcommand and include more information in the manifests?

mkosi -d arch --reproducible -o arch.cpio.xz -t cpio --compress-output=xz build
mkosi reproduce ./arch.cpio.xz.manifest

Is there anything else we need to take care of or think of?

@DaanDeMeyer
Contributor

The second approach definitely makes more sense to me.

Unfortunately, we're probably going to run into the limits of our argument parsing again, since all arguments apply to all commands, which probably wouldn't make sense for a "reproduce" command.

The main thing that would need to be added to the manifest file is a serialized version of the config used to build the image.
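
For illustration, the argument-parsing concern can be addressed with argparse subparsers, where each verb owns its arguments. The sketch below is hypothetical and is not how mkosi parses its command line: the program name and the small set of options are placeholders, and unlike the invocations quoted in this thread, the verb has to come first.

```python
import argparse


def create_parser() -> argparse.ArgumentParser:
    """Build a parser in which each verb only accepts its own arguments."""
    parser = argparse.ArgumentParser(prog="mkosi-sketch")  # placeholder name
    verbs = parser.add_subparsers(dest="verb", required=True)

    # Options that only make sense when building an image
    # (an illustrative subset).
    build = verbs.add_parser("build")
    build.add_argument("-d", "--distribution")
    build.add_argument("-o", "--output")
    build.add_argument("--reproducible", action="store_true")

    # "reproduce" takes just the manifest; build options are rejected.
    reproduce = verbs.add_parser("reproduce")
    reproduce.add_argument("manifest")

    return parser


if __name__ == "__main__":
    args = create_parser().parse_args(["reproduce", "./arch.cpio.xz.manifest"])
    print(args.verb, args.manifest)
```

With this layout, passing -o to reproduce is a usage error instead of a silently ignored option.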

@Foxboron
Contributor Author

Hmm, should we try to serialize the config into the manifest, or do we assume that the manifest plus the configuration is what is needed to reproduce the image?

@behrmann
Contributor

I think serialising the config would be a good approach, although I'm a bit apprehensive about just dumping it to JSON, because MkosiArgs will most certainly still change down the line, and then we might run into issues with missing or unexpected keys when trying to recreate stuff. Some version field might be sensible here.

@keszybz Do you have input here? Since you started the manifest work, you probably have thoughts on where such extensions should go.
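
A rough sketch of the version-field idea, assuming a much smaller stand-in for MkosiArgs. ConfigSketch, CONFIG_FORMAT_VERSION and both helper functions are hypothetical names, and the three fields are only an example. The loader rejects an unknown version and reports missing or unexpected keys explicitly instead of failing somewhere inside the constructor.

```python
import json
from dataclasses import MISSING, asdict, dataclass, fields

# Assumption: bumped whenever the set of serialized fields changes.
CONFIG_FORMAT_VERSION = 1


@dataclass
class ConfigSketch:
    """Tiny stand-in for the real config class; fields are illustrative."""
    distribution: str
    output_format: str
    bootable: bool = False


def dump_config(config: ConfigSketch) -> str:
    """Serialize the config together with a format version."""
    payload = {"version": CONFIG_FORMAT_VERSION, "config": asdict(config)}
    return json.dumps(payload, sort_keys=True)


def load_config(text: str) -> ConfigSketch:
    """Parse a serialized config, failing loudly on any mismatch."""
    payload = json.loads(text)
    if payload.get("version") != CONFIG_FORMAT_VERSION:
        raise ValueError(f"unsupported config version: {payload.get('version')!r}")

    config = payload["config"]
    known = {f.name for f in fields(ConfigSketch)}
    required = {
        f.name
        for f in fields(ConfigSketch)
        if f.default is MISSING and f.default_factory is MISSING
    }
    unexpected = set(config) - known
    missing = required - set(config)
    if unexpected or missing:
        raise ValueError(f"unexpected keys {sorted(unexpected)}, missing keys {sorted(missing)}")
    return ConfigSketch(**config)
```

Whether an old version should be rejected outright or migrated is a separate design decision; the sketch takes the strict route.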

@keszybz
Member

keszybz commented Aug 16, 2022

I think we should do the rework discussed in #769, so that there are a few layers of clearly separated config:

  • config files + command-line args
  • effective config with automatic extensions to the package lists, e.g. when we add some packages based on the selected distro, partition sizes that were selected, etc.
  • effective package list resulting from the above config (i.e. what the manifest gathers currently)

And I'd (optionally) save all three in the manifest file. If we get the abstractions right, this shouldn't be any extra work, just serialization to json of a few dicts or dataclass objects. And this would give all the information to understand what was done and how to repeat it. How this information is to be used would be chosen by the "client" that is doing the repeat build, depending on the intended scenario.
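
As a sketch of how the three layers might sit side by side in one manifest, assuming each layer is already available as a plain dict or list. ManifestSketch and its field names are hypothetical, and the values in the usage note are made up for illustration. A layer that is left as None is omitted from the output, which covers the "optionally" part.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ManifestSketch:
    """Hypothetical container for the three layers described above."""
    user_config: Optional[dict] = None       # config files + command-line args
    effective_config: Optional[dict] = None  # after automatic extensions
    packages: Optional[list] = None          # effective package list

    def to_json(self) -> str:
        """Serialize only the layers that were recorded."""
        layers = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(layers, indent=2, sort_keys=True)
```

A client doing a repeat build could then pick the layer that fits its scenario, for example:

```python
manifest = ManifestSketch(
    user_config={"distribution": "debian"},
    packages=[{"name": "example-package", "version": "1.0"}],  # made-up entry
)
print(manifest.to_json())
```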

@Foxboron
Contributor Author

Should we do the rework first, or would people be fine with me jamming a few new variables into manifest.json to get some of the basic reprobuilds goals accomplished?

We can just mark any form of reproducible builds as experimental to avoid any form of commitment on the manifest format.

@behrmann
Contributor

I'd be in favour of that.

@DaanDeMeyer
Contributor

DaanDeMeyer commented Aug 19, 2022

@rphibel is doing some fundamental work to make serializing the config into the manifest easier, starting by splitting MkosiArgs into MkosiConfig and MkosiState.
