This repository has been archived by the owner on Jun 10, 2019. It is now read-only.

Generated Docker images have a single layer #340

Open
nbraud opened this issue Sep 12, 2016 · 5 comments

Comments

@nbraud
Contributor

nbraud commented Sep 12, 2016

Docker images consist of several layers: filesystem overlays that are combined to produce the actual image's filesystem. The rationale for doing this is two-fold:

  1. Multiple images can share layers (say, a bare debootstrapped Debian install can be common to multiple images) to save disk space (and cache).
  2. Layers are an integral part of how Docker lets you reuse images by slapping more customizations on top.

While point 2 is not relevant (I think) to bootstrap-vz, point 1 very likely is, and producing reusable image layers would be great. However, the way Docker implements this (assuming that running the same command on top of the same layer produces the same result) has notorious issues and is likely not applicable here.

Something that could be done, though, is to run the build as usual (without Docker-style caching) and switch to a different layer at the end of a few phases, such as os_installation, package_installation, system_modification and user_modification.
Provided that installing the same packages (at the same versions, and so on) results in the same layer (in other words, provided that the layer builds reproducibly), the image can share layers with other images that use the same debootstrap parameters, install the same packages, and so on.

Note that, unlike Docker's approach, this is safe: the worst that can happen is that no layer is common to several containers. The build is done in entirety, without any special assumption.
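
To make the proposal concrete, here is a minimal sketch of cutting one layer per phase and identifying each layer by a hash of its contents. It is not bootstrap-vz code: the filesystem is modelled as an in-memory dict, and every function name and file content below is a hypothetical stand-in (only the phase names come from the comment above).

```python
import hashlib
import json


def layer_id(layer):
    """Hash a layer: a dict of path -> content, where None means 'deleted'."""
    canonical = json.dumps(layer, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def diff(before, after):
    """Return the changes needed to turn `before` into `after`."""
    layer = {p: c for p, c in after.items() if before.get(p) != c}
    layer.update({p: None for p in before if p not in after})
    return layer


def build(phases):
    """Run every phase in order, cutting a new layer after each one."""
    fs, layers = {}, []
    for phase in phases:
        snapshot = dict(fs)
        phase(fs)  # the full build always runs; nothing is skipped or cached
        layers.append(layer_id(diff(snapshot, fs)))
    return layers


# Hypothetical phase bodies, standing in for the real build steps.
def os_installation(fs):
    fs["/bin/sh"] = "dash"


def package_installation(fs):
    fs["/usr/bin/python"] = "python2.7"


def user_modification_a(fs):
    fs["/etc/motd"] = "image A"


def user_modification_b(fs):
    fs["/etc/motd"] = "image B"


image_a = build([os_installation, package_installation, user_modification_a])
image_b = build([os_installation, package_installation, user_modification_b])
shared = [a for a, b in zip(image_a, image_b) if a == b]
print(len(shared))  # number of layers the two images have in common
```

Because a layer ID depends only on what the phase actually changed, a phase that behaves differently simply yields a different ID and is stored separately; no assumption about the phase's behaviour is needed for correctness.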

@nbraud
Contributor Author

nbraud commented Sep 12, 2016

@andsens This is, right now, a mildly wild idea, in the sense that I haven't even checked yet whether very simple images build reproducibly (and if not, where the non-reproducibility lies). However, I think reproducible builds would be a very powerful feature to have.

I filed the issue so that other people can read it (and so that I do not forget).

@andsens
Owner

andsens commented Sep 12, 2016

That is a great idea. I love the concept of commits between phases.
We would have to leverage Docker's built-in concept of caching in order to reuse layers, which is where I think it would get tricky. Alternatively, we'd need some way of uniquely identifying a specific build setup ourselves, which I think would be even harder (things would slip through the cracks all the time).

Still, this is definitely worth discussing :-)

@nbraud
Contributor Author

nbraud commented Sep 12, 2016

> We would have to leverage Docker's built-in concept of caching in order to reuse layers, which is where I think it would get tricky.

Exactly my point: Docker's caching mechanism is utterly unsafe/insane.
My proposal to deal with it would be to run the entire build (without caching) but make the phases deterministic: that way, the same layers get built, get the same ID (as the ID is a hash of the layer), and are stored only once.

@andsens
Owner

andsens commented Sep 15, 2016

> that way, the same layers get built, get the same ID (as the ID is a hash of the layer), and are stored only once.

You won't get the same ID: the timestamps of files change from run to run :-(
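
The timestamp problem can be reproduced in isolation. The sketch below assumes the layer ID is a hash over a tar archive of the layer (an assumption made for illustration, not a statement about Docker's exact ID computation); `layer_digest`, the file contents and the timestamps are all made up.

```python
import hashlib
import io
import tarfile


def layer_digest(files, mtime):
    """Pack `files` (path -> bytes) into an in-memory tar and hash it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name=path)
            info.size = len(files[path])
            info.mtime = mtime  # the only field that varies between runs here
            tar.addfile(info, io.BytesIO(files[path]))
    return hashlib.sha256(buf.getvalue()).hexdigest()


files = {"etc/hostname": b"example\n"}

# Two "runs" one minute apart: same content, different mtimes.
run_1 = layer_digest(files, mtime=1473638400)
run_2 = layer_digest(files, mtime=1473638460)

# The same two runs with the mtime pinned to a fixed value.
pinned_1 = layer_digest(files, mtime=0)
pinned_2 = layer_digest(files, mtime=0)

print(run_1 == run_2)
print(pinned_1 == pinned_2)
```

The mtime is stored in each tar member header, so identical file contents still hash differently unless the timestamps are pinned before the archive is hashed.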

@nbraud
Contributor Author

nbraud commented Sep 20, 2016

@andsens Yes; as said originally, it will need work to make the phases deterministic.
I took a first look with diffoscope, and it looks pretty promising; when building a small container, the only differences are relatively minor:

  • A few files are non-deterministic, but can be thrown away:
    • /etc/machine-id is a random id;
    • /etc/init.d/.depend.{boot,stop} has non-deterministic order;
    • /var/log/alternatives.log has timestamps;
    • /var/cache/ldconfig/aux-cache.
  • /bin/{compress,gunzip} differ for some not-yet-investigated reason.
  • Many files (not coming from a package directly) have differing mtimes.
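
A post-processing pass over the image tree could handle the first and last items in this list. The following is only a sketch under assumptions: `normalize_tree` and its `clamp_to` parameter are hypothetical names, the list of disposable paths is copied from the findings above, and clamping (rather than overwriting) mtimes is a design choice borrowed from common reproducible-build practice, not something decided in this thread. It does nothing about the /bin/{compress,gunzip} difference.

```python
import os

# Paths reported above as non-deterministic but disposable.
DISPOSABLE = [
    "etc/machine-id",
    "etc/init.d/.depend.boot",
    "etc/init.d/.depend.stop",
    "var/log/alternatives.log",
    "var/cache/ldconfig/aux-cache",
]


def normalize_tree(root, clamp_to):
    """Delete disposable files under `root`, then clamp newer mtimes."""
    for relative in DISPOSABLE:
        path = os.path.join(root, relative)
        if os.path.lexists(path):
            os.remove(path)

    # Collect paths only after the deletions, since removing a file
    # updates the mtime of its parent directory.
    paths = [root]
    for directory, subdirs, names in os.walk(root):
        paths.extend(os.path.join(directory, n) for n in subdirs + names)

    for path in paths:
        if os.path.islink(path):
            continue  # os.utime would follow the link and touch its target
        if os.stat(path).st_mtime > clamp_to:
            os.utime(path, (clamp_to, clamp_to))
```

Clamping leaves files older than `clamp_to` (for example, files unpacked from packages with their original mtimes) untouched and only rewrites timestamps that were set during the build.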


2 participants