linuxkit/build: implement --modified-only option #4038

Closed
wants to merge 1 commit into from

@rouming rouming commented May 2, 2024

This implements the --modified-only option, which is meant to complement the recent --input-tar feature by generating a resulting tarball that contains only the modified files.

- A picture of a cute animal (not mandatory but encouraged)

Sure thing. Sad Hamster himself:

This implements --modified-only option, which should complement recent
--input-tar feature by generating resulting tarball with modified only
files.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
@deitch
Collaborator

deitch commented May 2, 2024

I don't understand what this does. --input-tar <in.tar> uses in.tar as a source. For all unchanged packages, it just copies over the files from there, rather than going back to the cache. The main purpose is speed: copying files as-is is much faster than going through compressed tgz files, applying multiple layers (changesets), etc.

What does this option do?

@rouming
Author

rouming commented May 2, 2024

It does not copy files to the resulting tarball if they are not modified. The resulting tarball is populated only with the files that we get through the regular OCI procedure. For example, here eve/kdump was modified:

Create OCI config for docker.io/lfedge/eve-kdump:3f3131a76693faef830263d635d2b3beef78787c-dirty-ad49020-amd64
Image docker.io/lfedge/eve-kdump:3f3131a76693faef830263d635d2b3beef78787c-dirty-ad49020-amd64 arch amd64 found in local cache, not pulling
Image docker.io/lfedge/eve-kdump:3f3131a76693faef830263d635d2b3beef78787c-dirty-ad49020-amd64 arch amd64 found in local cache, not pulling

So only kdump will be included. The rest is skipped.

@deitch
Collaborator

deitch commented May 3, 2024

Let's run through a simple use case:

  1. I run lkt build -o foo.tar, it creates a tar file with the following 10 files: a,b,c,d,e,f,g,h,i,j.
  2. I modify my input so that only files g and h are changed, replaced with r and s.
  3. I rerun build with --input-tar foo.tar. It builds a new tar, call it new.tar. The contents are a,b,c,d,e,f,i,j,r,s sourced as follows:
    • a,b,c,d,e,f,i,j - copied from foo.tar
    • r, s - from whatever OCI image generated them

You want to create an option so that I have a tar file that contains only r and s?
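The use case above can be modelled with a short Python sketch. This is only an illustration under stated assumptions, not linuxkit's implementation (linuxkit is written in Go): the helpers `make_tar`, `tar_names` and `rebuild`, and the `modified_only` parameter, are hypothetical names, and the set of removed and rebuilt files is passed in by hand rather than derived from OCI images.

```python
import io
import tarfile


def make_tar(files):
    """Build an in-memory tar from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def tar_names(blob):
    """Return the sorted member names of an in-memory tar."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return sorted(tar.getnames())


def rebuild(input_tar, removed, rebuilt, modified_only=False):
    """Hypothetical model of a rebuild that reuses a previous output tar.

    input_tar: bytes of the previous output (the --input-tar role)
    removed:   names the new build no longer produces
    rebuilt:   {name: bytes} produced fresh (the "from OCI" role)
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as dst:
        if not modified_only:
            # Copy every unchanged entry straight from the previous tar.
            with tarfile.open(fileobj=io.BytesIO(input_tar), mode="r") as src:
                for member in src.getmembers():
                    if member.name in removed or member.name in rebuilt:
                        continue
                    dst.addfile(member, src.extractfile(member))
        # Freshly generated entries are always written.
        for name, data in rebuilt.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            dst.addfile(info, io.BytesIO(data))
    return out.getvalue()


foo = make_tar({name: name.encode() for name in "abcdefghij"})
changes = {"r": b"r", "s": b"s"}

full = rebuild(foo, {"g", "h"}, changes)
print(tar_names(full))  # ['a', 'b', 'c', 'd', 'e', 'f', 'i', 'j', 'r', 's']

diff = rebuild(foo, {"g", "h"}, changes, modified_only=True)
print(tar_names(diff))  # ['r', 's']
```

In this model the only difference between the two modes is whether the copy-from-input step runs, which is where the time saving discussed in this thread would come from.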

@rouming
Author

rouming commented May 6, 2024

> You want to create an option so that I have a tar file that contains only r and s?

That's correct. The motivation is the same: creating a full ~900 MB tar and then copying it to the rootfs can take a while. The changed files usually total no more than ~150 MB, so writing only those saves ~10-15 seconds of the overall image build time.

But there is an issue I completely forgot about: eventually we need to merge both tars. That can be done later, say once the tar with modified files grows to half the size of the original (--input-tar foo.tar) tar. I need to do some experiments. Do you mind if we keep this PR hanging here for a while?
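The deferred merge could look roughly like the following Python sketch. It is an assumption-laden illustration, not code from this PR: `should_merge`, `merge_tars`, `make_tar` and `read_tar` are hypothetical helpers, the 0.5 ratio simply encodes the "half of the original" idea, and entries in the modified-only tar are assumed to override same-named entries in the base tar.

```python
import io
import tarfile


def make_tar(files):
    """Build an in-memory tar from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar(blob):
    """Return {name: bytes} for the regular files of an in-memory tar."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return {m.name: tar.extractfile(m).read()
                for m in tar.getmembers() if m.isfile()}


def should_merge(base_size, diff_size, ratio=0.5):
    """Merge once the modified-only tar reaches `ratio` of the base size."""
    return diff_size >= base_size * ratio


def merge_tars(base, diff):
    """Overlay `diff` on `base`; entries in `diff` win on name clashes."""
    with tarfile.open(fileobj=io.BytesIO(diff), mode="r") as d:
        overridden = set(d.getnames())
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as dst:
        for blob, skip in ((base, overridden), (diff, set())):
            with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as src:
                for member in src.getmembers():
                    if member.name in skip:
                        continue
                    dst.addfile(member, src.extractfile(member))
    return out.getvalue()


base = make_tar({"a": b"1", "g": b"old"})
diff = make_tar({"g": b"new", "r": b"x"})
if should_merge(len(base), len(diff)):
    base = merge_tars(base, diff)
```

One caveat of a plain overlay like this: it cannot drop files that disappeared from the build (g and h in the earlier example) unless the modified-only tar also records deletions in some way.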

@deitch
Collaborator

deitch commented May 9, 2024

> Do you mind if we keep this PR hanging here for a while?

Sure, we can keep it open. I need time to think this one through.

@deitch
Collaborator

deitch commented May 9, 2024

> > You want to create an option so that I have a tar file that contains only r and s?
>
> That's correct. The motivation is the same: creating a full ~900 MB tar and then copying it to the rootfs can take a while. The changed files usually total no more than ~150 MB, so writing only those saves ~10-15 seconds of the overall image build time.

It took me some time to figure out what was bothering me about this. When you run linuxkit build, it always generates a viable, usable output. What format that is in depends on the --format option; where the data comes from might be OCI images or another tar file (--input-tar). But in all cases, linuxkit build = "build me a fully usable self-contained output."

This change makes it so that it is not exactly that: the output is no longer fully usable, but rather a diff. That sort of breaks the philosophy of linuxkit build (and can confuse people; it just confused me).

I understand why you might want this. linuxkit itself has such intimate knowledge of how things are built that, if you want a diff, it is the convenient place to get it.

It sounds like you aren't clear if this is useful to you in the first place. Let's resolve that. If it still is, then let's find a way to make it happen.

I am not sure it breaks it enough to reject doing it; you are adding an explicit flag that says "only give me the modified info". I wouldn't make it work only if --input-tar is provided. If this is a valid option, then it is reasonable for someone to build a.tar, modify the build.yml, then want to build b.tar where b.tar is just the diffs.
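One way to picture "b.tar is just the diffs" without any --input-tar involvement is to compare two complete outputs by content. The Python sketch below does that with SHA-256 digests; it is only an illustration of the idea, not how linuxkit would necessarily do it, and `make_tar`, `digests` and `changed_names` are hypothetical helpers.

```python
import hashlib
import io
import tarfile


def make_tar(files):
    """Build an in-memory tar from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digests(blob):
    """Map each regular file in a tar to the SHA-256 of its contents."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                result[member.name] = hashlib.sha256(data).hexdigest()
    return result


def changed_names(old_blob, new_blob):
    """Names that are new, or whose contents differ, in new_blob."""
    old, new = digests(old_blob), digests(new_blob)
    return sorted(name for name, digest in new.items()
                  if old.get(name) != digest)


a_tar = make_tar({"a": b"1", "g": b"2"})
b_full = make_tar({"a": b"1", "g": b"3", "r": b"4"})
print(changed_names(a_tar, b_full))  # ['g', 'r']
```

A real implementation would more likely decide what changed from the build inputs (which images differ) rather than by hashing finished tars; the comparison here only shows what the resulting diff would contain.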

@rouming
Author

rouming commented May 22, 2024

Will close this for now.

@rouming rouming closed this May 22, 2024