Incremental / cached build of definition files #2666

Open
Heng-Zhou opened this issue Feb 16, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

Heng-Zhou commented Feb 16, 2024

I would like to see an incremental build feature implemented in Singularity.

Suppose we have a large definition file that takes quite a while to build, and we want to add a little to it, say, installing a small new package. Currently, we have to rebuild the whole thing from scratch, which takes a long time, just for a small change. So I was wondering whether Singularity could implement incremental builds, similar to the concept commonly used by C++ compilers and the like, that would take only a small period of time for a small addition to the definition file, instead of wasting time on repeated installation and setup.

Heng-Zhou added the enhancement label Feb 16, 2024
Member

dtrudg commented Feb 16, 2024

TLDR - This is unlikely in definition file builds, but see the Dockerfile OCI-Mode builds in 4.1 (https://docs.sylabs.io/guides/4.1/user-guide/build_a_container.html#building-from-dockerfiles)

Hi @Heng-Zhou. This feature has been requested many times during the history of Singularity. I'll write a lengthy reply here so I can direct any future questions to it. I am going to frame the reply around differences vs Dockerfiles, as that's how many people are familiar with incremental/cached container builds. However, I've also addressed the difference from code compilation that you mention.

Incremental definition file builds are more complicated than they might seem because:

  • Singularity's native runtime has no concept of layers, unlike Docker / other OCI runtimes (however the new OCI-Mode can keep layers).
  • A Singularity definition file is a collection of free-form shell scripts, not individual instructions like a Dockerfile.

The second point is really the blocker. In a Dockerfile every RUN line is executed separately. Consider a %post script, though... It's a single shell script, and you can do anything there you'd do in a shell on your host. Take this example:

%post
     MY_PROG=ripgrep
     echo "Installing $MY_PROG"

If I build this def file, it will just echo a message during the build. Now I add another line...

%post
     MY_PROG=ripgrep
     echo "Installing $MY_PROG"
     apt install $MY_PROG

If an incremental re-build only runs the new line then $MY_PROG will not be set, and we don't get the behaviour we expect. The MY_PROG env var is not exported, or in an %environment block... it's only temporary during the %post script's execution, so there is no information about it in the final container.
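As a rough illustration (a plain shell sketch, not how Singularity's build code works), the two cases can be mimicked by running the full script in one shell, and then running only the new line in a fresh shell as a hypothetical incremental step might. The `<unset>` placeholder is only there to make the missing value visible, and the `apt install` line is echoed rather than executed.

```shell
# Make sure the variable is not inherited from the calling environment.
unset MY_PROG

# Full build: every line of the script runs in the same shell process.
full_build=$(sh -c 'MY_PROG=ripgrep; echo "Installing $MY_PROG"')

# Hypothetical incremental step: only the newly added line runs, in a
# fresh shell, so the earlier assignment never happened.
incremental=$(sh -c 'echo "apt install ${MY_PROG:-<unset>}"')

echo "$full_build"
echo "$incremental"
```

The second command has no way to recover `ripgrep`, because the assignment lived only in the first shell process.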

In essence, a definition file %post block is like a single RUN bash -c <content of %post> line would be in a Dockerfile. If you write a Dockerfile with everything in one RUN line then you don't benefit from any caching / incremental builds.

Because the %post script is a free-form shell script that could contain almost anything, it is essentially impossible to identify which pieces need to be run again if it is edited. You mention compiling software - this is a much simpler problem as it is governed by a makefile where every make target has dependencies explicitly defined by the person or build system that wrote the makefile. Generally these dependencies are just files - and it's then easy to compare file modification times to see what needs to be rebuilt. Analysing an arbitrary shell script to run the minimal set of commands for an incremental build is a very different problem.
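For comparison, the timestamp check that makes compilation easy to do incrementally can be sketched in a few lines of shell. This is only an approximation of what a tool like make does; the file names are hypothetical, and the `-nt` ("newer than") test is assumed to be available, as it is in bash and dash.

```shell
workdir=$(mktemp -d)
src="$workdir/main.c"   # hypothetical source file
obj="$workdir/main.o"   # hypothetical build target

# A target is stale when it is missing or older than its source.
is_stale() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

check() {
    if is_stale "$src" "$obj"; then echo rebuild; else echo up-to-date; fi
}

touch -t 202401010000 "$src"
before_build=$(check)            # target does not exist yet

touch -t 202401020000 "$obj"     # pretend the target was built a day later
after_build=$(check)

touch -t 202401030000 "$src"     # pretend the source was edited afterwards
after_edit=$(check)

echo "$before_build $after_build $after_edit"
rm -rf "$workdir"
```

The decision needs only the declared dependency (source to target) and two timestamps. A %post script declares no such dependencies between its lines, which is why the same trick does not carry over.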

At this point, it's important to note that for SingularityCE 4.1 we added the ability to build Dockerfiles in our OCI-Mode. Dockerfile builds do cache intermediate steps. We recommend using OCI-Mode Dockerfile builds if this behaviour is critical to you (https://docs.sylabs.io/guides/4.1/user-guide/build_a_container.html#building-from-dockerfiles).

Introducing caching of instructions during a definition file build, at this point, would require a complete re-write of the build code to allow multiple %post blocks as a unit of caching. This is unlikely to be tackled by Sylabs unless:

  • At a major version (e.g. 5.0 or 6.0) there was a way to make the changes without adversely affecting users stuck on prior versions of Singularity.
  • It was a strong requirement for many customers of SingularityPRO, whose purchases ultimately fund much of the development work that is visible in SingularityCE.

If a member of the community wanted to contribute incremental builds to SingularityCE, we would certainly consider any proposal as long as:

  • At a major version (e.g. 5.0 or 6.0) there was a way to make the changes without adversely affecting users stuck on prior versions of Singularity.
  • The proposal was well discussed and defined before code was written.
  • We are confident that the proposal could be achieved, with satisfactory testing, with our guidance - but without Sylabs having to commit large amounts of engineering effort.

Having said all of this - we do constantly review what features are priorities of users, and what we can achieve with our development resources. What I've written above may not always be the case in future.

dtrudg changed the title from "Incremental build" to "Incremental / cached build of definition files" Feb 16, 2024
@Heng-Zhou
Author

I'm afraid you have some misunderstanding about incremental build in your example. When I say "take only a small period of time for a small addition" in my post, I meant building the new thing on top of the existing container that has been built and re-used; I did not mean building only the newly added thing, "apt install $MY_PROG" in your example. I assumed that is known implicitly by anyone who received any Computer Science education so I did not wrote.

For the remaining, I don't think that is an argument based on technology. That is only an argument based on money. You don't wanna do it just because no one pay you enough money to do it. That's it.

Member

dtrudg commented Feb 19, 2024

Before I reply to your technical point, I would like to ask you to reflect on the following if you are going to continue to interact with this project:

> I assumed that is known implicitly by anyone who received any Computer Science education so I did not wrote.

  1. Many people who use Singularity, have contributed to Singularity, and participate on GitHub or elsewhere, did not have a computer science education. It is not helpful to assume people do, or that participation from those lacking a computer science education is less valuable.

  2. A computer science education is not the issue here. Regardless of your background it is easy to have misunderstandings, or assume something incorrectly, in an online thread of text. If someone does not appear to understand a message as you intended, try to clarify politely. I do have a PhD in Computer Science, and have spent 15+ years working in academia and software development. However, I am comfortable that there is still much that I do not know, that others explain things differently than I would, and that arriving at a shared understanding of a problem often takes time.

> For the remaining, I don't think that is an argument based on technology. That is only an argument based on money. You don't wanna do it just because no one pay you enough money to do it. That's it.

The continued development and maintenance of SingularityCE, which you are able to obtain and use for free, is largely due to myself and others being paid to develop and maintain it.

To my knowledge, we haven't received any money from yourself or your employer, but we have taken the time to answer questions where possible and discuss things that you have asked.

Anyone in the open source community is very welcome to contribute to SingularityCE. We have fewer contributions compared to some other software because we are a systems-level software project, while our user base is primarily developing/using scientific software. Because of this, most new features are developed by Sylabs employees.

When features are suggested, we have to ask many questions:

  1. How important is the feature? Is there a workaround without it?
  2. How many people would use it?
  3. Can it be introduced in a compatible manner?
  4. Does it fit well with other planned features / the overall direction of the project?
  5. How technically complex is the feature? How long would it take to develop?

Once we know the answers to these questions, we (Sylabs) are able to prioritise. There will always be more features requested than we can take on.

We urge anyone who would like to have more impact on feature development in SingularityCE to:

  • Help us answer the questions above.
  • Take part in the monthly open community call, to discuss ideas.
  • Consider designing and/or implementing the feature and contributing it to the project.

We ask that people are mindful that:

  • SingularityCE is open source, available to you to obtain and use without cost.
  • There is a cost involved to develop and maintain software.
  • Projects are often balancing the conflicting needs and priorities of many different users. There is rarely a single reason why a feature is or is not implemented.

With regard to your comment about the builds:

> I'm afraid you have some misunderstanding about incremental build in your example. When I say "take only a small period of time for a small addition" in my post, I meant building the new thing on top of the existing container that has been built and re-used; I did not mean building only the newly added thing, "apt install $MY_PROG" in your example.

This is implicit in the discussion above - I assume that you would want to perform the apt install $MY_PROG on top of the existing container, which was built before that command was added. The difficulty is that apt install $MY_PROG relies on things that happen earlier in the script (i.e. the value of $MY_PROG)... it is impossible to just do apt install $MY_PROG on top of the existing container, because we don't know what the value of $MY_PROG is without running all of the %post block.

A practical workaround, for cases where you can define everything without reference to any state during the original %post script, is to manually perform incremental builds by adding things in a later definition file against the previous SIF image.

Bootstrap: localimage
From: my-previous-build.sif

%post
   # Add another tool; -y avoids an interactive confirmation prompt
   apt install -y vim
