Reduce GH cache usage during build workflow #2891
Comments
I am not greatly enamoured of the idea of adding too much logic to the … I could be convinced otherwise, however. But our issue here really isn't clearing stuff out of the cache so much as not getting it into the cache in the first place. Our specific problem is that we build or pull all packages in the …

That brings us to the second option you raised:
This is the most efficient solution you propose. And it is fairly easy to implement. The question is, does it make sense in the context of lkt?
One side issue you raised is:
No, it isn't an issue. When we use buildkit to build, and it sees that B depends on A, it looks for A in the registry, finds it, and uses it inside the buildkit container. Only if it cannot find it in the registry does it fail. linuxkit itself is wired into buildkit to override that by finding the image in its local cache first, but if it doesn't find it in the local cache, buildkit does its usual thing and pulls from the registry. You can test it easily: create a pkg, have it depend on something public, and do …

Which brings us to your third option:
This is the equivalent of your second option. If your second option is "build only if it is not already published", via:

```sh
TAG=$(lkt pkg show-tag pkg/foo)
[ exists-in-registry ] || lkt pkg build pkg/foo
```

I think I prefer this last option over any other. Only call … or even … but I am not convinced that buys us all that much more than just going all the way to … Maybe we should do both?
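That check could be fleshed out roughly as follows. This is a sketch only: `lkt` is assumed to be the linuxkit wrapper used in the snippets above, and `docker manifest inspect` stands in for the unspecified `exists-in-registry` probe; the wiring is illustrative, not eve's actual build logic.

```shell
# Sketch of "build only if not already published".
# Assumption: `docker manifest inspect` is used as the
# exists-in-registry probe; it exits non-zero when the tag is absent.
image_published() {
    docker manifest inspect "$1" >/dev/null 2>&1
}

build_if_missing() {
    pkg="$1"
    # Resolve the content-addressed tag for this package tree.
    tag=$(lkt pkg show-tag "$pkg") || return 1
    if image_published "$tag"; then
        echo "skip $pkg: $tag already published"
    else
        lkt pkg build "$pkg"
    fi
}
```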
Reading this again, I really am having a hard time reconciling … But "build it unless it is in some registry, in which case leave it alone" is just strange for a command called …
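The "test it easily" suggestion above can be sketched like this. The package name and base image are illustrative, not taken from the eve tree, and the final build command is left commented out since it needs a buildkit-capable setup:

```shell
# Create a throwaway pkg whose Dockerfile depends on a public image
# that is NOT in the local linuxkit cache; all names are illustrative.
mkdir -p pkg/depends-test
cat > pkg/depends-test/Dockerfile <<'EOF'
FROM alpine:3.16
RUN echo hello > /hello
EOF
cat > pkg/depends-test/build.yml <<'EOF'
image: depends-test
EOF
# With an empty local cache, the build should still succeed because
# buildkit falls back to pulling the FROM image from the registry:
# linuxkit pkg build pkg/depends-test
```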
Seems we exceeded the cache limit in our runs: https://github.com/lf-edge/eve/actions/runs/3387572937/jobs/5628487487#step:17:3
I'm working on those fixes. I will prioritize them over the next 24 hours.
Ah, sorry, I did not mean to push you. Actually it is not great that we exceed the limit this way, because we will not be able to rebuild the whole of EVE-OS, for example, while modifying some base package (eve-alpine) and all its dependencies. Probably we can avoid such PRs and split them.
We still need to fix it, but that will become a big issue.
Well, it was another issue: #2904
@giggsoff several linuxkit updates have been merged in. Do you want to update …?
Thank you! |
Use case

We split the build workflow into two jobs: `packages` (to build the packages required to build the EVE-OS image) and `eve` (to compose the final EVE-OS image). We share data between the jobs using the GitHub cache. The problem is the limited size of the cache: it uses some kind of FIFO and removes old stored caches, so when several PRs come in to build concurrently, we lose the cache between jobs.

I observed that during the `packages` job we fill the cache with images that are already published, which is sub-optimal as we can pull them directly in the `eve` job. We try to split our changes so that only the packages that should be modified change in one PR. We should not fill the cache with unneeded data.

Describe the solution you'd like

To reduce cache usage I see three possible solutions, among them:

- a `cache clean` command in linuxkit to remove the entire cache; possibly we can add a `keep-unpublished` flag to clean out only the published images;
- building packages via `pkg build` only if they are not already published. In that case we still need to pull the images we depend on (if image B depends on image A, A was published, and we want to build B, we still need A to be pulled into the linuxkit cache), but we do not want to keep published images in the cache regardless of dependencies.

cc @deitch
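The second idea could look roughly like this in the `eve` job. This is a sketch only: the loop and the use of `docker manifest inspect` as the publish check are assumptions, not eve's actual workflow; `lkt pkg show-tag` is used as in the discussion below.

```shell
# Pull published images straight from the registry so the `packages`
# job never needs to put them in the GH cache; only unpublished ones
# would then need to travel through the cache. Illustrative helper.
pull_published() {
    for p in pkg/*; do
        [ -d "$p" ] || continue
        tag=$(lkt pkg show-tag "$p") || continue
        if docker manifest inspect "$tag" >/dev/null 2>&1; then
            docker pull "$tag"    # published: pull directly, skip the cache
        fi
    done
}
```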