-
Notifications
You must be signed in to change notification settings - Fork 18.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: Add conditional CI for the containerd backed image store #46402
Conversation
8841e54
to
9e9af81
Compare
Note: we can't do the same in jenkins (filter on the label), we don't have the required plugins that would let us do that. We could have a check on the branch name perhaps? Check that the branch contains Another possibility is to run the unit/integration tests (~15 minutes run) for all pull requests, but make it pass no matter what. Any preferences? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we have currently 98 jobs. With this change we would have 98 + 45 (.test) + 21 (.windows) = 164 which would increase queue drastically for any repos on moby org (such as buildkit).
9e9af81
to
423b4ee
Compare
True, but these new jobs don't run for all PRs, only for those that they are needed |
This is true... for now. At one point it would be great to test both graph drivers and containerd snapshotters at the same time (once the CI is green). |
.github/workflows/.test.yml
Outdated
use_snapshotter: | ||
required: true | ||
type: boolean | ||
graph_driver: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
graph_driver: | |
storage_driver: |
Since the flag is actually --storage-driver
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 changed it!
423b4ee
to
4883e9d
Compare
955bd80
to
c194553
Compare
I don't think it's actually testing the containerd image store. (Very late edit: I meant on Windows here, but didn't write it....) When I tried something similar (see master...TBBle:moby:hack/containerd_image_store^) I found that the env-var Anyway, you should see Another sign it's not using the snapshotter is that it should have failed all of the tests during setup due to (I think) the wildly incorrect WCOW mount code in containerd 1.6 (which I contributed, AFAIR) vendored into Docker. That was replaced with good code in containerd 1.7.2, so I think getting anything meaningful out of the testsuite will depend on #45966. You can also see a couple of quick-hack code changes I had to get the daemon even start, to deal with two issues: One was that when the containerd storage backend is enabled, the network setup code for builder-next assumes we're on Linux and hence uses an explicit network mode instead of "auto". The other is that the graph driver name (also used to select snapshotter) is different between Docker ( I also added the matrix in a different spot, but that should not affect any of the above. Side thought: Are we actually running the Linux build through CI with |
I did at one point manage to make it build/run and run a container: But I never pushed the code because it was a quick hack to see what's needed
For Linux we have a draft PR that @vvoland opened and rebases from time to time, which is not ideal. That's why I opened this PR. I didn't update Jenkins because we don't have the plugins needed to filter with PR labels as I do in this PR. So basically, we don't yet test on linux but this PR changes that.
Thanks, let me double check that! |
The logs from the relevant run have expired, but the problem I was alluding to is triggered by the Docker integration test suite setup code, not usual container-running operations. I assume |
4767350
to
85b8def
Compare
I have rebased this PR and removed the windows part, at least with this we will have CI run on linux/amd64 |
85b8def
to
c852725
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments, but nothing majorly blocking as far as I can see, just some possible opportunities to reduce future work a little bit (and a typo).
My only real concern here is the artifact/code-cov question. The other questions/ideas are Windows-specific and since Windows isn't being enabled here, they can be punted to whoever does that, if you prefer.
If ("${{ inputs.use_snapshotter }}" -eq "true") { | ||
echo "TEST_INTEGRATION_USE_SNAPSHOTTER=1" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append | ||
echo "DOCKER_GRAPHDRIVER=windows" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless things have changed since last time I looked, this change won't have any effect:
- The daemon gets its env-vars from the service setup, created during the
register-service
call, and Go'sCreateService
API doesn't appear to support populating env-vars. If there was a command-line option for this, that would work, as the command-line passed to the daemon during--register-service
is bundled into the service configuration. The best option appears to be editing the service definition between registration and starting it. (Edit: The FIXME comment was added after I wrote this, so at least it's explicit this doesn't currently work now) - The daemon doesn't honour
DOCKER_GRAPHDRIVER
orDOCKER_DRIVER
on Windows. I'm not sure if there's any intention to do so, either, but it might be valuable once containerd image store works, as there's possibly going to be multiple Windows snapshotters available.
So I'd suggest moving this block down between the Start-Process ... --register-service
and the Start-Service
calls now, and dropping DOCKER_GRAPHDRIVER
, as that'll have to be done later anyway. I'm not sure if the test suite actually cares about TEST_INTEGRATION_USE_SNAPSHOTTER
, but no harm in setting it when also setting the registry key for the service, I guess? (I did so in my version, but don't recall if I checked for it having any effect or not)
use_snapshotter: # always false for now until we update containerd | ||
required: false | ||
type: boolean | ||
default: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should skip the runtime==builtin
axis of the two matrix jobs in this file, when this is true. use_snapshotter
makes no sense when not using containerd, and since it's started from a separate optional job, simply ignoring use_snapshotter
in that case will needlessly duplicate the integration test run (and fail).
I'm not sure how to do that in GitHub Workflows though. Maybe you can exclude
from the matrix using the job inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buildkit has something like this https://github.com/moby/buildkit/blob/master/.github/workflows/validate.yml#L22-L60
I'm sure I can find something with a little help from @crazy-max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we can do this in a followup since we are not calling this with containerd snapshotters enabled any more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked more-closely at your link to BuildKit, and wow, today I learned that GitHub Actions lets you do crimes.
But yeah, should be feasible to mess with the matrix to insert an appropriate exclude
clause based on the workflow inputs.
For the record, I was curious how the test suite would go on Windows now that containerd 1.7.6 has been vendored, so I rebased my version of this PR onto the mainline branch. Sadly, it still fails at the first action during busybox build ( Merged logs of failed step
But at least it's pretty easy to get it into a "testable" state, as the only CI things in my branch that this PR doesn't have is the TODO-marked Windows enablement. I might rebase my branch onto this PR later to verify that, and provide a base for further exploration, if I find time to play with it before this PR is merged. Edit: I did the rebase, it was relatively simple. Manually triggered a Windows 2022 run and it produces the same result as above. @rumpl if you wish to steal TBBle@7d4ffa1 or part thereof for this PR, you're welcome to. If not, that's fine, it can sit in my branch, and once this PR lands, I'll probably put it up in a draft PR for visibility along with whatever hacks I have to get the Windows-side (closer) to parity. Final edit: With enough hacks, I got that branch to actually run the testsuite on Windows. Still waiting for the last few to complete, but it looks like it'll be pretty close to the pass-rate of Linux (from this PR), which is nice. So there'll be at least a draft PR (due to all the hacks) for the Windows side of this coming in the future, once this one lands. Final final edit: #46539 created, since it's now in a good-enough state for initial review, and is mostly pending on this PR to merge. Post-edit: That 60% guess was wrong, it was calling a nil function-type value and panicking. |
48dad8c
to
711982e
Compare
For future reference / future self; we discussed this PR in Yesterday's maintainers call;
I should also add that this is a large diff, and "lotsa YAML", so I'll give it another look, but in doing so will somewhat depend on prior reviews of those that already happened, and I'll ask @crazy-max (as our in-house GHA guru) to give it a final "once-over". |
711982e
to
d639855
Compare
@@ -0,0 +1,32 @@ | |||
name: snapshotter-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's an analogue with test
but using the snapshotter, should it be test-snapshotter
? Then it should sort adjacent-to but below the test
workflow, I hope.
This occurred to me when I was thinking about whether we'd want to do split out the snapshotter test job for windows-2022
the same way, and if so, what to call it.
Also, thinking about future naming if and when the snapshotter becomes the default, i.e. in test
, what will the non-snapshotter test be named? non-snapshotter-test
is very English-like, but sorts weirdly, and the most important aspect (test
) is way over at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could call them test-graphdriver
and test-snapshotter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could call them
test-graphdriver
andtest-snapshotter
Works for me 👍
(GitHub grouping / listing checks remains horrible in either case 😂)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for splitting! Gonna have a look; one quick thing if you're updating/pushing; there's a typo in the first commit (
|
20126b0
to
24cf1a6
Compare
Moving these jobs to reusable workflows to declutter the test.yaml workflow, this will also help reuse these workflows to run tests with the container-snapshotter feature enabled Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
This workflow will run all the tests (unit, integration and integration-cli) with the containerd-snapshotter feature enalbed. We only run these tests for pull requests that have the "containerd-integration" label and also on any other branch/tag. Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
24cf1a6
to
1952344
Compare
Fixed that typo and another one in the second commit that you didn't see :) |
Passing thought: Are there unit tests affected by the Prompted because unit tests are about 25% of the GHA runtime, and serialise ahead of the integration tests, so avoiding needless duplication seems valuable, although this would lead to the fascinating result of the snapshotter integration tests starting (and completing?) before the graphdriver integration tests. |
I don't think the unit-tests should be affected by this, but (honestly) can't say with 100% certainty (I know that some unit-tests are not-as-unit-as-they-should be (looking at you: "networking unit-tests that depend on iptables"). I definitely think we need to look at the matrices as a whole, as I think there's some unneeded duplication (as well as looking if we can make them more efficient; some checks depend "too much" on others, which means having to wait for others to complete before being able to restart them, whereas others are fast to run (but relatively high overhead for building the initial parts). All that said; I think it's ok to look at that in follow-ups (but we could consider creating a tracking ticket for this). (Also trying to give @rumpl some sleep after waking up in the middle of night, dreaming of YAML and more YAML) 😂 |
Sooo much yaml! 😭😂 But you are right @TBBle, we might be able to run unit tests only once. I’m off for Dockercon next week so won’t have time to take a closer look, so maybe we should review this as is and we can optimize later? The double unit tests only hit those that open a PR in the snapshotter case; ok on master too but no one really waits for that I guess. Hold off merging this though, now I kinda want to take a look at this this weekend while on the plane. On top of this there are other optimisations we can do to our CI but I would say they are out of scope of what I want to accomplish here but will definitely get back to it while the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could merge test-graphdriver
and test-snapshotter
together and create a dynamic matrix if snapshotter is enabled. That would reduce changes footprint significantly but needs to make sure it's possible. I'm looking at it on another branch and keep you posted.
# reusable workflow | ||
name: .smoke |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a reusable workflow for smoke tests? It seems only called from test-graphdriver
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead we could just move this to a regular workflow named test-smoke.yml
as it's not tied to graphdriver or snapshotter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the idea is that if smoke
fails, we can abort the run early. That's what the name suggests to me, anyway. So perhaps it should be called from both test-runs?
- | ||
name: Set up tracing | ||
uses: ./.github/actions/setup-tracing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we need to set up tracing for unit tests? cc @cpuguy83
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct we aren't doing any tracing in unit tests.
# reusable workflow | ||
name: .windows-build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we just keep this in .windows.yml
? We already have a bunch of workflows, would be nice to avoid unnecessary split.
@@ -249,7 +174,6 @@ jobs: | |||
runs-on: ${{ inputs.os }} | |||
timeout-minutes: 120 | |||
needs: | |||
- build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to previous comment, this is racy as we could call this reusable workflow without having a dependency on build
so I think it's best to keep it build
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it should be a rename of test.yml
> test-graphdriver.yml
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK in Git, renames are detected automatically at diff
time, and target minimising diff size. In this case, the bulk of what was in test.yml ended up in .test.yml so it picked that as a rename.
build-dev: | ||
needs: | ||
- validate-dco | ||
uses: ./.github/workflows/.build-dev.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build-dev
is called from test-graphdriver
and test-snapshotter
which would make the cache racy.
Closing in favor of #46572 |
- What I did
Added CI for the containerd integration
Any PR that has the label
containerd-integration
will also run tests with the snapshottersExample PR without the needed label, note the skipped jobs rumpl#138
- How I did it
Moved everything from
test.yml
into a reusable workflow in.test.yml
and used that to run tests with graph drivers and containerd snapshotters. Also added a workflow to test snapshotters in windows.- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)