Use buildx and hpc image to build Antrea Windows images #6311

Closed
wenyingd opened this issue May 9, 2024 · 12 comments · Fixed by #6325
Labels
area/build-release Issues or PRs related to building and releasing area/OS/windows Issues or PRs related to the Windows operating system. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@wenyingd
Contributor

wenyingd commented May 9, 2024

Describe the problem/challenge you have

Since we already removed support for the Docker runtime on Windows in Antrea v2.0, we can assume that antrea-agent runs as a host-process container on Windows Nodes using the containerd runtime (we don't discuss process mode here). Microsoft provides a minimal base image, only 7.45KB in size, for building host-process containers, which allows the container to run on any Windows host without depending on the exact Windows OS version. So we should consider migrating our Antrea Windows image to leverage this feature.

The advantages include:

  1. The Antrea Windows image size can be reduced significantly. In my test, the image is ~260MB with the hpc base image, with the container files including antrea-agent.exe, antrea-cni.exe, antctl.exe, host-local.exe, and openvswitch; by comparison, it is about 560MB with powershell:nanoserver-1809 (Server 2019) and 770MB with powershell:nanoserver-ltsc2022 (Server 2022).
  2. We don't need to bump up the Antrea Agent image base after upgrading the Windows host OS version in the future, as the image is independent of the OS version.
  3. We can use buildx to build the Windows image in a Linux environment. Then we don't need to prepare an additional Windows machine for image building in the CI setup, and we can share the build cache. All of this can help reduce the build time.
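As a quick check on item 1, the relative reduction implied by the sizes quoted there (~260MB with hpc vs. ~560MB and ~770MB with the nanoserver bases) can be computed with a small helper. The function name is illustrative only; the inputs are the approximate figures from the list.

```shell
#!/bin/sh
# Percentage size reduction when moving from a nanoserver-based image to the
# hpc-based image, using the approximate sizes (MB) quoted in item 1.
size_reduction() {
    # $1: hpc-based size (MB), $2: nanoserver-based size (MB)
    awk -v hpc="$1" -v base="$2" 'BEGIN { printf "%.1f\n", (base - hpc) / base * 100 }'
}

echo "vs nanoserver-1809:     $(size_reduction 260 560)%"
echo "vs nanoserver-ltsc2022: $(size_reduction 260 770)%"
```

This prints 53.6% and 66.2%, i.e., the hpc-based image is a bit under half the size of the Server 2019 image and about a third of the Server 2022 image.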

The candidate changes include:

  1. We may need to modify the Windows OVS preparation. In the existing code, we install the OVS dependencies in an intermediate layer and then copy the required DLL files to the final OVS image. If we use the hpc base image and build with buildx on a Linux machine, we can't run Windows binaries inside an intermediate container layer, so we may instead copy the files to the final image and run them on the Windows Node if necessary (e.g., vcredist, openssl).
  2. The delivered antrea-windows image can only run as a host-process container, which relies on the Windows host's file system and network. If we want to run the container in other ways, e.g., without depending on the Windows host, we may need another Dockerfile that copies the files from the antrea-windows image onto a base image matching the target Windows OS version, e.g., powershell-nanoserver or servercore.
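Item 2 could be realized roughly as in the sketch below, which generates a Dockerfile that copies files out of the hpc-based image onto an OS-specific base. It is hedged: the default image references (including the `mcr.microsoft.com/powershell:lts-nanoserver-1809` tag), the `Dockerfile.rebase.windows` file name and the `BUILD`/`TARGET_IMAGE` variables are hypothetical placeholders; only the `/k` and `/openvswitch` paths come from the build log later in this thread. Because the generated Dockerfile has only `FROM` and `COPY` (no `RUN`), it should not require a Windows build host.

```shell
#!/bin/sh
# Hypothetical sketch: re-base binaries from the hpc-based antrea-windows image
# onto an OS-specific base image, without rebuilding them.
# All defaults below are placeholders/assumptions, not values used by Antrea.
AGENT_IMAGE="${AGENT_IMAGE:-antrea/antrea-windows:latest}"
BASE_IMAGE="${BASE_IMAGE:-mcr.microsoft.com/powershell:lts-nanoserver-1809}"
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# Only FROM and COPY instructions: nothing has to execute inside a Windows
# container during the build.
cat > "$OUT_DIR/Dockerfile.rebase.windows" <<EOF
FROM ${AGENT_IMAGE} AS hpc
FROM ${BASE_IMAGE}
COPY --from=hpc /k /k
COPY --from=hpc /openvswitch /openvswitch
EOF

# Set BUILD=1 to actually build and push (requires a buildx builder).
if [ "${BUILD:-0}" = 1 ]; then
    docker buildx build --platform windows/amd64 \
        -f "$OUT_DIR/Dockerfile.rebase.windows" \
        -t "${TARGET_IMAGE:-antrea/antrea-windows-nanoserver:dev}" \
        -o type=registry "$OUT_DIR"
fi
```

Keeping the re-base as a pure copy step means the binaries are built once, and each additional OS-specific image only adds a thin layer on top of its base.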

Describe the solution you'd like

Use buildx and the hpc base image to build the Antrea Windows image.

Anything else you would like to add?

@wenyingd wenyingd added the kind/feature Categorizes issue or PR as related to a new feature. label May 9, 2024
@wenyingd
Contributor Author

wenyingd commented May 9, 2024

@wenyingd wenyingd self-assigned this May 9, 2024
@antoninbas
Contributor

> We don't need to bump up the Antrea Agent image base after upgrading the Windows host OS version in the future, as the image is independent of the OS version.

But if there is a new version of mcr.microsoft.com/oss/kubernetes/windows-host-process-containers-base-image published, we would still need to go through that process? That being said, it seems unlikely as the current version (v1.0.0) hasn't changed in 2+ years.

> We may need to modify the Windows OVS preparation. In the existing code, we install the OVS dependencies in an intermediate layer and then copy the required DLL files to the final OVS image. If we use the hpc base image and build with buildx on a Linux machine, we can't run Windows binaries inside an intermediate container layer, so we may instead copy the files to the final image and run them on the Windows Node if necessary (e.g., vcredist, openssl).

Are you referring to this: microsoft/Windows-Containers#34?
In other words, we can only build Windows containers on Linux using BuildKit if there is no RUN command in the Dockerfile.
Given this limitation, I am not sure how it would ever be possible to build the Windows image entirely on Linux.
Or maybe we can move all RUN steps in a Linux image, then copy the files to the Windows image? That would work as long as there is no build step that absolutely has to run on Windows.
Maybe you could clarify what you had in mind for this.
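Under the constraint summarized above (no `RUN` in a Windows stage when building on Linux), a rough check of a Dockerfile could look like the following. This is a hypothetical helper, not part of Antrea: it treats a stage as Windows if its base image name matches a fixed pattern, and it does not resolve `ARG`-substituted base images, stage aliases, or line continuations.

```shell
#!/bin/sh
# Hypothetical lint: report RUN instructions inside stages whose base image
# looks like a Windows image. Heuristic only (name-pattern match).
# Exits non-zero if any such RUN is found.
windows_run_lines() {
    awk '
        BEGIN { win = 0; found = 0 }
        # Start of a new stage: skip an optional --platform flag, then check the base.
        toupper($1) == "FROM" {
            img = $2
            if (img ~ /^--/) img = $3
            win = (img ~ /windows-host-process-containers-base-image|nanoserver|servercore/)
            next
        }
        # A RUN inside a Windows stage cannot execute when building on Linux.
        win && toupper($1) == "RUN" { printf "%d: %s\n", NR, $0; found = 1 }
        END { exit found }
    ' "$1"
}
```

For example, `windows_run_lines build/images/Dockerfile.build.windows` should print nothing and return 0 if every `RUN` lives in a Linux stage.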

> The delivered antrea-windows image can only run as a host-process container, which relies on the Windows host's file system and network. If we want to run the container in other ways, e.g., without depending on the Windows host, we may need another Dockerfile that copies the files from the antrea-windows image onto a base image matching the target Windows OS version, e.g., powershell-nanoserver or servercore.

How is this a limitation, given that containerd (with host-process container) is the only supported deployment option on Windows for Antrea (besides Windows native services of course)?


By the way, it feels like the 2 things described in this issue are a bit orthogonal: the transition from the current nanoserver base image to the hpc image could happen independently from the transition to BuildKit (i.e., building the Windows image entirely on a Linux machine, using BuildKit)?
One thing that is not mentioned in this issue is that if we start using BuildKit, I hope we can use the registry cache backend (https://docs.docker.com/build/cache/backends/), like we do for the Linux image.

Another thing is that experimental support for building Windows containers (on Windows) with BuildKit was recently added: https://github.com/moby/buildkit/blob/master/docs/windows.md. It's probably too early for us to use it, and I am not sure it is relevant to us, if we are considering fully building the Windows image on Linux.

@antoninbas antoninbas added area/build-release Issues or PRs related to building and releasing area/OS/windows Issues or PRs related to the Windows operating system. labels May 9, 2024
@wenyingd
Contributor Author

wenyingd commented May 10, 2024

> But if there is a new version of mcr.microsoft.com/oss/kubernetes/windows-host-process-containers-base-image published, we would still need to go through that process?

In theory, yes, we may need to bump up the base image version. However, the base image only provides isolation for running the expected processes, plus a layer for storing the binaries and files when building images. We don't use this layer to build or compile anything, so I don't expect we will often need to change it.

> Or maybe we can move all RUN steps in a Linux image, then copy the files to the Windows image?

Yes, this is what I had in mind. For example, we can build antrea-agent.exe with Go in a Linux stage by setting GOOS=windows, then copy it into the final Windows hpc image in the last step; similarly, we can use a Linux Alpine image to download and extract the zip/tar files and organize the directory structure.
In general, we use Linux layers to build or compile the Windows binaries, leveraging the multi-platform build capability, and use the Windows hpc image only to store and run them.
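A minimal sketch of that layout, written as a script that generates the Dockerfile. The stage name `antrea-build-windows`, the `make windows-bin` target, the `/antrea/bin/antrea-agent.exe` output path and the Go/hpc image versions are taken from the build log later in this thread; the `downloads` stage, the `CNI_URL` build argument, the file name and the `BUILD`/`IMAGE` variables are assumptions, and the actual Dockerfile in the PR may differ. The key detail is `FROM --platform=$BUILDPLATFORM` on the Linux stages: with `--platform windows/amd64`, those stages would otherwise be resolved for the Windows target platform.

```shell
#!/bin/sh
# Sketch only: generate a multi-stage Dockerfile in which every RUN executes in
# a Linux stage and the final Windows (hpc) stage only COPYs files.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
DOCKERFILE="$OUT_DIR/Dockerfile.sketch.windows"   # hypothetical file name

cat > "$DOCKERFILE" <<'EOF'
ARG GO_VERSION=1.21
# Linux stage pinned to the build host's platform: cross-compile Windows binaries.
FROM --platform=$BUILDPLATFORM golang:${GO_VERSION} AS antrea-build-windows
WORKDIR /antrea
COPY . /antrea
RUN make windows-bin

# Linux stage for downloads (CNI_URL is a placeholder build argument).
FROM --platform=$BUILDPLATFORM alpine AS downloads
ARG CNI_URL
RUN mkdir -p /k/antrea/cni && wget -q -O - "${CNI_URL}" | tar -xz -C /k/antrea/cni

# Final Windows stage: no RUN, only COPY from the Linux stages.
FROM mcr.microsoft.com/oss/kubernetes/windows-host-process-containers-base-image:v1.0.0
COPY --from=antrea-build-windows /antrea/bin/antrea-agent.exe /k/antrea/bin/
COPY --from=downloads /k/antrea/cni /k/antrea/cni
EOF

# Set BUILD=1 (from the repository root) to build and push the image.
if [ "${BUILD:-0}" = 1 ]; then
    docker buildx build --platform windows/amd64 \
        --build-arg CNI_URL="${CNI_URL:?set CNI_URL}" \
        -f "$DOCKERFILE" -t "${IMAGE:?set IMAGE}" -o type=registry .
fi
```

The design choice is that the Windows stage acts purely as a container for files, so the whole build can run on a Linux buildx builder.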

> How is this a limitation, given that containerd (with host-process container) is the only supported deployment option on Windows for Antrea (besides Windows native services of course)?

This is for future extension, not an existing requirement. For example, if we plan to provide other Windows container images, e.g., a separate container to run antctl on Windows that does not require host-process, then we need an additional step to copy the binaries from the hpc image to a nanoserver image without a new build.

> I hope we can use the registry cache backend (https://docs.docker.com/build/cache/backends/), like we do for the Linux image.

I didn't verify whether the registry cache works for the Windows image. From my experiments, I suspect it doesn't. Since we need a new buildx instance dedicated to Windows to build Windows containers on Linux, I am not sure whether the cache can be shared across multiple buildx instances. Maybe we can add support for it as a next step?

> experimental support for building Windows containers (on Windows) with BuildKit was recently added: https://github.com/moby/buildkit/blob/master/docs/windows.md. It's probably too early for us to use it, and I am not sure it is relevant to us, if we are considering fully building the Windows image on Linux.

If our plan is to build Windows containers on Linux, then buildx support on Windows is not a blocker for us. As for whether we can fully build the Windows image on Linux: we only build the antrea-agent binary in the current Windows Agent image, and other dependencies like openvswitch and host-local.exe are downloaded as prebuilt binaries from given URLs. So I think building on Linux covers our requirements for now.

Besides, a major motivation for moving to hpc is that continuing to build Windows images per OS version may introduce a lot of work, e.g., we may need to think about how to support previous Windows versions. This is because a container image built on Server 2022 can't run on Server 2019. So we would either have to maintain antrea-agent-windows images for both Server 2019 and Server 2022, or drop support for Server 2019 (but some users may still be on 2019).

As for using buildx to build the image on Linux, the reason is that Windows container images are always larger, and since Windows images are not very friendly for development, we need many additional steps in the base image to prepare build utilities like git and MinGW, which makes the build time longer.

@antoninbas
Contributor

> I didn't verify whether the registry cache works for the Windows image. From my experiments, I suspect it doesn't. Since we need a new buildx instance dedicated to Windows to build Windows containers on Linux, I am not sure whether the cache can be shared across multiple buildx instances. Maybe we can add support for it as a next step?

It would be a different image. For example, for Linux, we use antrea/base-ubuntu-cache as the build cache image. For Windows, it would be antrea/base-windows-cache.
The important thing is that these flags should work when we build Windows containers on Linux with BuildKit:

--cache-to type=registry,ref=$image-cache:$BUILD_TAG,mode=max
--cache-from type=registry,ref=$image-cache:$BUILD_TAG,mode=max
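For illustration, a buildx invocation combining the Windows platform build with registry cache flags like the ones above might look like this sketch. `IMAGE`, `BUILD_TAG`, `BUILDER` and `DRY_RUN` are placeholders; the default builder name mirrors the `windows-img-builder` that appears in the build log later in this thread, and such a builder can be created once with `docker buildx create --name windows-img-builder --driver docker-container`. Docker's cache-backend documentation lists `mode` as a `--cache-to` (export) parameter, so the sketch omits it from `--cache-from`. Whether the registry cache behaves well for Windows images built this way is still an open question here.

```shell
#!/bin/sh
# Sketch: build a windows/amd64 image on Linux with a registry-backed cache.
# IMAGE, BUILD_TAG and BUILDER are placeholders.
IMAGE="${IMAGE:-antrea/base-windows}"
BUILD_TAG="${BUILD_TAG:-dev}"
BUILDER="${BUILDER:-windows-img-builder}"

buildx_cmd() {
    # When DRY_RUN is set to any non-empty value, print the command instead of running it.
    ${DRY_RUN:+echo} docker buildx build \
        --builder "$BUILDER" \
        --platform windows/amd64 \
        --cache-to "type=registry,ref=$IMAGE-cache:$BUILD_TAG,mode=max" \
        --cache-from "type=registry,ref=$IMAGE-cache:$BUILD_TAG" \
        -t "$IMAGE:$BUILD_TAG" -o type=registry \
        -f "${DOCKERFILE:-build/images/Dockerfile.build.windows}" .
}
```

With the defaults, the cache image is `antrea/base-windows-cache:dev`, following the naming suggested above.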

@wenyingd
Contributor Author

wenyingd commented May 14, 2024

@antoninbas I observed something strange with caching when building the Windows image with a multi-platform build. It takes about 20s to push antrea/base-windows to the registry, but hundreds of seconds to push the cache to the registry. Since building the Windows images is simple (we only download and extract the CNI files into base-windows, and download and extract the OVS zip into ovs-windows, with no complicated compilation), do we really need a cache for Windows?

@antoninbas
Contributor

> @antoninbas I observed something strange with caching when building the Windows image with a multi-platform build. It takes about 20s to push antrea/base-windows to the registry, but hundreds of seconds to push the cache to the registry. Since building the Windows images is simple (we only download and extract the CNI files into base-windows, and download and extract the OVS zip into ovs-windows, with no complicated compilation), do we really need a cache for Windows?

There are 2 reasons why we use caching for the Linux image:

  1. We want to make sure that we always have up-to-date system dependencies (ubuntu:22.04 image and apt packages). This means that we always want to build everything from scratch in case there was any update. To avoid doing unnecessary work if nothing has changed, we leverage registry-based caching.
  2. This only matters if the work we do to build the base images is expensive. In the case of Linux, we build OVS from scratch.

It seems that for Windows, these reasons may not apply. The hpc image is not mutable and is essentially a scratch image. All the dependencies we install are basically versioned (?). Building the base image is cheap. So there would be no need to use registry-based caching. However, if this is the case, do we even need to push the base image to the registry? And if we need to push it for some reason, how do we automate the update process (today we have to trigger new builds for the base image manually, which is inconvenient).

@antoninbas
Contributor

@wenyingd From your PR:

windows-ovs: 22.4s
windows-base: 15s
antrea-windows: 190s

If this is correct, then it looks like pushing base images to the registry is not very useful. It would add complexity for very limited benefits, and pushing / pulling base images will take some time too.

@wenyingd
Copy link
Contributor Author

> If this is correct, then it looks like pushing base images to the registry is not very useful. It would add complexity for very limited benefits, and pushing / pulling base images will take some time too.

Yes, I had the same question in mind. Originally, we used windows-base to store the CNI (host-local) files and to install build utilities like 7zip, git and MinGW (for make and gcc), which were required to build the Windows image on a Windows host base. But now we don't need these utilities, as we use buildx and a Linux base (ubuntu22.04) to build everything. It seems windows-base is useless. So maybe I can remove it in this change?

@wenyingd
Contributor Author

> windows-ovs: 22.4s
> windows-base: 15s
> antrea-windows: 190s

These statistics were collected without configuring cache options.

@antoninbas
Contributor

> It seems windows-base is useless. So maybe I can remove it in this change?

Yes, I would suggest trying this. It will simplify maintenance IMO, and reduce complexity of the build process.

@antoninbas
Contributor

> > windows-ovs: 22.4s
> > windows-base: 15s
> > antrea-windows: 190s

> These statistics were collected without configuring cache options.

Yes, that's how I understood it. I was saying that the numbers look good, and that there seems to be no need for caching or even for a base image that we push to the registry.

@wenyingd
Contributor Author

wenyingd commented May 15, 2024

I tried to merge the logic for building the base and agent images into a single Dockerfile (see my latest update). The total build time for one round on a fresh setup is about 241s (another try took 218s), including:

  • make windows-bin, 161s
  • go mod download, 32s
  • export to image, 16s (including push to remote registry, 6.5s)
  • import golang, 20s
  • others, ~10s
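Summing the rounded components listed above gives a quick sanity check against the reported ~241s total and shows which step dominates. The variable and function names are illustrative, and "others" uses the approximate ~10s figure.

```shell
#!/bin/sh
# Component timings (seconds) as listed above; "others" is the approximate ~10s.
breakdown='make_windows_bin 161
go_mod_download 32
export_to_image 16
import_golang 20
others 10'

# Print the sum of all components and the share of the largest one.
summarize() {
    awk 'BEGIN { max = 0 }
         { total += $2; if ($2 > max) { max = $2; name = $1 } }
         END { printf "total=%ds largest=%s (%.0f%%)\n", total, name, max / total * 100 }'
}

printf '%s\n' "$breakdown" | summarize
```

This prints `total=239s largest=make_windows_bin (67%)`: the itemized steps account for nearly all of the ~241s, and the Go cross-compilation alone is about two thirds of it.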

Do you think it is acceptable? @antoninbas

Below is a sample of the build output:

$ WINDOWS_PUSH=1 make build-windows
===> Building Antrea bins and antrea/antrea-windows Docker image <===
/home/ubuntu/antrea/build/images/build-windows.sh --agent-tag v2.1.0-dev-7f8e0784.dirty --push
windows-img-builder
+ docker buildx build --platform windows/amd64 -o type=registry -t cattydong/antrea-windows:v2.1.0-dev-7f8e0784.dirty --build-arg GO_VERSION=1.21 --build-arg BUILD_TAG=antrea-v2.1 --build-arg CNI_BINARIES_VERSION=v1.4.0 -f build/images/Dockerfile.build.windows .
[+] Building 241.3s (22/22) FINISHED                                                                                                                   
 => [internal] booting buildkit                                                                                                                   1.9s
 => => pulling image moby/buildkit:buildx-stable-1                                                                                                1.2s
 => => creating container buildx_buildkit_windows-img-builder0                                                                                    0.7s
 => [internal] load build definition from Dockerfile.build.windows                                                                                0.2s
 => => transferring dockerfile: 1.99kB                                                                                                            0.0s
 => [internal] load metadata for docker.io/cattydong/windows-ovs:antrea-v2.1                                                                      2.0s
 => [internal] load metadata for mcr.microsoft.com/oss/kubernetes/windows-host-process-containers-base-image:v1.0.0                               1.1s
 => [internal] load metadata for docker.io/library/golang:1.21                                                                                    2.0s
 => [auth] library/golang:pull token for registry-1.docker.io                                                                                     0.0s
 => [auth] cattydong/windows-ovs:pull token for registry-1.docker.io                                                                              0.0s
 => [internal] load .dockerignore                                                                                                                 0.0s
 => => transferring context: 545B                                                                                                                 0.0s
 => [antrea-ovs 1/1] FROM docker.io/cattydong/windows-ovs:antrea-v2.1@sha256:a9bdd2d85609b3e4ac97671aac1a60d4b9e46a4df911410360bbe9f4d3303e70     5.8s
 => => resolve docker.io/cattydong/windows-ovs:antrea-v2.1@sha256:a9bdd2d85609b3e4ac97671aac1a60d4b9e46a4df911410360bbe9f4d3303e70                0.1s
 => => sha256:d0a8bbc89acb178a25231dff13213cc038d6748997bafe1f4d7acd54c78df431 50.40MB / 50.40MB                                                  2.3s
 => => extracting sha256:d0a8bbc89acb178a25231dff13213cc038d6748997bafe1f4d7acd54c78df431                                                         3.0s
 => [antrea-build-windows 1/7] FROM docker.io/library/golang:1.21@sha256:be8e71d3072cf38bc3430cae802bd2775fd2a09f886a4b54d13322c38ea4b1d3        20.0s
 => => resolve docker.io/library/golang:1.21@sha256:be8e71d3072cf38bc3430cae802bd2775fd2a09f886a4b54d13322c38ea4b1d3                              0.0s
 => => sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1 32B / 32B                                                          0.2s
 => => sha256:3b4c02eb9df2b69e8d651f9fc885dfdc9075fd282066a484d403a84088d979a7 125B / 125B                                                        0.2s
 => => sha256:a4de642a3616b804e505371c0ceff7304949ee0f3979e0cd6141752c897593a1 67.01MB / 67.01MB                                                  1.4s
 => => sha256:8e85b9cced63d432af9e108bc7a4bc91a79ee170d6c6bfa49f3eb35b371846d2 92.20MB / 92.20MB                                                  1.9s
 => => sha256:6582c62583ef22717db8d306b1d6a0c288089ff607d9c0d2d81c4f8973cbfee3 64.14MB / 64.14MB                                                  3.0s
 => => sha256:891494355808bdd3db5552f0d3723fd0fa826675f774853796fafa221d850d42 24.05MB / 24.05MB                                                  0.6s
 => => sha256:c6cf28de8a067787ee0d08f8b01d7f1566a508b56f6e549687b41dfd375f12c7 49.58MB / 49.58MB                                                  1.1s
 => => extracting sha256:c6cf28de8a067787ee0d08f8b01d7f1566a508b56f6e549687b41dfd375f12c7                                                         3.5s
 => => extracting sha256:891494355808bdd3db5552f0d3723fd0fa826675f774853796fafa221d850d42                                                         1.2s
 => => extracting sha256:6582c62583ef22717db8d306b1d6a0c288089ff607d9c0d2d81c4f8973cbfee3                                                         3.3s
 => => extracting sha256:8e85b9cced63d432af9e108bc7a4bc91a79ee170d6c6bfa49f3eb35b371846d2                                                         3.5s
 => => extracting sha256:a4de642a3616b804e505371c0ceff7304949ee0f3979e0cd6141752c897593a1                                                         4.5s
 => => extracting sha256:3b4c02eb9df2b69e8d651f9fc885dfdc9075fd282066a484d403a84088d979a7                                                         0.0s
 => => extracting sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1                                                         0.0s
 => [internal] load build context                                                                                                                 9.3s
 => => transferring context: 260.65MB                                                                                                             9.2s
 => [stage-2 1/3] FROM mcr.microsoft.com/oss/kubernetes/windows-host-process-containers-base-image:v1.0.0@sha256:b4c9637e032f667c52d1eccfa31ad8c  0.6s
 => => resolve mcr.microsoft.com/oss/kubernetes/windows-host-process-containers-base-image:v1.0.0@sha256:b4c9637e032f667c52d1eccfa31ad8c63f1b035  0.0s
 => => sha256:f2784c8bc8525977aef9b6cd74e9a899274db7d82cd4b028e0ae049270927585 20.48kB / 20.48kB                                                  0.2s
 => => extracting sha256:f2784c8bc8525977aef9b6cd74e9a899274db7d82cd4b028e0ae049270927585                                                         0.2s
 => [antrea-build-windows 2/7] WORKDIR /antrea                                                                                                    0.4s
 => [antrea-build-windows 3/7] RUN --mount=type=cache,target=/go/pkg/mod/     --mount=type=bind,source=go.sum,target=go.sum     --mount=type=bi  32.3s
 => [antrea-build-windows 4/7] COPY . /antrea                                                                                                     2.1s
 => [antrea-build-windows 5/7] RUN --mount=type=cache,target=/go/pkg/mod/     --mount=type=cache,target=/root/.cache/go-build/     make window  161.2s
 => [antrea-build-windows 6/7] RUN mkdir -p /go/k/antrea/bin && mkdir -p /go/k/antrea/cni &&     cp /antrea/bin/antrea-agent.exe /go/k/antrea/bi  0.6s
 => [antrea-build-windows 7/7] RUN wget -q -O - https://github.com/containernetworking/plugins/releases/download/v1.4.0/cni-plugins-windows-amd6  1.8s 
 => [stage-2 2/3] COPY --from=antrea-build-windows /go/k /k                                                                                       0.3s 
 => [stage-2 3/3] COPY --from=antrea-ovs /openvswitch /openvswitch                                                                                1.0s
 => exporting to image                                                                                                                           16.1s
 => => exporting layers                                                                                                                           9.3s
 => => exporting manifest sha256:d1215841f11fbfebd424ce304e929b2dd50f00b2714302dec5f3cfd1e7c3cc63                                                 0.0s
 => => exporting config sha256:b78ed6493896f3d2c9cca1b9d0e9c9e4f8c3a3bf48ce09cbe5eaaf0eceeae4f9                                                   0.0s
 => => exporting attestation manifest sha256:34ccf3dfbf7993a275baed2d5104dfb38d1b82109dccc532e7c32bb34ffd976a                                     0.1s
 => => exporting manifest list sha256:a365fad08733d2e286eb59b24a9add57327bd4971ff8b52216b18b1312b135a4                                            0.0s
 => => pushing layers                                                                                                                             4.8s
 => => pushing manifest for docker.io/cattydong/antrea-windows:v2.1.0-dev-7f8e0784.dirty@sha256:a365fad08733d2e286eb59b24a9add57327bd4971ff8b522  1.7s
 => [auth] cattydong/antrea-windows:pull,push token for registry-1.docker.io
