Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wire through CRI checkpoint RPC #6965

Merged
merged 1 commit into from
Mar 7, 2024

Conversation

adrianreber
Copy link
Contributor

@adrianreber adrianreber commented May 20, 2022

This connects the new CRI ContainerCheckpoint RPC to the existing internal checkpoint functions. With this commit it is possible
to checkpoint a container in Kubernetes using the Forensic Container Checkpointing KEP (kubernetes/enhancements#2008):

# curl -X POST "https://localhost:10250/checkpoint/namespace/podId/container"

Which will result in containerd creating a checkpoint in the location specified by Kubernetes (usually /var/lib/kubelet/checkpoints).

The latest version of crictl already supports creating checkpoints via crictl checkpoint.

@k8s-ci-robot
Copy link

Hi @adrianreber. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bobbypage
Copy link
Contributor

/cc

@k8s-ci-robot
Copy link

@bobbypage: GitHub didn't allow me to request PR reviews from the following users: bobbypage.

Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@adrianreber adrianreber force-pushed the 2022-05-20-checkpoint-cri-rpc branch 2 times, most recently from d0bc8d7 to 7b91762 Compare July 23, 2022 11:45
@adrianreber adrianreber force-pushed the 2022-05-20-checkpoint-cri-rpc branch from 7b91762 to aa373ec Compare July 23, 2022 12:20
@adrianreber adrianreber marked this pull request as ready for review July 23, 2022 12:51
@adrianreber
Copy link
Contributor Author

Switching to ready for review after the corresponding CRI API changes have been merged and crictl is also ready for the new CheckpointContainer() RPC.

@adrianreber
Copy link
Contributor Author

Not sure why the one CI run failed.

@cpuguy83
Copy link
Member

Can you change the reference in your commit to the kubernetes/enhancements#2008?
Right now it's referencing an unrelated containerd PR.

api/next.pb.txt Outdated Show resolved Hide resolved
@adrianreber
Copy link
Contributor Author

Can you change the reference in your commit to the kubernetes/enhancements#2008? Right now it's referencing an unrelated containerd PR.

Done.

@adrianreber adrianreber force-pushed the 2022-05-20-checkpoint-cri-rpc branch 3 times, most recently from 6338330 to f63f1cf Compare September 1, 2022 16:49
@adrianreber
Copy link
Contributor Author

The windows CI linter is timeouting and probably unrelated to my changes. Everything else seems happy.

@estesp
Copy link
Member

estesp commented Sep 1, 2022

The windows CI linter is timeouting and probably unrelated to my changes. Everything else seems happy.

I've noticed that on some recent PRs; just opened #7356 to hopefully solve that.

@estesp
Copy link
Member

estesp commented Sep 1, 2022

/ok-to-test

@estesp estesp added the area/cri Container Runtime Interface (CRI) label Sep 21, 2022
@mikebrow
Copy link
Member

@adrianreber needs rebase

@estesp
Copy link
Member

estesp commented Mar 5, 2024

@cpuguy83 I didn't want to dismiss your stale review without your input; that file is no longer part of the commit, so I assume your concerns are handled, but if you have any other concerns let me know

@adrianreber adrianreber force-pushed the 2022-05-20-checkpoint-cri-rpc branch from 57c62d5 to 80e6b92 Compare March 5, 2024 09:10
@adrianreber
Copy link
Contributor Author

Is the Debian package a good source or is there another way you would like to take the package from?

This is a GCP COS image, you can see the top of the kern.log to confirm.

Here's where we can make changes to add more things to the node: https://github.com/containerd/containerd/blob/main/contrib/gce/configure.sh

looks like criu.org / github repo both have sources only :( do ping me back if you are still stuck.

@dims I must say I have no idea how solve this. It seems really difficult to install anything on COS which is not statically linked. Which is easy for Go application, but CRIU is not easily statically linked. So I cannot really build CRIU on the host where containerd is built and then transfer it to GCE host, because all libraries would be missing. I suppose I cannot build it on COS because that seems not doable based on the existing setup which builds containerd on another host.

Not sure how to get a CRIU binary into a COS image.

@rst0git
Copy link
Contributor

rst0git commented Mar 5, 2024

Is the Debian package a good source or is there another way you would like to take the package from?

This is a GCP COS image, you can see the top of the kern.log to confirm.
Here's where we can make changes to add more things to the node: https://github.com/containerd/containerd/blob/main/contrib/gce/configure.sh
looks like criu.org / github repo both have sources only :( do ping me back if you are still stuck.

@dims I must say I have no idea how solve this. It seems really difficult to install anything on COS which is not statically linked. Which is easy for Go application, but CRIU is not easily statically linked. So I cannot really build CRIU on the host where containerd is built and then transfer it to GCE host, because all libraries would be missing. I suppose I cannot build it on COS because that seems not doable based on the existing setup which builds containerd on another host.

Not sure how to get a CRIU binary into a COS image.

@dims Would it be possible to install CRIU in Dockerfile.test?

@adrianreber
Copy link
Contributor Author

I added code to containerd to check if there is a version of CRIU installed at all and if it has the minimum version needed for Kubernetes (3.16). If no CRIU version is found I am returning the same error message to the kubelet as if containerd would not implement this interface. This is not perfect as the error message is wrong, but this will make the test work on the Kubernetes side and once that is done I can change Kubernetes to handle an error message like "CRIU is not installed or too old correctly".

I just saw I broke the windows build. Will fix it.

@adrianreber adrianreber force-pushed the 2022-05-20-checkpoint-cri-rpc branch 2 times, most recently from 27f1193 to 168bd16 Compare March 5, 2024 12:35
@adrianreber
Copy link
Contributor Author

All tests are happy now.

@estesp estesp added this to the 2.0 milestone Mar 5, 2024
@adrianreber adrianreber force-pushed the 2022-05-20-checkpoint-cri-rpc branch 2 times, most recently from b27107f to 60ed9af Compare March 5, 2024 21:33
@adrianreber adrianreber force-pushed the 2022-05-20-checkpoint-cri-rpc branch 2 times, most recently from 6a442b4 to f4435d8 Compare March 6, 2024 07:43
@adrianreber
Copy link
Contributor Author

The remaining CI errors seem unrelated as far as I can tell.

This connects the new CRI ContainerCheckpoint RPC to the existing
internal checkpoint functions. With this commit it is possible
to checkpoint a container in Kubernetes using the Forensic Container
Checkpointing KEP (containerd#2008):

 # curl X POST "https://localhost:10250/checkpoint/namespace/podId/container"

Which will result in containerd creating a checkpoint in the location
specified by Kubernetes (usually /var/lib/kubelet/checkpoints).

This is a Linux only feature because CRIU only exists on Linux.

Rewritten with the help of Phil Estes.

Signed-off-by: Phil Estes <estesp@gmail.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber adrianreber force-pushed the 2022-05-20-checkpoint-cri-rpc branch from f4435d8 to f25770e Compare March 7, 2024 17:35
@k8s-ci-robot
Copy link

@adrianreber: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-containerd-k8s-e2e-ec2 f25770e link false /test pull-containerd-k8s-e2e-ec2

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@estesp
Copy link
Member

estesp commented Mar 7, 2024

@dims This new pull-containerd-k8s-e2e-ec2 is an optional test, correct?

@estesp estesp added this pull request to the merge queue Mar 7, 2024
@dims
Copy link
Member

dims commented Mar 7, 2024

@dims This new pull-containerd-k8s-e2e-ec2 is an optional test, correct?

Yep!

Merged via the queue into containerd:main with commit 3c34e2c Mar 7, 2024
47 of 48 checks passed
Code Review automation moved this from Ready For Review to Done Mar 7, 2024
@adrianreber adrianreber deleted the 2022-05-20-checkpoint-cri-rpc branch March 8, 2024 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cri Container Runtime Interface (CRI) ok-to-test size/XXL
Projects
Development

Successfully merging this pull request may close these issues.

None yet