Add blog post about forensic container checkpointing #37412

adrianreber · 2022-10-20T13:53:40Z

This adds a blog article how to use forensic container checkpointing which was merged in Kubernetes v1.25.

kubernetes/enhancements#2008

netlify · 2022-10-20T14:08:38Z

✅ Pull request preview available for checking

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`0abb7e7`
🔍 Latest deploy log	https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/6387a57d644f1c000867cfd8
😎 Deploy Preview	https://deploy-preview-37412--kubernetes-io-main-staging.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

sftim

Thank you for the pull request.
I'm afraid the post-release blog articles for Kubernetes v1.25 are done and dusted.

You're welcome to write about this @adrianreber, but please adjust this to make it clear that the Kubernetes v1.25 released has already happened and is a past thing. I would focus on the feature and de-emphasize when the feature went alpha (you should mention that still, just not in the article title).

/hold

We shouldn't merge this as it stands.

If this feature has changed for Kubernetes v1.26, we could do this a post-release article for v.126, and also mention the latest changes.

I've added a bunch of inline feedback about wording.

content/en/blog/_posts/2022-10-30-forensic-container-checkpointing/index.md

adrianreber · 2022-10-24T13:24:31Z

@sftim Thanks for the detailed feedback. I tried to change everything as you suggested.

sftim

A few more suggestions.

content/en/blog/_posts/2022-10-30-forensic-container-checkpointing/index.md

sftim · 2022-10-24T15:52:58Z

content/en/blog/_posts/2022-10-30-forensic-container-checkpointing/index.md

+Kubernetes schedules the new Pod onto a node. The kubelet on that node
+instructs the container runtime (CRI-O in this example) to create and start a
+container based on an image specified as `registry/user/checkpoint-image:latest`.
+CRI-O detects that `registry/user/checkpoint-image:latest`
+is a reference to checkpoint data rather than a container image. Then,
+instead of the usual steps to create and start a container,
+CRI-O fetches the checkpoint data and restores the container from that
+specified checkpoint.


Does this need any special support from the registry? If not, is there any risk that a malicious user creates an image that appears to be a normal app but is in fact a checkpointed container that is then able to bypass some security controls?

If there is a way to tell the two kinds apart, I assume that some registries might need to code in special support.

Does this need any special support from the registry?

No.

If not, is there any risk that a malicious user creates an image that appears to be a normal app but is in fact a checkpointed container that is then able to bypass some security controls?

Hard to answer as I am probably not aware of every security control, but selinux, apparmor, seccomp is restored as it was before.

If there is a way to tell the two kinds apart, I assume that some registries might need to code in special support.

At this point we just use a really simple image with a single layer and a couple of annotations. We are currently trying to come up with a standard: opencontainers/image-spec#962

Let's mention (or document) that the image format is pre-alpha, or something akin to that.

It seems to me that we should use a custom mediaType for these checkpoints, and that there are some potential interesting issues with allowing container runtimes to load images like this.

If the existing runc and crun support doesn't distinguish by media type, then I think we should warn readers that ~~these~~ some container runtimes are shipping the new feature before a security review.

I do not understand your comment about the mediaType. runc/crun just works with a directory which is specified by Podman/CRI-O/containerd. So I guess the mediaType is something on that level, but not sure what your comment means in the context of this post. Is this something for the image-spec discussion?

{ "schemaVersion": 2, "manifests": [ { "mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:2cd9b96662de0ea7defcd950ca12d04e080c57781cbd1bcf26ce522ba8313daa", "size": 563 } ] }

That mediaType. We should warn readers that our tooling (I'm not sure which bit) doesn't yet differentiate between actual container images, and checkpoint snapshots.

Somehow, I'm not sure how, runc / containerd / kubelet / ??? seems to have a way to tell the difference between unpacking a plain container image, and unpacking / processing a snapshot. An attacker might make a “container image”, let's say they call it nginx:latest, that is actually a snapshot. Perhaps they are able to find a bug in CRIU and exploit that, even where the person who specified the Pod didn't intend to run any snapshot at all.

Let's communicate that this represents an unmeasured risk.

Thanks for the explanation.

Somehow, I'm not sure how, runc / containerd / kubelet / ??? seems to have a way to tell the difference between unpacking a plain container image, and unpacking / processing a snapshot.

The code in Podman and CRI-O is first looking for annotations and if the annotation is present the image is unpacked and passed to CRIU.

An attacker might make a “container image”, let's say they call it nginx:latest, that is actually a snapshot. Perhaps they are able to find a bug in CRIU and exploit that, even where the person who specified the Pod didn't intend to run any snapshot at all.

Understood. Could be a way to exploit this right. It is important to remember that we have this situation for a long time. containerd supports their own checkpoint images for a "long" time, so this is something which could have happened before and not something completely new. Just mentioning it also here, we could not use the containerd checkpoint image format, because it includes containerd internal protobuf blobs which are not useful outside of containerd. That is the reason we started the image-spec discussion.

Let's communicate that this represents an unmeasured risk.

I will add something to the document to highlight this. At this point CRIU support needs to be manually enabled in CRI-O so that it would make sense to highlight this at the point where I am talking about how to enable it in CRI-O.

adrianreber · 2022-10-24T16:16:10Z

Updated as suggested.

sftim · 2022-10-24T16:35:01Z

We usually publish articles on weekdays. How about 2022-10-31?

adrianreber · 2022-10-24T16:43:35Z

We usually publish articles on weekdays. How about 2022-10-31?

Sure. I just picked a random date in the future. No preference from my side.

adrianreber · 2022-10-25T08:26:21Z

Changed date to 2022-10-31 and added a sentence about pre-alpha status of the image format.

adrianreber · 2022-10-25T12:43:44Z

@sftim PTAL, I added a sentence that the security implications are not completely understood.

sftim · 2022-10-25T13:03:38Z

/hold cancel

sftim

Thanks. Some more feedback.

content/en/blog/_posts/2022-10-31-forensic-container-checkpointing/index.md

sftim · 2022-10-25T13:09:29Z

content/en/blog/_posts/2022-10-31-forensic-container-checkpointing/index.md

+have `curl` trust the kubelet's CA certificate:
+
+```shell
+-k --cert /var/run/kubernetes/client-admin.crt --key /var/run/kubernetes/client-admin.key


-k does not trust the cert, it instead skips a security check. Omit -k here (and add other options around the CA certificate), or reword to explain what's going on.

adrianreber · 2022-10-25T14:00:13Z

Applied your suggestions and tried to address your other comments.

sftim · 2022-11-29T22:13:57Z

Please update the article with a tentative publication date: the 5th of December. Let's see if we can get it published on that day.

occase · 2022-11-30T00:55:45Z

I think when reading the first sentence --
Forensic container checkpointing allows the creation of stateful copies of a running container without the container knowing that it is being checkpointed.

I have to look further down into the article to find this --
Forensic container checkpointing is based on [Checkpoint/Restore In Userspace](https://criu.org/) (CRIU).

which I think should be the first sentence in this article.

Not sure if this has been checked for technical accuracy but the rest of the article looks ok (without checking for technical accuracy myself).

sftim · 2022-11-30T13:10:31Z

Thanks @occase

adrianreber · 2022-11-30T13:10:54Z

@haircommander I removed --drop-infra-ctr=false from the article. The corresponding PR (cri-o/cri-o#6379) is only blocked by the broken CI so far. Let me know if you are okay with removing your hold.

@sftim Release data updated

@occase Happy to do the change, but so far @sftim has been doing most stylistic changes, so let's see what he says.

sftim · 2022-11-30T13:12:16Z

/hold cancel

/lgtm

We might want to mention some of the alternatives (eg: whole-node snapshots, perhaps with the help of a specialized hypervisor, or even dedicated hardware). Not essential though.

k8s-ci-robot · 2022-11-30T13:12:23Z

LGTM label has been added.

Git tree hash: 870c6f73784f1ea51014b74899624949d7bb842e

sftim · 2022-11-30T13:24:14Z

I would make the change that @occase suggests, or something broadly like it.

adrianreber · 2022-11-30T13:28:38Z

Changed as suggested by @occase

content/en/blog/_posts/2022-12-05-forensic-container-checkpointing/index.md

Co-authored-by: Radostin Stoyanov <rstoyanov@fedoraproject.org> Co-authored-by: Sascha Grunert <sgrunert@redhat.com> Co-authored-by: Tim Bannister <tim@scalefactory.com> Signed-off-by: Adrian Reber <areber@redhat.com>

rst0git

LGTM

sftim

#37412 (review) has an LGTM

/lgtm
/approve

Thank you!

k8s-ci-robot · 2022-12-02T16:49:55Z

LGTM label has been added.

Git tree hash: 74b58a5972a9637274908dbc455f94611d5aa69d

k8s-ci-robot · 2022-12-02T16:50:08Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rst0git, saschagrunert, sftim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~content/en/blog/OWNERS~~ [sftim]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 20, 2022

k8s-ci-robot requested review from nate-double-u and sftim October 20, 2022 13:54

k8s-ci-robot added area/blog Issues or PRs related to the Kubernetes Blog subproject language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Oct 20, 2022

sftim reviewed Oct 21, 2022

View reviewed changes

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 21, 2022

adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from 0d4b69c to a8ef499 Compare October 24, 2022 13:21

sftim reviewed Oct 24, 2022

View reviewed changes

adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from d72b1fc to 79d0993 Compare October 24, 2022 16:15

adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from 79d0993 to 2eeeb60 Compare October 25, 2022 08:25

hesch mentioned this pull request Oct 25, 2022

cri-o assumes criu binary is in /usr/sbin/criu cri-o/cri-o#6337

Closed

adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from 2eeeb60 to 413c09c Compare October 25, 2022 12:42

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 25, 2022

sftim reviewed Oct 25, 2022

View reviewed changes

adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from 049ddfe to 861b285 Compare October 25, 2022 13:58

adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch 2 times, most recently from 9dbe8be to 5f5364f Compare October 25, 2022 14:56

adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from f7c7bb9 to d1cd8d2 Compare November 30, 2022 13:08

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 30, 2022

k8s-ci-robot assigned sftim Nov 30, 2022

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2022

adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from d1cd8d2 to 6fb7ee6 Compare November 30, 2022 13:28

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2022

k8s-ci-robot requested a review from sftim November 30, 2022 13:28

rst0git reviewed Nov 30, 2022

View reviewed changes

content/en/blog/_posts/2022-12-05-forensic-container-checkpointing/index.md Outdated Show resolved Hide resolved

rst0git reviewed Nov 30, 2022

View reviewed changes

content/en/blog/_posts/2022-12-05-forensic-container-checkpointing/index.md Outdated Show resolved Hide resolved

rst0git reviewed Nov 30, 2022

View reviewed changes

content/en/blog/_posts/2022-12-05-forensic-container-checkpointing/index.md Show resolved Hide resolved

rst0git reviewed Nov 30, 2022

View reviewed changes

content/en/blog/_posts/2022-12-05-forensic-container-checkpointing/index.md Outdated Show resolved Hide resolved

Add blog post about forensic container checkpointing

0abb7e7

Co-authored-by: Radostin Stoyanov <rstoyanov@fedoraproject.org> Co-authored-by: Sascha Grunert <sgrunert@redhat.com> Co-authored-by: Tim Bannister <tim@scalefactory.com> Signed-off-by: Adrian Reber <areber@redhat.com>

adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from 1786930 to 0abb7e7 Compare November 30, 2022 18:48

rst0git approved these changes Dec 2, 2022

View reviewed changes

sftim approved these changes Dec 2, 2022

View reviewed changes

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2022

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2022

k8s-ci-robot merged commit 613aae0 into kubernetes:main Dec 2, 2022

This comment was marked as off-topic.

Sign in to view

adrianreber mentioned this pull request Oct 19, 2023

Forensic Container Checkpointing kubernetes/enhancements#2008

Open

21 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add blog post about forensic container checkpointing #37412

Add blog post about forensic container checkpointing #37412

adrianreber commented Oct 20, 2022

netlify bot commented Oct 20, 2022 •

edited

sftim left a comment

adrianreber commented Oct 24, 2022

sftim left a comment

sftim Oct 24, 2022

adrianreber Oct 24, 2022

sftim Oct 24, 2022

sftim Oct 24, 2022 •

edited

adrianreber Oct 25, 2022

sftim Oct 25, 2022 •

edited

adrianreber Oct 25, 2022

adrianreber commented Oct 24, 2022

sftim commented Oct 24, 2022

adrianreber commented Oct 24, 2022

adrianreber commented Oct 25, 2022

adrianreber commented Oct 25, 2022

sftim commented Oct 25, 2022

sftim left a comment

sftim Oct 25, 2022

adrianreber commented Oct 25, 2022

sftim commented Nov 29, 2022

occase commented Nov 30, 2022

sftim commented Nov 30, 2022

adrianreber commented Nov 30, 2022

sftim commented Nov 30, 2022

k8s-ci-robot commented Nov 30, 2022

sftim commented Nov 30, 2022

adrianreber commented Nov 30, 2022

rst0git left a comment

sftim left a comment

k8s-ci-robot commented Dec 2, 2022

k8s-ci-robot commented Dec 2, 2022

This comment was marked as off-topic.

Add blog post about forensic container checkpointing #37412

Add blog post about forensic container checkpointing #37412

Conversation

adrianreber commented Oct 20, 2022

netlify bot commented Oct 20, 2022 • edited

✅ Pull request preview available for checking

sftim left a comment

Choose a reason for hiding this comment

adrianreber commented Oct 24, 2022

sftim left a comment

Choose a reason for hiding this comment

sftim Oct 24, 2022

Choose a reason for hiding this comment

adrianreber Oct 24, 2022

Choose a reason for hiding this comment

sftim Oct 24, 2022

Choose a reason for hiding this comment

sftim Oct 24, 2022 • edited

Choose a reason for hiding this comment

adrianreber Oct 25, 2022

Choose a reason for hiding this comment

sftim Oct 25, 2022 • edited

Choose a reason for hiding this comment

adrianreber Oct 25, 2022

Choose a reason for hiding this comment

adrianreber commented Oct 24, 2022

sftim commented Oct 24, 2022

adrianreber commented Oct 24, 2022

adrianreber commented Oct 25, 2022

adrianreber commented Oct 25, 2022

sftim commented Oct 25, 2022

sftim left a comment

Choose a reason for hiding this comment

sftim Oct 25, 2022

Choose a reason for hiding this comment

adrianreber commented Oct 25, 2022

sftim commented Nov 29, 2022

occase commented Nov 30, 2022

sftim commented Nov 30, 2022

adrianreber commented Nov 30, 2022

sftim commented Nov 30, 2022

k8s-ci-robot commented Nov 30, 2022

sftim commented Nov 30, 2022

adrianreber commented Nov 30, 2022

rst0git left a comment

Choose a reason for hiding this comment

sftim left a comment

Choose a reason for hiding this comment

k8s-ci-robot commented Dec 2, 2022

k8s-ci-robot commented Dec 2, 2022

This comment was marked as off-topic.

netlify bot commented Oct 20, 2022 •

edited

sftim Oct 24, 2022 •

edited

sftim Oct 25, 2022 •

edited