Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blog post about forensic container checkpointing #37412

Conversation

adrianreber
Copy link
Contributor

This adds a blog article how to use forensic container checkpointing which was merged in Kubernetes v1.25.

kubernetes/enhancements#2008

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 20, 2022
@k8s-ci-robot k8s-ci-robot added area/blog Issues or PRs related to the Kubernetes Blog subproject language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Oct 20, 2022
@netlify
Copy link

netlify bot commented Oct 20, 2022

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit 0abb7e7
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/6387a57d644f1c000867cfd8
😎 Deploy Preview https://deploy-preview-37412--kubernetes-io-main-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the pull request.
I'm afraid the post-release blog articles for Kubernetes v1.25 are done and dusted.

You're welcome to write about this @adrianreber, but please adjust this to make it clear that the Kubernetes v1.25 released has already happened and is a past thing. I would focus on the feature and de-emphasize when the feature went alpha (you should mention that still, just not in the article title).

/hold

We shouldn't merge this as it stands.

If this feature has changed for Kubernetes v1.26, we could do this a post-release article for v.126, and also mention the latest changes.


I've added a bunch of inline feedback about wording.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 21, 2022
@adrianreber adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from 0d4b69c to a8ef499 Compare October 24, 2022 13:21
@adrianreber
Copy link
Contributor Author

@sftim Thanks for the detailed feedback. I tried to change everything as you suggested.

Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more suggestions.

Comment on lines 170 to 178
Kubernetes schedules the new Pod onto a node. The kubelet on that node
instructs the container runtime (CRI-O in this example) to create and start a
container based on an image specified as `registry/user/checkpoint-image:latest`.
CRI-O detects that `registry/user/checkpoint-image:latest`
is a reference to checkpoint data rather than a container image. Then,
instead of the usual steps to create and start a container,
CRI-O fetches the checkpoint data and restores the container from that
specified checkpoint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need any special support from the registry? If not, is there any risk that a malicious user creates an image that appears to be a normal app but is in fact a checkpointed container that is then able to bypass some security controls?

If there is a way to tell the two kinds apart, I assume that some registries might need to code in special support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need any special support from the registry?

No.

If not, is there any risk that a malicious user creates an image that appears to be a normal app but is in fact a checkpointed container that is then able to bypass some security controls?

Hard to answer as I am probably not aware of every security control, but selinux, apparmor, seccomp is restored as it was before.

If there is a way to tell the two kinds apart, I assume that some registries might need to code in special support.

At this point we just use a really simple image with a single layer and a couple of annotations. We are currently trying to come up with a standard: opencontainers/image-spec#962

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's mention (or document) that the image format is pre-alpha, or something akin to that.

Copy link
Contributor

@sftim sftim Oct 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that we should use a custom mediaType for these checkpoints, and that there are some potential interesting issues with allowing container runtimes to load images like this.

If the existing runc and crun support doesn't distinguish by media type, then I think we should warn readers that these some container runtimes are shipping the new feature before a security review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand your comment about the mediaType. runc/crun just works with a directory which is specified by Podman/CRI-O/containerd. So I guess the mediaType is something on that level, but not sure what your comment means in the context of this post. Is this something for the image-spec discussion?

Copy link
Contributor

@sftim sftim Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:2cd9b96662de0ea7defcd950ca12d04e080c57781cbd1bcf26ce522ba8313daa",
      "size": 563
    }
  ]
}

That mediaType. We should warn readers that our tooling (I'm not sure which bit) doesn't yet differentiate between actual container images, and checkpoint snapshots.

Somehow, I'm not sure how, runc / containerd / kubelet / ??? seems to have a way to tell the difference between unpacking a plain container image, and unpacking / processing a snapshot. An attacker might make a “container image”, let's say they call it nginx:latest, that is actually a snapshot. Perhaps they are able to find a bug in CRIU and exploit that, even where the person who specified the Pod didn't intend to run any snapshot at all.

Let's communicate that this represents an unmeasured risk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation.

Somehow, I'm not sure how, runc / containerd / kubelet / ??? seems to have a way to tell the difference between unpacking a plain container image, and unpacking / processing a snapshot.

The code in Podman and CRI-O is first looking for annotations and if the annotation is present the image is unpacked and passed to CRIU.

An attacker might make a “container image”, let's say they call it nginx:latest, that is actually a snapshot. Perhaps they are able to find a bug in CRIU and exploit that, even where the person who specified the Pod didn't intend to run any snapshot at all.

Understood. Could be a way to exploit this right. It is important to remember that we have this situation for a long time. containerd supports their own checkpoint images for a "long" time, so this is something which could have happened before and not something completely new. Just mentioning it also here, we could not use the containerd checkpoint image format, because it includes containerd internal protobuf blobs which are not useful outside of containerd. That is the reason we started the image-spec discussion.

Let's communicate that this represents an unmeasured risk.

I will add something to the document to highlight this. At this point CRIU support needs to be manually enabled in CRI-O so that it would make sense to highlight this at the point where I am talking about how to enable it in CRI-O.

@adrianreber adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from d72b1fc to 79d0993 Compare October 24, 2022 16:15
@adrianreber
Copy link
Contributor Author

Updated as suggested.

@sftim
Copy link
Contributor

sftim commented Oct 24, 2022

We usually publish articles on weekdays. How about 2022-10-31?

@adrianreber
Copy link
Contributor Author

We usually publish articles on weekdays. How about 2022-10-31?

Sure. I just picked a random date in the future. No preference from my side.

@adrianreber adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from 79d0993 to 2eeeb60 Compare October 25, 2022 08:25
@adrianreber
Copy link
Contributor Author

Changed date to 2022-10-31 and added a sentence about pre-alpha status of the image format.

@adrianreber
Copy link
Contributor Author

@sftim PTAL, I added a sentence that the security implications are not completely understood.

@sftim
Copy link
Contributor

sftim commented Oct 25, 2022

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 25, 2022
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Some more feedback.

have `curl` trust the kubelet's CA certificate:

```shell
-k --cert /var/run/kubernetes/client-admin.crt --key /var/run/kubernetes/client-admin.key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-k does not trust the cert, it instead skips a security check. Omit -k here (and add other options around the CA certificate), or reword to explain what's going on.

@adrianreber adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from 049ddfe to 861b285 Compare October 25, 2022 13:58
@adrianreber
Copy link
Contributor Author

Applied your suggestions and tried to address your other comments.

@adrianreber adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch 2 times, most recently from 9dbe8be to 5f5364f Compare October 25, 2022 14:56
@sftim
Copy link
Contributor

sftim commented Nov 29, 2022

Please update the article with a tentative publication date: the 5th of December. Let's see if we can get it published on that day.

@occase
Copy link
Contributor

occase commented Nov 30, 2022

I think when reading the first sentence --
Forensic container checkpointing allows the creation of stateful copies of a running container without the container knowing that it is being checkpointed.

I have to look further down into the article to find this --
Forensic container checkpointing is based on [Checkpoint/Restore In Userspace](https://criu.org/) (CRIU).

which I think should be the first sentence in this article.

Not sure if this has been checked for technical accuracy but the rest of the article looks ok (without checking for technical accuracy myself).

@adrianreber adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from f7c7bb9 to d1cd8d2 Compare November 30, 2022 13:08
@sftim
Copy link
Contributor

sftim commented Nov 30, 2022

Thanks @occase

@adrianreber
Copy link
Contributor Author

@haircommander I removed --drop-infra-ctr=false from the article. The corresponding PR (cri-o/cri-o#6379) is only blocked by the broken CI so far. Let me know if you are okay with removing your hold.

@sftim Release data updated

@occase Happy to do the change, but so far @sftim has been doing most stylistic changes, so let's see what he says.

@sftim
Copy link
Contributor

sftim commented Nov 30, 2022

/hold cancel

/lgtm

We might want to mention some of the alternatives (eg: whole-node snapshots, perhaps with the help of a specialized hypervisor, or even dedicated hardware). Not essential though.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 30, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2022
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 870c6f73784f1ea51014b74899624949d7bb842e

@sftim
Copy link
Contributor

sftim commented Nov 30, 2022

I would make the change that @occase suggests, or something broadly like it.

@adrianreber adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from d1cd8d2 to 6fb7ee6 Compare November 30, 2022 13:28
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2022
@adrianreber
Copy link
Contributor Author

Changed as suggested by @occase

Co-authored-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Co-authored-by: Sascha Grunert <sgrunert@redhat.com>
Co-authored-by: Tim Bannister <tim@scalefactory.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber adrianreber force-pushed the 2022-10-20-forensic-container-checkpointing branch from 1786930 to 0abb7e7 Compare November 30, 2022 18:48
Copy link
Contributor

@rst0git rst0git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#37412 (review) has an LGTM

/lgtm
/approve

Thank you!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2022
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 74b58a5972a9637274908dbc455f94611d5aa69d

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rst0git, saschagrunert, sftim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2022
@k8s-ci-robot k8s-ci-robot merged commit 613aae0 into kubernetes:main Dec 2, 2022
@sftim

This comment was marked as off-topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/blog Issues or PRs related to the Kubernetes Blog subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/docs Categorizes an issue or PR as relevant to SIG Docs. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

7 participants