Make checkpoint image compatible with OCI spec #8661

Open
tianouya-db opened this issue Jun 8, 2023 · 16 comments

Comments

@tianouya-db

tianouya-db commented Jun 8, 2023

What is the problem you're trying to solve

Today the checkpoint image generated by containerd (using ctr container checkpoint or ctr task checkpoint) has a special format that does not conform to the OCI spec. As a result, the image is rejected by container registries (e.g. Harbor, Docker Hub).

Specifically, the image index generated with ctr container checkpoint --image --rw --task looks like this:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.containerd.container.checkpoint.runtime.options+proto",
      "digest": "sha256:eaa269d0484d8a346e084e255eff799efeace826895b9a0b8ccde42851a8cecf",
      "size": 32,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "digest": "sha256:1e5b537724aa619a2e4957bf112e553a2a8c7df809a40dc5c3a386edd7140d48",
      "size": 320
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:78a8e86786975bedf641fce5f5adeeb7308188a0749530099d98378c98c0c5be",
      "size": 1233379,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.containerd.container.criu.checkpoint.criu.tar",
      "digest": "sha256:129f4c0c80421225d6072a475e732ea4cfce383a00f38e8b8806b2aeedcfe50e",
      "size": 179200,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.containerd.container.checkpoint.config.v1+proto",
      "digest": "sha256:55ecceab992a45ebf873fb0cd4ddf3ef617483911a477a97858a1636601652f6",
      "size": 13856,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.containerd.container.checkpoint.options.v1+proto",
      "digest": "sha256:7d3151453569712d136c98a8cce017f677b6a39c5114c6f570c0ea2f5a4bce68",
      "size": 42,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    }
  ],
  "annotations": {
    "io.containerd.checkpoint.runtime": "io.containerd.runc.v2",
    "io.containerd.checkpoint.snapshotter": "overlayfs",
    "org.opencontainers.image.ref.name": "docker.io/library/busybox:latest"
  }
}

It violates the OCI spec requirement that each object in the manifests array be a descriptor of a manifest or another index.
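
To make the violation concrete, here is a minimal Go sketch that flags index entries whose mediaType is not a manifest or index type. It is not taken from containerd. Two things are assumptions for illustration: the accepted media type set (the OCI manifest and index types plus their Docker v2 equivalents) and the shortened digests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor keeps only the fields this check needs.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
}

type index struct {
	Manifests []descriptor `json:"manifests"`
}

// manifestTypes are the media types accepted here as "a manifest or another
// index": OCI manifest/index plus the Docker v2 equivalents (an assumption).
var manifestTypes = map[string]bool{
	"application/vnd.oci.image.manifest.v1+json":                true,
	"application/vnd.oci.image.index.v1+json":                   true,
	"application/vnd.docker.distribution.manifest.v2+json":      true,
	"application/vnd.docker.distribution.manifest.list.v2+json": true,
}

// invalidEntries returns the digests of index entries whose media type is
// not a manifest or index type.
func invalidEntries(raw []byte) ([]string, error) {
	var idx index
	if err := json.Unmarshal(raw, &idx); err != nil {
		return nil, err
	}
	var bad []string
	for _, d := range idx.Manifests {
		if !manifestTypes[d.MediaType] {
			bad = append(bad, d.Digest)
		}
	}
	return bad, nil
}

func main() {
	// Three entries from the legacy index above (digests shortened).
	legacy := []byte(`{"schemaVersion": 2, "manifests": [
	  {"mediaType": "application/vnd.containerd.container.checkpoint.runtime.options+proto", "digest": "sha256:eaa269d0"},
	  {"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json", "digest": "sha256:1e5b5377"},
	  {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "digest": "sha256:78a8e867"}
	]}`)
	bad, err := invalidEntries(legacy)
	if err != nil {
		panic(err)
	}
	fmt.Println(bad) // [sha256:eaa269d0 sha256:78a8e867]
}
```

Run against the legacy entries, the sketch flags the runtime options blob and the layer. Only the base image's manifest list passes.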

Describe the solution you'd like

We propose a new format for the checkpoint image generated by containerd. The new format conforms better to the OCI spec, so that it can be accepted by container registries.

Image index

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      // base image manifest/index
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "digest": "sha256:1e5b537724aa619a2e4957bf112e553a2a8c7df809a40dc5c3a386edd7140d48",
      "size": 320,
      "annotations": {
        "io.containerd.checkpoint.content.group": "base"
      }
    },
    {
      // checkpoint image manifest
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:dc744ab1e0bb95b448c1c9881a72d521b626290606995a7bea5902a07457bb3f",
      "size": 1246,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "annotations": {
        "io.containerd.checkpoint.content.group": "checkpoint"
      }
    }
  ],
  "annotations": {
    "io.containerd.checkpoint.runtime": "io.containerd.runc.v2",
    "io.containerd.checkpoint.snapshotter": "overlayfs",
    "org.opencontainers.image.ref.name": "docker.io/library/busybox:latest",
    "io.containerd.checkpoint.image.version": "1"
  }
}

The first entry in manifests is the base image manifest or index, and the second entry is the manifest for the checkpoint content. We also propose these two annotations:

  • io.containerd.checkpoint.content.group: added on the manifest entries; the value is either base or checkpoint
  • io.containerd.checkpoint.image.version: set to 1. It versions the checkpoint images generated by containerd so that backward compatibility can be maintained.
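
Here is a minimal Go sketch of how a consumer might use these two annotations. It sorts the index entries into base and checkpoint groups and reads the format version. The helper splitCheckpointIndex is hypothetical, and the digests are shortened.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	groupAnnotation   = "io.containerd.checkpoint.content.group"
	versionAnnotation = "io.containerd.checkpoint.image.version"
)

type descriptor struct {
	MediaType   string            `json:"mediaType"`
	Digest      string            `json:"digest"`
	Size        int64             `json:"size"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

type index struct {
	MediaType   string            `json:"mediaType"`
	Manifests   []descriptor      `json:"manifests"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// splitCheckpointIndex sorts the index entries by their content-group
// annotation and returns the index-level format version ("" if absent).
func splitCheckpointIndex(raw []byte) (base, checkpoint []descriptor, version string, err error) {
	var idx index
	if err = json.Unmarshal(raw, &idx); err != nil {
		return nil, nil, "", err
	}
	for _, d := range idx.Manifests {
		switch d.Annotations[groupAnnotation] {
		case "base":
			base = append(base, d)
		case "checkpoint":
			checkpoint = append(checkpoint, d)
		}
	}
	return base, checkpoint, idx.Annotations[versionAnnotation], nil
}

func main() {
	// The proposed index above, with digests shortened.
	raw := []byte(`{
	  "schemaVersion": 2,
	  "mediaType": "application/vnd.oci.image.index.v1+json",
	  "manifests": [
	    {"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json", "digest": "sha256:1e5b5377", "size": 320,
	     "annotations": {"io.containerd.checkpoint.content.group": "base"}},
	    {"mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:dc744ab1", "size": 1246,
	     "annotations": {"io.containerd.checkpoint.content.group": "checkpoint"}}
	  ],
	  "annotations": {"io.containerd.checkpoint.image.version": "1"}
	}`)
	base, ckpt, version, err := splitCheckpointIndex(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("base=%s checkpoint=%s version=%s\n", base[0].Digest, ckpt[0].Digest, version)
}
```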

Checkpoint manifest

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.containerd.container.checkpoint.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/vnd.containerd.container.checkpoint.runtime.options+proto",
      "digest": "sha256:d8183a03f8f3429623e6aa55d13c70d1bfc282fe5c3d6562180fdc55c7614589",
      "size": 28
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:5d2658d1a60d482d801448be4273f36eb100f9200b7d853fd7a7ef9b3b7d849e",
      "size": 562159
    },
    {
      "mediaType": "application/vnd.containerd.container.criu.checkpoint.criu.tar",
      "digest": "sha256:5536b71f469c59440af8cc9456b8d4d212c37215c04639691c5eca5a552b6ae4",
      "size": 187392
    },
    {
      "mediaType": "application/vnd.containerd.container.checkpoint.config.v1+proto",
      "digest": "sha256:f95684011fe7045e7d9d0751398123f98f845b8e43abb1e9213941dd3797d710",
      "size": 13869
    },
    {
      "mediaType": "application/vnd.containerd.container.checkpoint.options.v1+proto",
      "digest": "sha256:7d3151453569712d136c98a8cce017f677b6a39c5114c6f570c0ea2f5a4bce68",
      "size": 42
    }
  ],
  "subject": {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "digest": "sha256:acaddd9ed544f7baf3373064064a51250b14cfe3ec604d65765a53da5958e5f5",
    "size": 528
  },
  "annotations": {
    "io.containerd.checkpoint.content.group": "checkpoint",
    "io.containerd.checkpoint.image.version": "1"
  }
}

The manifest is packaged as an OCI artifact, with these fields to call out:

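  • artifactType: application/vnd.containerd.container.checkpoint.tar - the value we propose for containerd checkpoint. Per the spec, this field is only required when config.mediaType is set to the empty value, so it can be omitted while config uses a custom media type (as in the manifest above)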
  • config - the value is a descriptor that points to a custom config with type application/vnd.containerd.container.checkpoint.v1+json that we propose for containerd; the payload can be an empty JSON object ({}) to start with
  • layers - contains all the current layers generated for checkpoint image
  • subject - points to the manifest of the base image
  • annotations
    • io.containerd.checkpoint.content.group: checkpoint
    • io.containerd.checkpoint.image.version: 1
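
As a quick check of the config descriptor above, this Go sketch computes the digest and size of the {} payload. It uses only the standard library, and the helper describe is hypothetical. It reproduces the sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a digest and size 2 shown in the manifest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

const checkpointConfigType = "application/vnd.containerd.container.checkpoint.v1+json"

// describe builds an OCI-style descriptor (sha256 digest and byte size) for a blob.
func describe(mediaType string, blob []byte) descriptor {
	sum := sha256.Sum256(blob)
	return descriptor{
		MediaType: mediaType,
		Digest:    "sha256:" + hex.EncodeToString(sum[:]),
		Size:      int64(len(blob)),
	}
}

func main() {
	// The initial (empty) checkpoint config payload.
	cfg := describe(checkpointConfigType, []byte("{}"))
	out, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```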

Changes required

The following functions will need to be updated:

  • checkpoint operation: generate checkpoint images in the new format
  • image push: work with the new format. The existing format can be disregarded, as it can't be pushed to a registry anyway.
  • image pull: work with the new format. The existing format can be disregarded, as it can't be pulled from a registry anyway.
  • container restore: work with the new format. If we want to maintain backward compatibility, we need to keep the support of the existing format. This can be done by leveraging the io.containerd.checkpoint.image.version annotation.
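
For the restore path, here is a minimal sketch of how the version annotation could select between the legacy and the new code paths. The names detectFormat, formatLegacy and formatV1 are hypothetical. Treating a missing annotation as the legacy format is an assumption, based on the fact that the legacy index above does not carry it.

```go
package main

import "fmt"

const versionAnnotation = "io.containerd.checkpoint.image.version"

type checkpointFormat int

const (
	// formatLegacy: checkpoint blobs listed directly in the index.
	formatLegacy checkpointFormat = iota
	// formatV1: base image entry plus a checkpoint manifest.
	formatV1
)

// detectFormat picks a restore code path from the index-level annotations.
// A missing annotation is treated as the legacy format (an assumption).
func detectFormat(indexAnnotations map[string]string) (checkpointFormat, error) {
	v, ok := indexAnnotations[versionAnnotation]
	if !ok {
		return formatLegacy, nil
	}
	switch v {
	case "1":
		return formatV1, nil
	default:
		return 0, fmt.Errorf("unsupported checkpoint image version %q", v)
	}
}

func main() {
	legacy := map[string]string{"io.containerd.checkpoint.runtime": "io.containerd.runc.v2"}
	v1 := map[string]string{versionAnnotation: "1"}
	for _, a := range []map[string]string{legacy, v1, {versionAnnotation: "2"}} {
		f, err := detectFormat(a)
		fmt.Println(f, err)
	}
}
```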

Additional context

@tianouya-db
Author

cc: @sudo-bmitch, @cpuguy83, @dmcgowan

@cpuguy83
Member

cpuguy83 commented Jun 8, 2023

Change SGTM
Ideally this would use referrers, but that's a whole different change.

@sudo-bmitch

sudo-bmitch commented Jun 8, 2023

My preference would be to define a custom config media type, since you're able to define the other media types. The content of the config can initially match the empty descriptor that OCI is defining ({}), while also being reserved for future use.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.containerd.container.checkpoint.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/vnd.containerd.container.checkpoint.runtime.options+proto",
      "digest": "sha256:d8183a03f8f3429623e6aa55d13c70d1bfc282fe5c3d6562180fdc55c7614589",
      "size": 28
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:5d2658d1a60d482d801448be4273f36eb100f9200b7d853fd7a7ef9b3b7d849e",
      "size": 562159
    },
    {
      "mediaType": "application/vnd.containerd.container.criu.checkpoint.criu.tar",
      "digest": "sha256:5536b71f469c59440af8cc9456b8d4d212c37215c04639691c5eca5a552b6ae4",
      "size": 187392
    },
    {
      "mediaType": "application/vnd.containerd.container.checkpoint.config.v1+proto",
      "digest": "sha256:f95684011fe7045e7d9d0751398123f98f845b8e43abb1e9213941dd3797d710",
      "size": 13869
    },
    {
      "mediaType": "application/vnd.containerd.container.checkpoint.options.v1+proto",
      "digest": "sha256:7d3151453569712d136c98a8cce017f677b6a39c5114c6f570c0ea2f5a4bce68",
      "size": 42
    }
  ],
  "annotations": {
    "io.containerd.checkpoint.content.group": "checkpoint",
    "io.containerd.checkpoint.image.version": "1"
  }
}

For the image index, make sure to define a mediaType of application/vnd.oci.image.index.v1+json. If you want a graceful fallback for clients that don't understand the checkpoint image entries but still need to run that index, you can copy the descriptors from the referenced image, listing them first so that the platform match lands on the image before it matches the checkpoint artifact:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      // base image manifest/index
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "digest": "sha256:1e5b537724aa619a2e4957bf112e553a2a8c7df809a40dc5c3a386edd7140d48",
      "size": 320,
      "annotations": {
        "io.containerd.image.name": "docker.io/library/busybox:latest",
        "org.opencontainers.image.ref.name": "latest",
        "io.containerd.checkpoint.content.group": "base"
      }
    },
    {
      // pull up amd64 descriptor from above manifest for fallback
      "digest": "sha256:5cd3db04b8be5773388576a83177aff4f40a03457a63855f4b9cbe30542b9a43",
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "size": 528,
      "annotations": {
        // adjust annotations as needed
        "io.containerd.image.name": "docker.io/library/busybox:latest",
        "org.opencontainers.image.ref.name": "latest",
        "io.containerd.checkpoint.content.group": "base"
      }
    },
    {
      // checkpoint image manifest
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:dc744ab1e0bb95b448c1c9881a72d521b626290606995a7bea5902a07457bb3f",
      "size": 1246,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "annotations": {
        "io.containerd.checkpoint.content.group": "checkpoint"
      }
    }
  ],
  "annotations": {
    "io.containerd.checkpoint.runtime": "io.containerd.runc.v2",
    "io.containerd.checkpoint.snapshotter": "overlayfs",
    "org.opencontainers.image.ref.name": "docker.io/library/busybox:latest",
    "io.containerd.checkpoint.image.version": "1"
  }
}

@tianouya-db
Author

tianouya-db commented Jun 8, 2023

Thanks for the feedback.

My preference would be to define a custom config media type since you're able to define the other media types.

This makes sense to me.

For the image index, make sure to define a mediaType application/vnd.oci.image.index.v1+json.

Makes sense +1.

you can copy the descriptors from the referenced image

Would this cause a problem, since it's a duplicate manifest? Or maybe another option is to just expand the base image index and include the manifest directly?

@sudo-bmitch

you can copy the descriptors from the referenced image

Would this cause a problem, since it's a duplicate manifest? Or maybe another option is to just expand the base image index and include the manifest directly?

We added this to the spec a while back and I believe it matches what implementations already do:

If multiple manifests match a client or runtime's requirements, the first matching entry SHOULD be used.

The idea is that your runtime knows to look for the checkpoint descriptor, so it will keep searching past the first platform match until it sees the expected annotations, while older runtimes and other tooling would match on the container image.

This is only useful if the checkpoint index would ever be passed to something other than a checkpoint runtime. If that's not a possible use case, then skip that suggestion and keep it simple.
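
Here is a simplified Go sketch of the first-match rule. A plain client takes the first platform match, while a checkpoint-aware runtime adds a predicate on the content-group annotation. The digests are placeholders, and firstMatch is hypothetical. The sketch skips entries without a platform instead of descending into a nested index. That is a simplification, not a description of how any particular runtime behaves.

```go
package main

import "fmt"

type platform struct {
	Architecture string
	OS           string
}

type descriptor struct {
	MediaType   string
	Digest      string
	Platform    *platform
	Annotations map[string]string
}

const groupAnnotation = "io.containerd.checkpoint.content.group"

// firstMatch returns the first entry whose platform equals want and which
// also satisfies accept, following "the first matching entry SHOULD be used".
func firstMatch(entries []descriptor, want platform, accept func(descriptor) bool) (descriptor, bool) {
	for _, d := range entries {
		if d.Platform == nil || *d.Platform != want {
			continue // no platform (e.g. a nested index) or a different platform
		}
		if accept(d) {
			return d, true
		}
	}
	return descriptor{}, false
}

func main() {
	amd64 := platform{Architecture: "amd64", OS: "linux"}
	entries := []descriptor{
		{MediaType: "application/vnd.docker.distribution.manifest.list.v2+json", Digest: "base-index"},
		{MediaType: "application/vnd.docker.distribution.manifest.v2+json", Digest: "base-amd64", Platform: &amd64,
			Annotations: map[string]string{groupAnnotation: "base"}},
		{MediaType: "application/vnd.oci.image.manifest.v1+json", Digest: "checkpoint-amd64", Platform: &amd64,
			Annotations: map[string]string{groupAnnotation: "checkpoint"}},
	}
	anyEntry := func(descriptor) bool { return true }
	checkpointOnly := func(d descriptor) bool { return d.Annotations[groupAnnotation] == "checkpoint" }

	plain, _ := firstMatch(entries, amd64, anyEntry)
	ckpt, _ := firstMatch(entries, amd64, checkpointOnly)
	fmt.Println(plain.Digest, ckpt.Digest) // base-amd64 checkpoint-amd64
}
```

This ordering is why the copied base descriptors are listed first. Older tooling stops at the image, and the checkpoint runtime keeps scanning.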

@tianouya-db
Copy link
Author

tianouya-db commented Jun 9, 2023

To clarify, do you suggest pulling up the base image manifest into the index and getting rid of the base image index? If that's the case, then we are on the same page. It's something like:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      // pull up amd64 descriptor from the base image index
      "digest": "sha256:5cd3db04b8be5773388576a83177aff4f40a03457a63855f4b9cbe30542b9a43",
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "size": 528,
      "annotations": {
        // adjust annotations as needed
        "io.containerd.image.name": "docker.io/library/busybox:latest",
        "org.opencontainers.image.ref.name": "latest",
        "io.containerd.checkpoint.content.group": "base"
      }
    },
    {
      // checkpoint image manifest
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:dc744ab1e0bb95b448c1c9881a72d521b626290606995a7bea5902a07457bb3f",
      "size": 1246,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "annotations": {
        "io.containerd.checkpoint.content.group": "checkpoint"
      }
    }
  ],
  "annotations": {
    "io.containerd.checkpoint.runtime": "io.containerd.runc.v2",
    "io.containerd.checkpoint.snapshotter": "overlayfs",
    "org.opencontainers.image.ref.name": "docker.io/library/busybox:latest",
    "io.containerd.checkpoint.image.version": "1"
  }
}

This is only useful if the checkpoint index would ever be passed to something other than a checkpoint runtime.

I think in most (not sure if all) cases this would only be passed to a checkpoint runtime. We can keep it as an optimization that's nice to have, and we can always add it if it turns out to be necessary.

@sudo-bmitch

To clarify, do you suggest pulling up the base image manifest into the index and getting rid of the base image index? If that's the case, then we are on the same page. It's something like:

I left the base index in there too, to handle the cases where things are pinned to that digest.

@tianouya-db
Author

on the same page. It's something like:

I left the base index in there too, to handle the cases where things are pinned to that digest.

This means the runtime will get duplicate entries as it walks through all entries recursively. I'm not sure if that would cause issues in some runtimes, so I'll leave it open and see if anyone else has feedback on it.
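
One way a runtime could tolerate the duplicate entries is to remember which digests it has already visited while walking nested indexes. This Go sketch is not taken from containerd. It uses a hypothetical in-memory map (store) in place of a real content store, along with placeholder digests.

```go
package main

import "fmt"

// node is a simplified stand-in for a content-store blob: an index lists
// child digests, a manifest has none.
type node struct {
	Children []string
}

// walk visits every reachable digest depth-first, skipping digests already
// seen, so an entry listed both at the top level and inside a nested index
// is handled only once.
func walk(store map[string]node, root string, visit func(string)) error {
	seen := map[string]bool{}
	var rec func(string) error
	rec = func(dgst string) error {
		if seen[dgst] {
			return nil
		}
		n, ok := store[dgst]
		if !ok {
			return fmt.Errorf("missing blob %s", dgst)
		}
		seen[dgst] = true
		visit(dgst)
		for _, c := range n.Children {
			if err := rec(c); err != nil {
				return err
			}
		}
		return nil
	}
	return rec(root)
}

func main() {
	store := map[string]node{
		"checkpoint-index":    {Children: []string{"base-index", "base-amd64", "checkpoint-manifest"}},
		"base-index":          {Children: []string{"base-amd64", "base-arm64"}},
		"base-amd64":          {},
		"base-arm64":          {},
		"checkpoint-manifest": {},
	}
	var order []string
	if err := walk(store, "checkpoint-index", func(d string) { order = append(order, d) }); err != nil {
		panic(err)
	}
	fmt.Println(order)
}
```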

@tianouya-db
Author

tianouya-db commented Jun 10, 2023

Also, one other question is on the artifactType field in manifest:

"artifactType": "application/vnd.containerd.container.checkpoint.tar",

Do we need to use the mediaType of one of the layers (which have different media types), or can we use a new one we propose?

@sudo-bmitch

I left the base index in there too, to handle the cases where things are pinned to that digest.

This means the runtime will get duplicate entries as it walks through all entries recursively. I'm not sure if that would cause issues in some runtimes, so I'll leave it open and see if anyone else has feedback on it.

Here's an image to test, and see if you run busybox or alpine: ghcr.io/sudo-bmitch/oci-sandbox:recursive

Also, one other question is on the artifactType field in manifest:
...
Do we need to use the mediaType of one of the layers (which have different media types), or can we use a new one we propose?

artifactType's constraint is that it needs to follow the IANA media type syntax, ideally with a registered media type, or with something under the appropriate reverse DNS namespace. Beyond that, it's up to artifact producers to specify how they'll use it.
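
Here is a syntax-only check along those lines, sketched in Go. The regular expression follows the RFC 6838 restricted-name grammar for type/subtype, without parameters. The vnd. test only reports whether the subtype sits in the vendor tree, and this approach cannot check registration status. The helper checkArtifactType is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// mediaTypeRE follows the RFC 6838 restricted-name grammar for
// type "/" subtype (parameters are not allowed here).
var mediaTypeRE = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}$`)

// checkArtifactType reports whether s is a syntactically valid media type
// and whether its subtype is in the vendor ("vnd.") tree.
func checkArtifactType(s string) (valid, vendorTree bool) {
	if !mediaTypeRE.MatchString(s) {
		return false, false
	}
	subtype := s[strings.Index(s, "/")+1:]
	return true, strings.HasPrefix(subtype, "vnd.")
}

func main() {
	for _, s := range []string{
		"application/vnd.containerd.container.checkpoint.tar",
		"application/json",
		"application",
	} {
		valid, vnd := checkArtifactType(s)
		fmt.Printf("%-52s valid=%t vendor=%t\n", s, valid, vnd)
	}
}
```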

The reason we added artifactType was for scenarios where there was no config descriptor and adding one would be difficult. The simple example of that is an SBOM, where they have a media type for their data, but their spec does not define how to package that in OCI because their job is just to define the SBOM spec, so users have no content or media type for the config.

My fear in seeing how it's used is that people see a new feature and assume they must use every new feature available, turning on all the switches just because they exist, rather than because of a need. The result is an artifact that may be less portable (e.g. ECR will block a manifest with fields it doesn't recognize, and artifactType has not made it into a GA release). When possible, sticking to the OCI 1.0 spec gives better portability.

@tianouya-db
Copy link
Author

tianouya-db commented Jun 12, 2023

Thanks @sudo-bmitch. I looked at ghcr.io/sudo-bmitch/oci-sandbox:recursive, and it seems it's not a recursive one? The manifests that are directly included at the top level index seem to be different from those in the inner index. For example:

  • linux/amd64 manifest at the top level: 463751ca5f39b24eb7a1261b22a4603e386086dab88a74fab65d09e3025427be
  • linux/amd64 manifest inside the inner index: d4f87a3a8111d20fa5fb5920e1021733da13ecf5ef091ed258deff8f5e28a5d0

Regarding artifactType, I found this in the spec:

This MUST be set when config.mediaType is set to the empty value.

I included it originally because I was trying to use the empty config. Now that I use a custom type for the config, I assume the artifactType is not mandatory. I can remove it if that's the case.
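
The quoted rule can be expressed as a small validation, sketched here in Go. The sketch assumes the OCI 1.1 empty media type is application/vnd.oci.empty.v1+json, which is worth verifying against the spec release you target. The helper checkArtifactTypeRule is hypothetical. With the custom config type, a manifest without artifactType passes.

```go
package main

import (
	"errors"
	"fmt"
)

// emptyConfigType is assumed to be the OCI 1.1 "empty" config media type.
const emptyConfigType = "application/vnd.oci.empty.v1+json"

type descriptor struct {
	MediaType string `json:"mediaType"`
}

type manifest struct {
	ArtifactType string     `json:"artifactType,omitempty"`
	Config       descriptor `json:"config"`
}

// checkArtifactTypeRule enforces "artifactType MUST be set when
// config.mediaType is set to the empty value".
func checkArtifactTypeRule(m manifest) error {
	if m.Config.MediaType == emptyConfigType && m.ArtifactType == "" {
		return errors.New("artifactType must be set when config.mediaType is the empty value")
	}
	return nil
}

func main() {
	proposed := manifest{Config: descriptor{MediaType: "application/vnd.containerd.container.checkpoint.v1+json"}}
	emptyCfg := manifest{Config: descriptor{MediaType: emptyConfigType}}
	fmt.Println(checkArtifactTypeRule(proposed)) // <nil>
	fmt.Println(checkArtifactTypeRule(emptyCfg))
}
```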

@sudo-bmitch

Thanks @sudo-bmitch. I looked at ghcr.io/sudo-bmitch/oci-sandbox:recursive, and it seems it's not a recursive one? The manifests that are directly included at the top level index seem to be different from those in the inner index. For example:

  • linux/amd64 manifest at the top level: 463751ca5f39b24eb7a1261b22a4603e386086dab88a74fab65d09e3025427be
  • linux/amd64 manifest inside the inner index: d4f87a3a8111d20fa5fb5920e1021733da13ecf5ef091ed258deff8f5e28a5d0

Nested would have been a better tag than recursive. It's an index inside of an index, one image is busybox, the other is alpine. This is so that you can see if the runtime sees a conflict trying to run both, picks the one in the nested index, or never even queries the nested index when it is parsing the top level index.

I included it originally because I was trying to use the empty config. Now that I use a custom type for the config, I assume the artifactType is not mandatory. I can remove it if that's the case.

Yes.

@tianouya-db
Author

This is so that you can see if the runtime sees a conflict trying to run both, picks the one in the nested index, or never even queries the nested index when it is parsing the top level index.

Yeah, I tried with containerd, and from what I can tell it starts busybox. I think a recursive index would probably also work. Still, I think this is an enhancement we can add if we actually need it in the future. For now, I'm inclined to keep things simple by just having the base image index inside the checkpoint image index. Let me know if that makes sense.

@dmcgowan
Member

The new approach looks much cleaner. Related to what @cpuguy83 said about referrers, I think it might be interesting to link the checkpoint manifest to the base image via the new subject field in the manifest. The checkpoint index could still be created, but it seems like it would make sense to support pulling the base image via subject when provided. This would both allow us to use referrers in the future as well as continue to use checkpoint manifests directly from tags.
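
Here is a minimal Go sketch of following subject from a checkpoint manifest. It produces the registry path for fetching the base image manifest and the referrers path for listing artifacts attached to it. The referrers endpoint comes from the OCI distribution spec v1.1, and not every registry may implement it. The repository name library/busybox and the helper baseImagePaths are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

type checkpointManifest struct {
	Subject *descriptor `json:"subject,omitempty"`
}

// baseImagePaths returns the registry API paths for fetching the base image
// manifest named by subject and for listing referrers of that manifest.
func baseImagePaths(repo string, raw []byte) (manifestPath, referrersPath string, err error) {
	var m checkpointManifest
	if err = json.Unmarshal(raw, &m); err != nil {
		return "", "", err
	}
	if m.Subject == nil {
		return "", "", fmt.Errorf("checkpoint manifest has no subject")
	}
	d := m.Subject.Digest
	return fmt.Sprintf("/v2/%s/manifests/%s", repo, d), fmt.Sprintf("/v2/%s/referrers/%s", repo, d), nil
}

func main() {
	// The subject from the checkpoint manifest in the description.
	raw := []byte(`{"subject": {"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
	  "digest": "sha256:acaddd9ed544f7baf3373064064a51250b14cfe3ec604d65765a53da5958e5f5", "size": 528}}`)
	manifestPath, referrersPath, err := baseImagePaths("library/busybox", raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(manifestPath)
	fmt.Println(referrersPath)
}
```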

@tianouya-db
Author

Makes sense. I've added the subject field in the checkpoint manifest and updated the description.

@liyimeng

@tianouya-db it seems everyone is happy with your proposal. Are you going to make a PR for this?
