Add support to expose CRI engine configuration via NFD #1488

Open
fidencio opened this issue Dec 5, 2023 · 16 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@fidencio
Contributor

fidencio commented Dec 5, 2023

As developers of Kata Containers, we'd like to rely on NFD to properly check whether containerd has the appropriate snapshotter set up on one (or more) specific nodes, so we can decide whether to enable some of the Kata Containers drivers there, and also provide the user with an appropriate rule to schedule their workloads.

The reasoning behind this is detecting:

  • devmapper snapshotter, for Firecracker
  • nydus snapshotter, for any Confidential Containers workload

While I know that the preferred way to deploy Kata Containers would be just baking it into the node image, we know that users trying out Kata Containers usually rely on our DaemonSet, and then get confused about why a specific driver (VMM) doesn't work, as specific drivers require specific snapshotters.

@fidencio fidencio added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 5, 2023
@fidencio
Contributor Author

fidencio commented Dec 5, 2023

cc @zvonkok @mythi @marquiz

@fidencio
Contributor Author

fidencio commented Dec 5, 2023

The way I'd like to see this exposed is something like:
container-engine.containerd.snapshotter.devmapper or container-engine.containerd.snapshotter.nydus.
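For illustration, here's a minimal sketch of how a workload could then be pinned to suitable nodes, assuming NFD publishes these under its usual feature.node.kubernetes.io/ prefix (the label names are just the proposal above, not something NFD exposes today, and the RuntimeClass name is made up):

# Hypothetical pod pinned to nodes where the devmapper snapshotter was detected
apiVersion: v1
kind: Pod
metadata:
  name: kata-fc-workload
spec:
  runtimeClassName: kata-fc
  nodeSelector:
    feature.node.kubernetes.io/container-engine.containerd.snapshotter.devmapper: "true"
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9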

@ArangoGutierrez
Contributor

So you are proposing a new feature source called container-engine, right?

@marquiz
Contributor

marquiz commented Dec 5, 2023

We could probably put this under the system source.

Trying to understand how this would work (and the possible caveats and corner cases): we'd need to parse the containerd config, right? That should usually be readable by non-root. But the snapshotter can depend on the runtime class; should we take that into account?
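For context, the snapshotter shows up in the containerd config both as a global default and, in newer containerd releases, as a per-runtime-handler override, which is where the runtime-class dependency comes in. An illustrative snippet (paths and values are examples, not a recommendation):

# /etc/containerd/config.toml (illustrative)
[plugins."io.containerd.grpc.v1.cri".containerd]
  # global default snapshotter
  snapshotter = "overlayfs"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"
  # per-runtime override supported by newer containerd releases
  snapshotter = "devmapper"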

@ArangoGutierrez
Contributor

I understand this is a containerd-centric feature, but could we have a generic way to parse "container runtime" config files, so we are future-proof and can later extend this feature request to {docker,podman,crio}?

@fidencio
Contributor Author

fidencio commented Dec 5, 2023

Trying to understand how this would work (and the possible caveats and corner cases): we'd need to parse the containerd config, right? That should usually be readable by non-root. But the snapshotter can depend on the runtime class; should we take that into account?

The containerd configuration just tells whether a snapshotter is actually being used by a runtime class.
We want to know whether a snapshotter is present on the system before we tie it to a runtime handler.

In a very hacky way, ctr plugin ls gives us what we want to check (and there's a way to import the containerd client package and do it without having to call the tool), and from there we can expose the snapshotters.

Here's the output, for instance:

TYPE                                  ID                       PLATFORMS      STATUS    
io.containerd.content.v1              content                  -              ok        
io.containerd.snapshotter.v1          aufs                     linux/amd64    skip      
io.containerd.snapshotter.v1          btrfs                    linux/amd64    ok        
io.containerd.snapshotter.v1          devmapper                linux/amd64    error     
io.containerd.snapshotter.v1          native                   linux/amd64    ok        
io.containerd.snapshotter.v1          overlayfs                linux/amd64    ok        
io.containerd.snapshotter.v1          zfs                      linux/amd64    skip      
io.containerd.metadata.v1             bolt                     -              ok        
io.containerd.differ.v1               walking                  linux/amd64    ok        
io.containerd.event.v1                exchange                 -              ok        
io.containerd.gc.v1                   scheduler                -              ok        
io.containerd.service.v1              introspection-service    -              ok        
io.containerd.service.v1              containers-service       -              ok        
io.containerd.service.v1              content-service          -              ok        
io.containerd.service.v1              diff-service             -              ok        
io.containerd.service.v1              images-service           -              ok        
io.containerd.service.v1              leases-service           -              ok        
io.containerd.service.v1              namespaces-service       -              ok        
io.containerd.service.v1              snapshots-service        -              ok        
io.containerd.runtime.v1              linux                    linux/amd64    ok        
io.containerd.runtime.v2              task                     linux/amd64    ok        
io.containerd.monitor.v1              cgroups                  linux/amd64    ok        
io.containerd.service.v1              tasks-service            -              ok        
io.containerd.grpc.v1                 introspection            -              ok        
io.containerd.internal.v1             restart                  -              ok        
io.containerd.grpc.v1                 containers               -              ok        
io.containerd.grpc.v1                 content                  -              ok        
io.containerd.grpc.v1                 diff                     -              ok        
io.containerd.grpc.v1                 events                   -              ok        
io.containerd.grpc.v1                 healthcheck              -              ok        
io.containerd.grpc.v1                 images                   -              ok        
io.containerd.grpc.v1                 leases                   -              ok        
io.containerd.grpc.v1                 namespaces               -              ok        
io.containerd.internal.v1             opt                      -              ok        
io.containerd.grpc.v1                 snapshots                -              ok        
io.containerd.grpc.v1                 tasks                    -              ok        
io.containerd.grpc.v1                 version                  -              ok        
io.containerd.tracing.processor.v1    otlp                     -              skip      
io.containerd.internal.v1             tracing                  -              ok  

I was talking to @mythi, and he mentioned he'd also like to check whether NRI is enabled or not, which may be a second use case for this.
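For reference, a rough sketch of the same query done with the containerd Go client instead of shelling out to ctr. The client signatures vary a bit across containerd releases (this roughly follows the 1.7-era client), and it still needs access to containerd's root-owned socket:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	// Connecting to the socket requires root, as discussed below.
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Most containerd API calls expect a namespace on the context.
	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

	// Only list snapshotter plugins, mirroring the TYPE column above.
	resp, err := client.IntrospectionService().Plugins(ctx,
		[]string{`type=="io.containerd.snapshotter.v1"`})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range resp.Plugins {
		// InitErr is unset for plugins that initialized fine ("ok" in ctr's output).
		status := "ok"
		if p.InitErr != nil {
			status = "not ok"
		}
		fmt.Printf("%s\t%s\t%s\n", p.Type, p.ID, status)
	}
}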

@marquiz
Contributor

marquiz commented Dec 5, 2023

Mm, I immediately see two problems here: containerd.sock (and ctr) requires root access, plus our container base image is scratch and we only ship the NFD binaries, nothing else (and going forward we'd probably want to keep it that way).

@fidencio
Contributor Author

fidencio commented Dec 5, 2023

Mm, I immediately see two problems here: containerd.sock (and ctr) requires root access, plus our container base image is scratch and we only ship the NFD binaries, nothing else (and going forward we'd probably want to keep it that way).

Just to make it clear, I'm not suggesting we ship ctr, but rather that we implement it on our end using the Go package provided by containerd. Now, containerd.sock does require root, indeed. :-/

@zvonkok
Contributor

zvonkok commented Dec 5, 2023

/cc @zvonkok

@zvonkok
Contributor

zvonkok commented Dec 5, 2023

What about a sidecar container that writes to /etc/kubernetes/node-feature-discovery/features.d, like what GPU feature discovery is doing?
Once you have NFD deployed, kata-deploy can deploy "anything" in a sidecar container to detect "anything"?

@mythi
Contributor

mythi commented Dec 7, 2023

What about a sidecar container that writes to /etc/kubernetes/node-feature-discovery/features.d, like what GPU feature discovery is doing? Once you have NFD deployed, kata-deploy can deploy "anything" in a sidecar container to detect "anything"?

Good point. A feature hook using the local source could probably work.
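For the record, the local source just reads files under /etc/kubernetes/node-feature-discovery/features.d/, one feature per line, and turns them into labels under NFD's default prefix. So a kata-deploy sidecar could hypothetically drop something like this (file name and feature names are made up to match the proposal above):

# /etc/kubernetes/node-feature-discovery/features.d/kata-containers (hypothetical)
container-engine.containerd.snapshotter.devmapper=true
container-engine.containerd.snapshotter.nydus=true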

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 5, 2024
@ArangoGutierrez
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 5, 2024
@ArangoGutierrez
Contributor

/remove-lifecycle stale

@ArangoGutierrez
Contributor

@fidencio @zvonkok @mythi is this topic still relevant?
