Add runc symlink for rancher rke2 #2764

Merged May 28, 2024 (1 commit)


matthyx
Contributor

@matthyx matthyx commented Apr 24, 2024

Add runc symlink for rancher rke2

Now that we support symlink resolution in runtime path, we can support Rancher rke2 out of the box (similar to what we have for k3s).

How to use

Install RKE2 following https://docs.rke2.io/install/quickstart, then try tracing without specifying any runc path.

Testing done

Compiled ig and successfully ran some tracers on rke2.

@matthyx matthyx requested a review from alban as a code owner April 24, 2024 10:07
pkg/container-hook/tracer.go (outdated)
Comment on lines 130 to 131
"/usr/bin/conmon",
"/var/lib/rancher/k3s/data/current/bin/runc",
Member

These changes should belong to a specific commit, as I think we forgot them in previous modifications?

Member

I am also wondering if we should not unify these two arrays into one which would be used by these two files.

Contributor Author

waiting for @alban and if OK will refactor

Member

I already opened this PR with regard to refactoring:
#2766
In any case, we have several levels of refactoring and the code itself can be shared between the two packages, but I am not 100% sure for the arrays.

Contributor Author

awesome, so I just close this one and you add the new symlinked path to your PR?

Contributor Author

thanks for being reactive Francis!

Member

awesome, so I just close this one and you add the new symlinked path to your PR?

Please, leave yours open for now as I cannot give you any ETA for when mine will be merged.

thanks for being reactive Francis!

You are welcome.

Member

@eiffel-fl eiffel-fl left a comment

Hi!

I tested it locally, but rke2 does not seem able to use container-hook and/or runc-fanotify:

$ ./rke2
...
INFO[0131] Tunnel authorizer set Kubelet Port 10250 
# No events are reported with latest:
$ kubectl logs -n gadget gadget-ktwf6
time="2024-04-26T09:30:03Z" level=info msg="OS detected: Ubuntu 22.04.4 LTS"
time="2024-04-26T09:30:03Z" level=info msg="Kernel detected: 6.5.0-28-generic"
time="2024-04-26T09:30:03Z" level=info msg="Gadget Image: ghcr.io/inspektor-gadget/inspektor-gadget:latest"
...
time="2024-04-26T09:30:05Z" level=warning msg="Skip pod kube-system/rke2-snapshot-validation-webhook-54c5989b65-sg2jp: cannot find container (ID: containerd://0b7c4bb1015db08aef1943e3b136bd364663964bc1e14835fbfe19a058566bfc): loading container with id \"0b7c4bb1015db08aef1943e3b136bd364663964bc1e14835fbfe19a058566bfc\": container \"0b7c4bb1015db08aef1943e3b136bd364663964bc1e14835fbfe19a058566bfc\" in namespace \"k8s.io\": not found"
$ ./kubectl-gadget trace exec -A                          (remotes/matthyx/patch-2) *%
K8S.NODE          K8S.NAMESPACE     K8S.POD           K8S.CONTAINER     PID       PPID      COMM      PCOMM    RET ARGS                  
^C%
# Same with your patch:
$ kubectl logs -n gadget gadget-tndjt                      (remotes/matthyx/patch-2) %
time="2024-04-26T09:49:10Z" level=info msg="OS detected: Ubuntu 22.04.4 LTS"
time="2024-04-26T09:49:10Z" level=info msg="Kernel detected: 6.5.0-28-generic"
time="2024-04-26T09:49:10Z" level=info msg="Gadget Image: francisrkeregistry.azurecr.io/gadget:HEAD"
...
time="2024-04-26T09:49:11Z" level=warning msg="Skip pod kube-system/rke2-snapshot-validation-webhook-54c5989b65-sg2jp: cannot find container (ID: containerd://0b7c4bb1015db08aef1943e3b136bd364663964bc1e14835fbfe19a058566bfc): loading container with id \"0b7c4bb1015db08aef1943e3b136bd364663964bc1e14835fbfe19a058566bfc\": container \"0b7c4bb1015db08aef1943e3b136bd364663964bc1e14835fbfe19a058566bfc\" in namespace \"k8s.io\": not found"
$ ./kubectl-gadget trace exec -A                           (remotes/matthyx/patch-2) %
K8S.NODE          K8S.NAMESPACE     K8S.POD           K8S.CONTAINER     PID       PPID      COMM      PCOMM    RET ARGS                  
^C%

Am I missing something?

Best regards.

@matthyx
Contributor Author

matthyx commented Apr 26, 2024

can you log in debug and make sure we pick the right runc?

@eiffel-fl
Member

can you log in debug and make sure we pick the right runc?

Never mind! We do pick it:

time="2024-04-26T10:28:33Z" level=debug msg="container-hook: trying runtime at /host/var/lib/rancher/rke2/data/v1.29.2-rke2r1-8bfebc2d9089/bin/runc"
time="2024-04-26T10:28:33Z" level=debug msg="container-hook: monitoring runtime at /host/var/lib/rancher/rke2/data/v1.29.2-rke2r1-8bfebc2d9089/bin/runc"

I can trace a newly added pod with your changes:

$ kubectl run --restart=Never -ti --image=busybox mypod -- sh -c 'while /bin/true ; do whoami ; sleep 3 ; done'
If you don't see a command prompt, try pressing enter.
root
root
root
...
$ ./kubectl-gadget trace exec                              (remotes/matthyx/patch-2) %
K8S.NODE          K8S.NAMESPACE     K8S.POD           K8S.CONTAINER     PID       PPID      COMM      PCOMM    RET ARGS                  
pwmachine         default           mypod             mypod             129879    129434    true      sh       0   /bin/true             
pwmachine         default           mypod             mypod             129880    129434    whoami    sh       0   /bin/whoami           
pwmachine         default           mypod             mypod             129942    129434    true      sh       0   /bin/true             
pwmachine         default           mypod             mypod             129943    129434    whoami    sh       0   /bin/whoami

But not previous one:

$ ./kubectl-gadget trace exec -n kube-system
K8S.NODE          K8S.NAMESPACE     K8S.POD           K8S.CONTAINER     PID       PPID      COMM      PCOMM    RET ARGS
# Nothing is printed

@matthyx
Contributor Author

matthyx commented Apr 26, 2024

But not previous one:

$ ./kubectl-gadget trace exec -n kube-system
K8S.NODE          K8S.NAMESPACE     K8S.POD           K8S.CONTAINER     PID       PPID      COMM      PCOMM    RET ARGS
# Nothing is printed

I wonder if the system pods are running from a different runc?

@eiffel-fl
Member

But not previous one:

$ ./kubectl-gadget trace exec -n kube-system
K8S.NODE          K8S.NAMESPACE     K8S.POD           K8S.CONTAINER     PID       PPID      COMM      PCOMM    RET ARGS
# Nothing is printed

I wonder if the system pods are running from a different runc?

Maybe, but I am rather wondering whether this is a timing problem, as I cannot trace the gadget namespace either.
Any idea @alban?

Comment on lines 208 to 210
runcPath, err := securejoin.SecureJoin(host.HostRoot, r)
if err != nil {
log.Debugf("Runcfanotify: securejoin failed: %s", err)
Member

In some cases, the error will already contain "SecureJoin":
https://github.com/cyphar/filepath-securejoin/blob/v0.2.4/join.go#L58
https://cs.opensource.google/go/go/+/refs/tags/go1.22.2:src/io/fs/fs.go;l=256

But not in all cases. So I wonder if we should repeat "securejoin" here. What do you think?

Contributor Author

this is matching what we have in pkg/container-hook/tracer.go

@matthyx
Contributor Author

matthyx commented May 24, 2024

@alban can we have a look at merging this, please?

@eiffel-fl
Member

Hi!

@alban can we have a look at merging this, please?

I would like to first merge #2766 as it will ease merging this one.

Best regards.

@eiffel-fl
Member

Hi!

I just merged #2766, can you please rebase?
Your commit should now just be a matter of adding the symlink here:

var RuntimePaths = []string{
	"/bin/runc",
	"/usr/bin/runc",
	"/usr/sbin/runc",
	"/usr/local/bin/runc",
	"/usr/local/sbin/runc",
	"/usr/lib/cri-o-runc/sbin/runc",
	"/run/torcx/unpack/docker/bin/runc", // Used in Flatcar Container Linux
	"/usr/bin/crun",
	"/var/lib/rancher/k3s/data/current/bin/runc", // Used in k3s
}

Best regards.

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
@matthyx
Contributor Author

matthyx commented May 28, 2024

thanks @eiffel-fl done!

Member

@alban alban left a comment

LGTM

Member

@eiffel-fl eiffel-fl left a comment

I tested it again and it works fine:

$ ./kubectl-gadget trace exec                             (remotes/matthyx/patch-2) *%
K8S.NODE          K8S.NAMESPACE     K8S.POD           K8S.CONTAINER     PID       PPID      COMM      PCOMM    RET ARGS                  
pwmachine         default           test-pod          test-pod          212820    51797     ls        bash     0   /usr/bin/ls
$ ../kubectl get node | awk '{ print $5 }'                (remotes/matthyx/patch-2) *%
VERSION
v1.29.2+rke2r1

Thank you for this contribution!

@eiffel-fl eiffel-fl merged commit 5287634 into inspektor-gadget:main May 28, 2024
57 checks passed
@matthyx matthyx deleted the patch-2 branch May 31, 2024 05:13

3 participants