Errors in agent's netns topology probe on k8s #2365

Open
waterjiao opened this issue Mar 16, 2021 · 4 comments

@waterjiao

Hello

I'm using the master version of Skydive, running on k8s v0.19.0.

Env:

host: CentOS7
container: ubuntu20.04

My config (skydive.yaml, the Skydive agent ConfigMap):

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: skydive-agent
  name: skydive-agent-config
data:
  SKYDIVE_AGENT_TOPOLOGY_PROBES: runc docker
  SKYDIVE_AGENT_LISTEN: 127.0.0.1:8081
  SKYDIVE_AGENT_TOPOLOGY_NETNS_RUN_PATH: /host/run

When I add a network namespace on the host (CentOS 7):

# ip netns add net1

Here's the skydive agent log:

2021-03-06T06:56:44.413Z  DEBUG  netns/netns.go:133 (*ProbeHandler).Register host2: Register network namespace: /host/run/net1
2021-03-06T06:56:50.125Z  ERROR  netns/netns.go:307 (*ProbeHandler).start host2: Failed to register namespace: /host/run/net1. All attempts fail:
#1: /host/run/net1 does not seem to be a valid namespace
#2: /host/run/net1 does not seem to be a valid namespace
#3: /host/run/net1 does not seem to be a valid namespace
#4: /host/run/net1 does not seem to be a valid namespace
...

Note the "/host/run/net1 does not seem to be a valid namespace" errors, which mean that /host/run/net1 has the same device number as its parent directory, /host/run.

The relevant code is:

if parent := filepath.Dir(path); parent != "" {
	if err := syscall.Stat(parent, &parentStats); err == nil {
		if stats.Dev == parentStats.Dev {
			return fmt.Errorf("%s does not seem to be a valid namespace", path)
		}
	}
}

I used the stat command to check this.

On the host:

# stat --format=%d /var/run/netns
22
# stat --format=%d /var/run/netns/net1
3

But in the agent pod (container):

# stat --format=%d /host/run
22
# stat --format=%d /host/run/net1
22

Note that net1's device number differs between the host (3) and the pod (22).

It's tricky to debug. Has anyone encountered such a problem before?

Thanks

@lebauce
Member

lebauce commented Mar 30, 2021

Hello. We did encounter such bugs some time ago but it was supposed to be fixed :-)

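The reason for the check is that "ip netns" first creates a regular file for the new namespace, then quickly creates a bind mount from the namespace file in /proc onto that regular file. Until the bind mount is in place, the file still has the same device number as its parent directory, which is what the check detects.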

I'll try to reproduce the problem - pretty tricky to debug indeed - and I'll keep you updated

@lebauce
Member

lebauce commented Mar 30, 2021

Did you use the Kubernetes template in contrib/kubernetes? It specifies hostPID: true.

@waterjiao
Author

Sorry for taking so long to answer.

Yes, I used the Kubernetes template in contrib/kubernetes.

hostPID: true
hostNetwork: true

I also tried a more permissive pod security configuration.
This is my config:

hostPID: true
hostNetwork: true
hostIPC: true

securityContext:
  privileged: true
  runAsUser: 0
  allowPrivilegeEscalation: true

It didn't work.

I also tried on CentOS (host) with a Docker container and got the same issue.

env:

host: centos7
container: centos7

I run the Docker container with:

docker run -it --privileged -v /var/run/netns:/host/run docker.io/centos /bin/bash

When I add a network namespace on the host (CentOS 7):

# ip netns add net1

I used the stat command to check this.

On the host:

# stat --format=%d /var/run/netns
22
# stat --format=%d /var/run/netns/net1
3

But in the container:

# stat --format=%d /host/run
22
# stat --format=%d /host/run/net1
22

Note that net1's device number differs between the host (3) and the container (22).
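
One thing that may be worth ruling out is mount propagation. This is an assumption and nothing in this thread confirms it: if the bind mount created on the host after the container started does not propagate into the container, the container would keep seeing the underlying regular file, which has the same device number as its parent. The sketch below lists the propagation markers (the optional fields such as shared:N or master:N in /proc/self/mountinfo) for mounts under a given prefix. The helper name parseMountinfoLine is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseMountinfoLine (hypothetical helper) extracts the mount point and the
// optional propagation fields from one line of /proc/self/mountinfo.
// Layout: id parent major:minor root mountpoint options [optional...] - fstype source superopts
func parseMountinfoLine(line string) (mountPoint string, optional []string, ok bool) {
	fields := strings.Fields(line)
	if len(fields) < 7 {
		return "", nil, false
	}
	// The optional fields run from index 6 up to the "-" separator.
	for i := 6; i < len(fields); i++ {
		if fields[i] == "-" {
			return fields[4], fields[6:i], true
		}
	}
	return "", nil, false
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: mountprop <mount-point-prefix>")
		os.Exit(1)
	}
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		mp, opt, ok := parseMountinfoLine(scanner.Text())
		if ok && strings.HasPrefix(mp, os.Args[1]) {
			// An empty list means the mount is private.
			fmt.Printf("%s propagation=%v\n", mp, opt)
		}
	}
}
```

Run inside the container with /host/run as the argument: if net1 does not appear as its own mount entry there while it does on the host under /var/run/netns, the bind mount is not reaching the container.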

@waterjiao reopened this Apr 9, 2021

@lebauce
Member

lebauce commented Jul 17, 2021

@waterjiao Hello. Sorry for the long delay.

On my CentOS 7 VM, I get the same results in the container as on the host. What storage driver are you using? Is it overlayfs?
