Does eBPF work inside container-mode? #989

charmoniumQ · 2024-02-12T08:15:33Z

Use case: I wanted to benchmark an application on a normal system and on one with an eBPF filter attached to kernel tracepoints. Is this possible in container-mode?

I wrote an eBPF/bpftrace program which works as a normal user through setuid magic outside the container, but it gives the following error if I run it with containerexec:

ERROR: tracepoint not found: syscalls:sys_enter_fork

I think that is actually a permission error. If bpftrace doesn't have the root ruid and euid, /sys/kernel/tracing will not show any tracepoints. Fakeroot doesn't cut it.
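One way to test the permission hypothesis is to look at tracefs directly, once outside and once inside the container. The sketch below is only an illustration: the helper name `tracepoint_visible` is made up, and it assumes tracefs is mounted at /sys/kernel/tracing (some systems expose it at /sys/kernel/debug/tracing instead) with the usual events/<category>/<event> layout.

```python
import os


def tracepoint_visible(name, tracing_root="/sys/kernel/tracing"):
    """Return True if the tracepoint 'category:event' shows up under tracefs.

    os.path.isdir returns False both when the directory is missing and when
    it cannot be accessed, so a False result does not distinguish
    "no such tracepoint" from "not allowed to see it".
    """
    category, sep, event = name.partition(":")
    if not sep or not category or not event:
        raise ValueError(f"expected 'category:event', got {name!r}")
    return os.path.isdir(os.path.join(tracing_root, "events", category, event))
```

Calling `tracepoint_visible("syscalls:sys_enter_fork")` in both environments and comparing the results would show whether the tracepoint disappears only inside the container.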

I'm by no means an expert in Linux namespaces, but I think we would want to add an opt-in flag to benchexec that adds a mapping from root (uid=0) outside the container to root (uid=0) inside the container in /proc/$benchexec/uid_map. I can implement it on my own, but I wanted to hear from someone who understands namespaces better whether I am on the right path.
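For reference, each line of a uid_map file has three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below only formats such lines; `format_id_map` is a hypothetical helper, not part of BenchExec, and actually writing a mapping for outside uid 0 would need privileges in the parent namespace.

```python
def format_id_map(entries):
    """Format (inside_id, outside_id, count) tuples as uid_map/gid_map text."""
    lines = []
    for inside_id, outside_id, count in entries:
        if inside_id < 0 or outside_id < 0 or count < 1:
            raise ValueError(f"invalid mapping: {(inside_id, outside_id, count)}")
        lines.append(f"{inside_id} {outside_id} {count}\n")
    return "".join(lines)


# Mapping proposed above: root inside the container is root outside.
proposed = format_id_map([(0, 0, 1)])

# Unprivileged-style mapping: root inside is an ordinary user outside
# (uid 1000 is just an example value).
unprivileged = format_id_map([(0, 1000, 1)])
```

The first mapping is the one that would give processes in the container the real uid 0; the second is the kind of mapping an ordinary user can set up for their own uid.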

PhilippWendler · 2024-02-12T14:31:34Z

I don't know about eBPF. But if it requires full root, i.e., the same as being uid 0 outside the container, then it will not work.

If it requires only root inside the container (or some capability like CAP_SYS_ADMIN), then it may work with containerexec --root. If it is supposed to work inside containers but does not work even with containerexec --root, then we could investigate what it actually needs and what is preventing it from working.
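To see which capabilities a process actually holds in a given setup, one option is to read the CapEff line of /proc/<pid>/status, which is a hexadecimal bit mask. The following is a minimal diagnostic sketch with made-up helper names; it relies on CAP_SYS_ADMIN being capability number 21 in the Linux headers. Note that a capability held inside a user namespace only applies to resources governed by that namespace, so a positive result here does not by itself mean that eBPF tracing will work.

```python
CAP_SYS_ADMIN = 21  # capability number from <linux/capability.h>


def effective_caps(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(cap_mask, cap_number):
    """Return True if bit cap_number is set in the mask."""
    return bool((cap_mask >> cap_number) & 1)


def current_process_has(cap_number):
    """Check the calling process (Linux only)."""
    with open("/proc/self/status") as status_file:
        return has_capability(effective_caps(status_file.read()), cap_number)
```

Running a small script that calls `current_process_has(CAP_SYS_ADMIN)` with and without `containerexec --root` would show whether that option changes the effective capability set.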

If you know that it requires full root, giving root inside the container access to the full root outside the container using uid_map would technically work, but opens up problems.

Using uid_map would require executing BenchExec as root. But it was written with the intention of running as a regular user, and in particular the containerization used by BenchExec assumes that. I do not know whether running BenchExec as root would keep its isolation promises or whether it would open up security holes.

Giving processes inside the container full root access would of course completely eliminate any isolation promises.

So I am hesitant to consider this.

Are there no other solutions for you? For example, set up tracing outside the container and then run BenchExec?

charmoniumQ · 2024-02-12T17:57:51Z

> So I am hesitant to consider this.

Understood.

> For example, set up tracing outside the container and then run BenchExec?

Yeah, I would just need to know the PID of the grandchild in the outside-of-BenchExec namespace (the PID inside BenchExec's namespace is always 2). I think I could change parent_setup_fn to take a kwarg specifying that PID. I will change ContainerExecutor and BaseExecutor to both pass a PID to parent_setup_fn, for consistency's sake. As in ContainerExecutor's case, BaseExecutor should wait for a byte signalling that parent_setup_fn is complete before launching the tool. Because the PID will be passed to parent_setup_fn as a kwarg, existing implementations may have to change a little, but they would be more future-proof if they soaked up and ignored extra **kwargs.
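The idea can be sketched in isolation as follows. This is not BenchExec's code: `run_with_parent_setup` and `attach_tracer` are hypothetical names, a single fork stands in for BenchExec's child/grandchild structure, and the keyword name `tool_pid` is an assumption for illustration. It needs Linux or another POSIX system and Python 3.9+ (for `os.waitstatus_to_exitcode`).

```python
import os


def attach_tracer(tool_pid=None, **kwargs):
    """Hypothetical hook: uses tool_pid and ignores any other keyword arguments."""
    return f"would attach tracer to PID {tool_pid}"


def run_with_parent_setup(parent_setup_fn, child_fn):
    """Fork a child that waits for one byte before running child_fn.

    Returns (child_pid, exit_code, setup_result).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: block until the parent signals that setup is complete.
        os.close(write_fd)
        try:
            ready = os.read(read_fd, 1)
            os.close(read_fd)
            # An empty read means the parent closed the pipe without signalling.
            code = child_fn() if ready else 1
        except BaseException:
            code = 1
        os._exit(code)

    # Parent: the child's PID as seen from the parent's PID namespace.
    os.close(read_fd)
    try:
        setup_result = parent_setup_fn(tool_pid=pid)
        os.write(write_fd, b"\0")  # signal "setup complete"
    finally:
        os.close(write_fd)
        _, status = os.waitpid(pid, 0)
    return pid, os.waitstatus_to_exitcode(status), setup_result
```

Because the hook accepts `**kwargs`, a later version of the caller could pass additional keyword arguments without breaking it, which is the future-proofing mentioned above.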

What do you think of that?

PhilippWendler added the "container" label (related to container mode) on Feb 12, 2024

charmoniumQ mentioned this issue Feb 12, 2024

Add tool_pid argument to parent_setup_fn #990

Draft
