Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/uprobetracer: support reading USDT arguments #2765

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

i-Pear
Copy link
Contributor

@i-Pear i-Pear commented Apr 24, 2024

pkg/uprobetracer: support reading USDT arguments

How to use

#include <gadget/usdt_argument.h>

usdt_get_argument(ctx, <arg_index>);

The usdt_get_argument function will return a __u64* pointer, it will be NULL on failure. Users can derefence it to get the value.

Also, the trace_gc gadget is added as a demo of using usdt_get_argument.

@i-Pear
Copy link
Contributor Author

i-Pear commented Apr 24, 2024

In this PR, I attempt to pass the information of the USDT arguments to the eBPF program, using rewriteConstant. However, I've realized that the same eBPF program might be attached to multiple containers, or multiple binaries within the same container. The USDT formats in these different files may be different.

This necessitates providing separate USDT information for each attached file. Or at the very least, we need to attach an additional piece of information to the eBPF program when AttachContainer is invoked, in order to distinguish between different ebpf instances. My idea is to write a unique ID into the cookie field each time it's attached, and establish an eBPF map to maintain per-instance information. What do you think about this approach? cc @alban @mauriciovasquezbernal

Comment on lines 218 to 222
l, err := ex.Uprobe(t.attachSymbol, t.prog,
&link.UprobeOptions{
Address: attachInfo.attachAddress,
RefCtrOffset: attachInfo.semaphoreAddress,
Cookie: realInodePtr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cilium/ebpf supports two Linux APIs to add a uprobe:

  1. Old API: tracefs
  2. New API: PMU (using unix.PerfEventOpen), available since Linux 4.17 (torvalds/linux@33ea4b2)

The attach cookie (torvalds/linux@82e6b1e) is only supported with the new API (Linux >=4.17).

But this is fine because the program with bpf_get_attach_cookie will be rejected in Linux < 5.15, so it won't be a silent failure.

@i-Pear i-Pear force-pushed the usdt_args branch 11 times, most recently from 328971b to 1c43bbd Compare May 3, 2024 10:02
@i-Pear i-Pear marked this pull request as ready for review May 3, 2024 10:03
@i-Pear i-Pear requested a review from alban May 3, 2024 10:03
@i-Pear i-Pear force-pushed the usdt_args branch 2 times, most recently from d76c2aa to 3cc3aca Compare May 3, 2024 10:57
include/gadget/usdt_argument.h Outdated Show resolved Hide resolved
Copy link
Member

@alban alban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good work but I have concerns about the growing API between ig and gadgets. See my comment suggesting bpf extensions.

gadgets/trace_gc/artifacthub-pkg.yml Outdated Show resolved Hide resolved
gadgets/trace_gc/gadget.yaml Outdated Show resolved Hide resolved
@@ -0,0 +1,64 @@
name: trace gc
description: use uprobe to trace garbage collection in Java and Python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes sense to have a single gadget for both Java and Python. I guess when users want to debug the gc, they have a single application in mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can split this into two separate gadgets, or adding py_ and java_ according to the next comment. Which way do you like better?

Copy link
Contributor Author

@i-Pear i-Pear May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think for the image-based gadget, it lacks the ability to choose which tracepoints should be attached.
For example, in builtin-gadgets and bcc, we can write:

if args.trace_java {
  attach_java_tracepoints();
}
if args.trace_py {
  attach_py_tracepoints();
}

But in image-based gadgets, we have no choice but have to attach to all tracepoints. This is also un-flexible in the LSM strace gadget. In built-in gadgets, these are user-specific logics, but we don't have a way to write something like this in image-based gadgets.

So could we design an WASM api for this? Such as calling a wasm function called init_gadget (if existing), and it will return which tracepoints to be attached, maybe also which columns to be hidden.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am divided on this point.
On one hand, I agree with Alban that I do not really see a use case where people would like to trace both at the same time.
On the other hand, having one tool to do everything seems also easier for user, as they do not have to think about which tool use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. I would suggest you do as it is easier to implement.

I have the same dilemna in https://github.com/alban/fsnotify-enricher where users can monitor either fanotify or inotify. I don't think users would want to do both. I chose to keep one gadget because there is a some shared ebpf code for each, so it is easier to maintain. But since I cannot disable an attachment if the user uses --fanotify-only, I just return early in the ebpf programs that are not useful. I can revisit this choice if ig gets this WASM API.

gadgets/trace_gc/program.bpf.c Outdated Show resolved Hide resolved
gadgets/trace_gc/program.bpf.c Outdated Show resolved Hide resolved
gadgets/trace_gc/program.bpf.c Outdated Show resolved Hide resolved
include/gadget/usdt_argument.h Outdated Show resolved Hide resolved
pkg/uprobetracer/tracer.go Outdated Show resolved Hide resolved
USDT_ARG_TYPE_MEM,
};

enum USDT_ARG_LENGTH {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you design this enum yourself or is it part of some USDT spec/convention?

Does the ebpf program need to know whether it is signed or unsigned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I designed them by myself.

You are right, the signed/unsigned is not important for ebpf parts. I can remove them from the enum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec is here: https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

For example, the length property just has 8 possible choices. But no one have tried to encode this, there are also no such enums in other projects.

*/

encodeField := func(encodedValue *uint64, fieldOffset uint, fieldLength uint, fieldValue uint64) {
*encodedValue |= (fieldValue & (1<<(fieldLength+1) - 1)) << fieldOffset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is encoded by ig and decoded in include/gadget/usdt_argument.h, which will be compiled into third-party gadgets. So this encoding becomes parts of the ABI between ig and the gadget, and therefore it would need to be documented. I am concerned how we are increasing the API surface, and how difficult it would be to make changes. ig version n+1 would need to support gadgets compiled with include/gadget/usdt_argument.h from previous versions of ig.

I would like bpf extensions (BPF_PROG_TYPE_EXT, Linux >=5.6) to be used as a way to reduce the API surface between ig and gadgets. In this way, the API is reduced to the function usdt_get_argument(), and the ebpf maps (__usdt_args_buffer and __usdt_args_info) become implementation details of the ebpf extension, invisible to the gadget.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@i-Pear
Copy link
Contributor Author

i-Pear commented May 14, 2024

Seems the bpf extension needs some special steps before attaching. Trying to diff between https://github.com/cilium/ebpf/blob/main/link/tracing_test.go#L10

Copy link
Member

@eiffel-fl eiffel-fl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

This looks promising, but I only took a slight look and I have plenty of comments.
Also, I would need to take another look, particularly as the golang code.

I would also like to have the commit history written correctly:

  1. One commit to add the eBPF code related to UDST functions.
  2. One commit for the golang code.
  3. Another to add the gadget itself.
    More commits would, of course, be welcomed as it eases the review.

Best regards.

Dockerfiles/ebpf-builder.Dockerfile Outdated Show resolved Hide resolved
Dockerfiles/ebpf-builder.Dockerfile Outdated Show resolved Hide resolved
Dockerfiles/ebpf-builder.Dockerfile Outdated Show resolved Hide resolved
@@ -0,0 +1,64 @@
name: trace gc
description: use uprobe to trace garbage collection in Java and Python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am divided on this point.
On one hand, I agree with Alban that I do not really see a use case where people would like to trace both at the same time.
On the other hand, having one tool to do everything seems also easier for user, as they do not have to think about which tool use.

gadgets/trace_gc/program.bpf.c Show resolved Hide resolved
pkg/uprobetracer/bpf/usdt_helper.bpf.c Outdated Show resolved Hide resolved
pkg/uprobetracer/bpf/usdt_helper.bpf.c Outdated Show resolved Hide resolved
pkg/uprobetracer/bpf/usdt_helper.bpf.c Outdated Show resolved Hide resolved
pkg/uprobetracer/bpf/usdt_helper.bpf.c Outdated Show resolved Hide resolved
Comment on lines +265 to +274
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(key_size, sizeof(u64));
__uint(value_size, sizeof(struct __usdt_arguments));
__uint(max_entries, 10000);
} __usdt_args_info SEC(".maps");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, this map is storing the ABI, why do you need 10000 entries then? One would do the job.

Copy link
Contributor Author

@i-Pear i-Pear May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This map was designed to be global, and would be shared across all instances. The key was an unique ID for each attachment.

But now, with the bpf extension support, I'm able to rewrite constants for each extension, so this map will be no longer needed. Will remove it soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've realized that I'm using AttachUprobeMulti to attach the same program to multiple USDT tracepoints.

If I use rewriteConstant to write data directly into the extension program, I would need to load multiple programs and perform freplace individually for each, which would put a significant burden on the verifier. Therefore, I will do freplace only once and use AttachUprobeMulti, with the data required to read the USDT parameters still stored in this map.

@i-Pear
Copy link
Contributor Author

i-Pear commented May 16, 2024

Sorry, I just pushed a nightly commit for backup, not expecting that the github inviting reviewers for this. I'm marking this as draft until I finally finish it.

@i-Pear i-Pear marked this pull request as draft May 16, 2024 14:23
@i-Pear i-Pear force-pushed the usdt_args branch 2 times, most recently from f67120c to 62d9940 Compare May 18, 2024 12:15
@i-Pear i-Pear marked this pull request as ready for review May 18, 2024 12:18
i-Pear and others added 3 commits May 18, 2024 20:22
Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com>
Signed-off-by: Tianyi Liu <i.pear@outlook.com>
Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com>
Signed-off-by: Tianyi Liu <i.pear@outlook.com>
Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com>
Signed-off-by: Tianyi Liu <i.pear@outlook.com>
Copy link
Member

@eiffel-fl eiffel-fl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

I have some comments, I will take a deeper look and test it later.

Best regards.

{
u64 cookie = bpf_get_attach_cookie(ctx);
if (!cookie)
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has u64 as return type but you return a bool here.
Also the comment should be updated.

Comment on lines +50 to +73
registerEncodingX86 = map[string]uint64{
"zero": 0,
"r15": 1,
"r14": 2,
"r13": 3,
"r12": 4,
"rbp": 5,
"rbx": 6,
"r11": 7,
"r10": 8,
"r9": 9,
"r8": 10,
"rax": 11,
"rcx": 12,
"rdx": 13,
"rsi": 14,
"rdi": 15,
"orig_ax": 16,
"rip": 17,
"cs": 18,
"flags": 19,
"rsp": 20,
"ss": 21,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have this defined in a arch specific file.

Comment on lines +181 to +200
switch argLength {
case 1:
argLength = 0
case -1:
argLength = 1
case 2:
argLength = 2
case -2:
argLength = 3
case 4:
argLength = 4
case -4:
argLength = 5
case 8:
argLength = 6
case -8:
argLength = 7
default:
return 0
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand this code.

case "arm64":
registerEncoding = registerEncodingArm64
default:
return 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return an error and not 0.


parts := strings.Split(arg, "@")
if len(parts) != 2 {
return 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning 0, you should definitely return an error.

Comment on lines +392 to +408
if usdtArgsBufferMap != nil {
return usdtArgsBufferMap, nil
}

usdtArgsBufferMapSpec := ebpf.MapSpec{
Name: UsdtArgsBufferMapName,
Type: ebpf.PerCPUArray,
KeySize: 4,
ValueSize: 8,
MaxEntries: 1,
}
var err error
usdtArgsBufferMap, err = ebpf.NewMap(&usdtArgsBufferMapSpec)
if err != nil {
return nil, fmt.Errorf("creating USDT arguments buffer map: %w", err)
}
return usdtArgsBufferMap, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we should not have a sync.Once() for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today I learned sync.Once(), but it accepts a lambda so it's hard to get the err message if the map fails to be created?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could declare the err variable outside of the lambda to capture it.

Example of code doing that:

onceHostPidNs sync.Once
isHostPidNs bool
errHostPidNs error

Are you suggesting to use a single maps for the gadget, or for all gadgets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to use a single maps for the gadget, or for all gadgets?

I'm using one map for all the gadgets. The map should be loaded with the ebpf extension, not the gadget. If we use individual maps for each gadget, we will need to load the extension for N times, and it will be much slower.

@@ -159,6 +164,16 @@ func (t *Tracer[Event]) AttachProg(progName string, progType ProgType, attachTo
t.attachSymbol = parts[1]
t.prog = prog

// for USDT programs using `usdt_get_argument`, load extension to replace the impl
if t.progType == ProgUSDT && hasUsdtArgsFunction(t.prog) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have a ProgUSDT but hasUsdtArgsFunction() returns false?
Also, should we catch the error returned by the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have a ProgUSDT but hasUsdtArgsFunction() returns false?

In that case we don't need to load the extension.

Also, should we catch the error returned by the function?

The hasUsdtArgsFunction function does not return an error. If an error happens, we just will not load the extension.

return 0;

// USDT prototype:
// gc__start (int generation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this link as a comment in the eBPF code?

Comment on lines +135 to +140
// mem__pool__gc__begin (char* manager_name, int manager_name_len,
// char* poll_name, int poll_name_len,
// long initial_size, long memory_in_use,
// long committed_pages, long maximum_size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.


// For USDT programs, we need to add the BPF_TRACE_UPROBE_MULTI flag
for _, p := range spec.Programs {
if p.Type == ebpf.Kprobe && strings.HasPrefix(p.SectionName, usdtPrefix) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand how it could be possible to have a kprobe with the USDT prefix, can you please shed some light?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both u(ret)probe(multi)s and k(ret)probe(multi)s are classified to ebpf.Kprobe type [2]. The condition is actually copied from here [1].

[1] https://github.com/inspektor-gadget/inspektor-gadget/blob/main/pkg/operators/ebpf/attach.go#L42
[2] https://ebpf-go.dev/concepts/section-naming/#program-sections

@@ -178,6 +181,14 @@ func (i *ebpfInstance) loadSpec() error {
if err != nil {
return fmt.Errorf("loading spec: %w", err)
}

// For USDT programs, we need to add the BPF_TRACE_UPROBE_MULTI flag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cilium/ebpf automatically adds the MULTI flag when the section is named "uprobe.multi":
https://github.com/cilium/ebpf/blob/v0.15.0/elf_sections.go#L21-L22

Unfortunately, it does not have a similar section name for usdt. Do you think that should be something to improve in cilium/ebpf and libbpf? In this way, you would not need to manually change the AttachType and you would not need to redefine the constant BPF_TRACE_UPROBE_MULTI.

Meanwhile, this is fine for this PR, since there is no other choices with the current cilium/ebpf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is cilium/ebpf's issue. Here I add the BPF_TRACE_UPROBE_MULTI flag, only because I'm using a trick to deal with USDT tracepoints which are with the same name.

For example, there are many trace_gc USDT probes in Libjvm, and I need to attach to all of them. Using BPF_TRACE_UPROBE_MULTI will make it faster. Also, I don't want the uprobetracer be aware of this and support nested Links. Using BPF_TRACE_UPROBE_MULTI will merge many ebpf.Links into one, and I think that's more elegant.

@@ -56,6 +56,8 @@ const (
ParamTraceKernel = "trace-pipe"

kernelTypesVar = "kernelTypes"

BPF_TRACE_UPROBE_MULTI ebpf.AttachType = 48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment explaining where this value comes from?

The constant is declared in cilium/ebpf but it is internal:
https://github.com/cilium/ebpf/blob/v0.15.0/internal/sys/types.go#L67
It is part of the Linux API:
https://github.com/torvalds/linux/blob/v6.8/include/uapi/linux/bpf.h#L1049

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants