-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/uprobetracer: support reading USDT arguments #2765
base: main
Are you sure you want to change the base?
Conversation
In this PR, I attempt to pass the information of the USDT arguments to the eBPF program, using This necessitates providing separate USDT information for each attached file. Or at the very least, we need to attach an additional piece of information to the eBPF program when |
pkg/uprobetracer/tracer.go
Outdated
l, err := ex.Uprobe(t.attachSymbol, t.prog, | ||
&link.UprobeOptions{ | ||
Address: attachInfo.attachAddress, | ||
RefCtrOffset: attachInfo.semaphoreAddress, | ||
Cookie: realInodePtr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cilium/ebpf supports two Linux APIs to add a uprobe:
- Old API: tracefs
- New API: PMU (using
unix.PerfEventOpen
), available since Linux 4.17 (torvalds/linux@33ea4b2)
The attach cookie (torvalds/linux@82e6b1e) is only supported with the new API (Linux >=4.17).
But this is fine because the program with bpf_get_attach_cookie
will be rejected in Linux < 5.15, so it won't be a silent failure.
328971b
to
1c43bbd
Compare
d76c2aa
to
3cc3aca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good work but I have concerns about the growing API between ig and gadgets. See my comment suggesting bpf extensions.
@@ -0,0 +1,64 @@ | |||
name: trace gc | |||
description: use uprobe to trace garbage collection in Java and Python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it makes sense to have a single gadget for both Java and Python. I guess when users want to debug the gc, they have a single application in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can split this into two separate gadgets, or adding py_
and java_
according to the next comment. Which way do you like better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think for the image-based gadget, it lacks the ability to choose which tracepoints should be attached.
For example, in builtin-gadgets and bcc, we can write:
if args.trace_java {
attach_java_tracepoints();
}
if args.trace_py {
attach_py_tracepoints();
}
But in image-based gadgets, we have no choice but have to attach to all tracepoints. This is also un-flexible in the LSM strace
gadget. In built-in gadgets, these are user-specific logics, but we don't have a way to write something like this in image-based gadgets.
So could we design an WASM api for this? Such as calling a wasm function called init_gadget
(if existing), and it will return which tracepoints to be attached, maybe also which columns to be hidden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am divided on this point.
On one hand, I agree with Alban that I do not really see a use case where people would like to trace both at the same time.
On the other hand, having one tool to do everything seems also easier for user, as they do not have to think about which tool use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion. I would suggest you do as it is easier to implement.
I have the same dilemna in https://github.com/alban/fsnotify-enricher where users can monitor either fanotify or inotify. I don't think users would want to do both. I chose to keep one gadget because there is a some shared ebpf code for each, so it is easier to maintain. But since I cannot disable an attachment if the user uses --fanotify-only
, I just return early in the ebpf programs that are not useful. I can revisit this choice if ig gets this WASM API.
include/gadget/usdt_argument.h
Outdated
USDT_ARG_TYPE_MEM, | ||
}; | ||
|
||
enum USDT_ARG_LENGTH { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you design this enum yourself or is it part of some USDT spec/convention?
Does the ebpf program need to know whether it is signed or unsigned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I designed them by myself.
You are right, the signed/unsigned is not important for ebpf parts. I can remove them from the enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec is here: https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
For example, the length property just has 8 possible choices. But no one have tried to encode this, there are also no such enums in other projects.
*/ | ||
|
||
encodeField := func(encodedValue *uint64, fieldOffset uint, fieldLength uint, fieldValue uint64) { | ||
*encodedValue |= (fieldValue & (1<<(fieldLength+1) - 1)) << fieldOffset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is encoded by ig and decoded in include/gadget/usdt_argument.h, which will be compiled into third-party gadgets. So this encoding becomes parts of the ABI between ig and the gadget, and therefore it would need to be documented. I am concerned how we are increasing the API surface, and how difficult it would be to make changes. ig version n+1 would need to support gadgets compiled with include/gadget/usdt_argument.h from previous versions of ig.
I would like bpf extensions (BPF_PROG_TYPE_EXT, Linux >=5.6) to be used as a way to reduce the API surface between ig and gadgets. In this way, the API is reduced to the function usdt_get_argument()
, and the ebpf maps (__usdt_args_buffer and __usdt_args_info) become implementation details of the ebpf extension, invisible to the gadget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the bpf extension needs some special steps before attaching. Trying to diff between https://github.com/cilium/ebpf/blob/main/link/tracing_test.go#L10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
This looks promising, but I only took a slight look and I have plenty of comments.
Also, I would need to take another look, particularly as the golang code.
I would also like to have the commit history written correctly:
- One commit to add the eBPF code related to UDST functions.
- One commit for the golang code.
- Another to add the gadget itself.
More commits would, of course, be welcomed as it eases the review.
Best regards.
@@ -0,0 +1,64 @@ | |||
name: trace gc | |||
description: use uprobe to trace garbage collection in Java and Python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am divided on this point.
On one hand, I agree with Alban that I do not really see a use case where people would like to trace both at the same time.
On the other hand, having one tool to do everything seems also easier for user, as they do not have to think about which tool use.
struct { | ||
__uint(type, BPF_MAP_TYPE_HASH); | ||
__uint(key_size, sizeof(u64)); | ||
__uint(value_size, sizeof(struct __usdt_arguments)); | ||
__uint(max_entries, 10000); | ||
} __usdt_args_info SEC(".maps"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, this map is storing the ABI, why do you need 10000 entries then? One would do the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This map was designed to be global, and would be shared across all instances. The key
was an unique ID for each attachment.
But now, with the bpf extension support, I'm able to rewrite constants for each extension, so this map will be no longer needed. Will remove it soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've realized that I'm using AttachUprobeMulti
to attach the same program to multiple USDT tracepoints.
If I use rewriteConstant
to write data directly into the extension program, I would need to load multiple programs and perform freplace
individually for each, which would put a significant burden on the verifier. Therefore, I will do freplace
only once and use AttachUprobeMulti
, with the data required to read the USDT parameters still stored in this map.
Sorry, I just pushed a nightly commit for backup, not expecting that the github inviting reviewers for this. I'm marking this as draft until I finally finish it. |
f67120c
to
62d9940
Compare
Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>
Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>
Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
I have some comments, I will take a deeper look and test it later.
Best regards.
{ | ||
u64 cookie = bpf_get_attach_cookie(ctx); | ||
if (!cookie) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has u64
as return type but you return a bool
here.
Also the comment should be updated.
registerEncodingX86 = map[string]uint64{ | ||
"zero": 0, | ||
"r15": 1, | ||
"r14": 2, | ||
"r13": 3, | ||
"r12": 4, | ||
"rbp": 5, | ||
"rbx": 6, | ||
"r11": 7, | ||
"r10": 8, | ||
"r9": 9, | ||
"r8": 10, | ||
"rax": 11, | ||
"rcx": 12, | ||
"rdx": 13, | ||
"rsi": 14, | ||
"rdi": 15, | ||
"orig_ax": 16, | ||
"rip": 17, | ||
"cs": 18, | ||
"flags": 19, | ||
"rsp": 20, | ||
"ss": 21, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can have this defined in a arch specific file.
switch argLength { | ||
case 1: | ||
argLength = 0 | ||
case -1: | ||
argLength = 1 | ||
case 2: | ||
argLength = 2 | ||
case -2: | ||
argLength = 3 | ||
case 4: | ||
argLength = 4 | ||
case -4: | ||
argLength = 5 | ||
case 8: | ||
argLength = 6 | ||
case -8: | ||
argLength = 7 | ||
default: | ||
return 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this code.
case "arm64": | ||
registerEncoding = registerEncodingArm64 | ||
default: | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return an error and not 0.
|
||
parts := strings.Split(arg, "@") | ||
if len(parts) != 2 { | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of returning 0, you should definitely return an error.
if usdtArgsBufferMap != nil { | ||
return usdtArgsBufferMap, nil | ||
} | ||
|
||
usdtArgsBufferMapSpec := ebpf.MapSpec{ | ||
Name: UsdtArgsBufferMapName, | ||
Type: ebpf.PerCPUArray, | ||
KeySize: 4, | ||
ValueSize: 8, | ||
MaxEntries: 1, | ||
} | ||
var err error | ||
usdtArgsBufferMap, err = ebpf.NewMap(&usdtArgsBufferMapSpec) | ||
if err != nil { | ||
return nil, fmt.Errorf("creating USDT arguments buffer map: %w", err) | ||
} | ||
return usdtArgsBufferMap, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we should not have a sync.Once()
for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I learned sync.Once()
, but it accepts a lambda so it's hard to get the err message if the map fails to be created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could declare the err
variable outside of the lambda to capture it.
Example of code doing that:
inspektor-gadget/pkg/utils/host/namespaces.go
Lines 33 to 35 in 36f4576
onceHostPidNs sync.Once | |
isHostPidNs bool | |
errHostPidNs error |
Are you suggesting to use a single maps for the gadget, or for all gadgets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to use a single maps for the gadget, or for all gadgets?
I'm using one map for all the gadgets. The map should be loaded with the ebpf extension, not the gadget. If we use individual maps for each gadget, we will need to load the extension for N times, and it will be much slower.
@@ -159,6 +164,16 @@ func (t *Tracer[Event]) AttachProg(progName string, progType ProgType, attachTo | |||
t.attachSymbol = parts[1] | |||
t.prog = prog | |||
|
|||
// for USDT programs using `usdt_get_argument`, load extension to replace the impl | |||
if t.progType == ProgUSDT && hasUsdtArgsFunction(t.prog) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we have a ProgUSDT
but hasUsdtArgsFunction()
returns false?
Also, should we catch the error returned by the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we have a ProgUSDT but hasUsdtArgsFunction() returns false?
In that case we don't need to load the extension.
Also, should we catch the error returned by the function?
The hasUsdtArgsFunction
function does not return an error. If an error happens, we just will not load the extension.
return 0; | ||
|
||
// USDT prototype: | ||
// gc__start (int generation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this link as a comment in the eBPF code?
// mem__pool__gc__begin (char* manager_name, int manager_name_len, | ||
// char* poll_name, int poll_name_len, | ||
// long initial_size, long memory_in_use, | ||
// long committed_pages, long maximum_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
|
||
// For USDT programs, we need to add the BPF_TRACE_UPROBE_MULTI flag | ||
for _, p := range spec.Programs { | ||
if p.Type == ebpf.Kprobe && strings.HasPrefix(p.SectionName, usdtPrefix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand how it could be possible to have a kprobe
with the USDT prefix, can you please shed some light?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both u(ret)probe(multi)s and k(ret)probe(multi)s are classified to ebpf.Kprobe
type [2]. The condition is actually copied from here [1].
[1] https://github.com/inspektor-gadget/inspektor-gadget/blob/main/pkg/operators/ebpf/attach.go#L42
[2] https://ebpf-go.dev/concepts/section-naming/#program-sections
@@ -178,6 +181,14 @@ func (i *ebpfInstance) loadSpec() error { | |||
if err != nil { | |||
return fmt.Errorf("loading spec: %w", err) | |||
} | |||
|
|||
// For USDT programs, we need to add the BPF_TRACE_UPROBE_MULTI flag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cilium/ebpf automatically adds the MULTI flag when the section is named "uprobe.multi":
https://github.com/cilium/ebpf/blob/v0.15.0/elf_sections.go#L21-L22
Unfortunately, it does not have a similar section name for usdt. Do you think that should be something to improve in cilium/ebpf and libbpf? In this way, you would not need to manually change the AttachType and you would not need to redefine the constant BPF_TRACE_UPROBE_MULTI.
Meanwhile, this is fine for this PR, since there is no other choices with the current cilium/ebpf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is cilium/ebpf's issue. Here I add the BPF_TRACE_UPROBE_MULTI
flag, only because I'm using a trick to deal with USDT tracepoints which are with the same name.
For example, there are many trace_gc
USDT probes in Libjvm, and I need to attach to all of them. Using BPF_TRACE_UPROBE_MULTI
will make it faster. Also, I don't want the uprobetracer
be aware of this and support nested Links. Using BPF_TRACE_UPROBE_MULTI
will merge many ebpf.Link
s into one, and I think that's more elegant.
@@ -56,6 +56,8 @@ const ( | |||
ParamTraceKernel = "trace-pipe" | |||
|
|||
kernelTypesVar = "kernelTypes" | |||
|
|||
BPF_TRACE_UPROBE_MULTI ebpf.AttachType = 48 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment explaining where this value comes from?
The constant is declared in cilium/ebpf but it is internal:
https://github.com/cilium/ebpf/blob/v0.15.0/internal/sys/types.go#L67
It is part of the Linux API:
https://github.com/torvalds/linux/blob/v6.8/include/uapi/linux/bpf.h#L1049
pkg/uprobetracer: support reading USDT arguments
How to use
The
usdt_get_argument
function will return a __u64* pointer, it will be NULL on failure. Users can derefence it to get the value.Also, the
trace_gc
gadget is added as a demo of usingusdt_get_argument
.