You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Interval probes are attached to PERF_COUNT_SW_CPU_CLOCK events that fire on CPU 0. If that CPU goes offline once an interval probe is attached, that probe never runs, even if the CPU comes back up later:
bpftrace -e 'interval:s:3 { exit(); }' &
echo 0 > /sys/devices/system/cpu/cpu0/online
echo 1 > /sys/devices/system/cpu/cpu0/online
# exit() doesn't get called
(If CPU0 is already offline, perf_event_open() in attach_interval() returns ENODEV, and bpftrace reports an error. This case can be handled by using the first online CPU instead of hardcoding CPU0.)
This issue was observed on an Android device where CPU hotplug can be used to save power.
From my understanding of the kernel code here, once the CPU goes offline (perf_event_exit_cpu), associated perf events transition into PERF_EVENT_STATE_OFF and stay there until they're explicitly re-enabled (e.g. via PERF_EVENT_IOC_ENABLE). Bpftrace doesn't consume the SW_CPU_CLOCK event directly, so it doesn't detect when it becomes disabled.
Here are some ideas on how bpftrace might be able to handle this edge case. Unfortunately none of them seems particularly elegant nor trivial to implement.
Attach the interval probe to a perf event for each online CPU (just like for profile probes), but emit extra BPF code that ensures it's called only once (per iteration). This could presumably be implemented via a counter and atomic BPF instructions (e.g. BPF_CMPXCHG).
Add perf event fds to the epoll fd set; when an event is disabled, attach the BPF program to a new event on one of the online CPUs. Period may need to be adjusted when reattaching.
Implement interval probes in userspace, e.g. by having bpftrace spawn a new thread that sleeps (in a loop) for the specific duration before calling a trigger function that has a uprobe attached to it (similar to how BEGIN/END probes work).
The text was updated successfully, but these errors were encountered:
It looks like this affects all probe types using perf events (profile, interval, software and hardware).
Ideally, our solution would work for all these probe types and not just interval. We'll want to avoid the situation where a long-running profile gradually loses visibility into more and more CPUs as they go offline and online again.
Tbh the kernel managing this for us seems like the best option to me... Although Peter Zijlstra didn't like the patch as submitted, it doesn't sound like he was completely opposed to the concept of preserving perf events across hotplugs.
Tagging this with "kernel" in case there's anyone who fancies putting in an upstream Linux patch :)
Ideally, our solution would work for all these probe types and not just interval. We'll want to avoid the situation where a long-running profile gradually loses visibility into more and more CPUs as they go offline and online again.
Tbh the kernel managing this for us seems like the best option to me... Although Peter Zijlstra didn't like the patch as submitted, it doesn't sound like he was completely opposed to the concept of preserving perf events across hotplugs.
I believe there are two slightly different issues, and the proposed kernel patch only addresses the first:
Perf events remain in the off state when CPUs come back online (this affects all probe types using perf events). Ideally this would be handled by the kernel (as in the linked patch), but a not-so-great alternative is bpftrace internally monitoring hotplug events.
Probes that are pinned to a specific CPU (namely, interval probes) stop firing when that CPU is offline.
It doesn't matter which CPU an interval probe runs on (as long as it fires only once per period), so attaching it to all CPUs may still be required even with the kernel patch.
Interval probes are attached to
PERF_COUNT_SW_CPU_CLOCK
events that fire on CPU 0. If that CPU goes offline once an interval probe is attached, that probe never runs, even if the CPU comes back up later:(If CPU0 is already offline,
perf_event_open()
inattach_interval()
returnsENODEV
, and bpftrace reports an error. This case can be handled by using the first online CPU instead of hardcoding CPU0.)This issue was observed on an Android device where CPU hotplug can be used to save power.
From my understanding of the kernel code here, once the CPU goes offline (
perf_event_exit_cpu
), associated perf events transition intoPERF_EVENT_STATE_OFF
and stay there until they're explicitly re-enabled (e.g. viaPERF_EVENT_IOC_ENABLE
). Bpftrace doesn't consume theSW_CPU_CLOCK
event directly, so it doesn't detect when it becomes disabled.Looks like there was a proposal to handle this automatically in the kernel, but the patch was rejected: https://lore.kernel.org/all/aee872be-c34d-729b-575d-df488a46f2cf@codeaurora.org/T/
Here are some ideas on how bpftrace might be able to handle this edge case. Unfortunately none of them seems particularly elegant nor trivial to implement.
profile
probes), but emit extra BPF code that ensures it's called only once (per iteration). This could presumably be implemented via a counter and atomic BPF instructions (e.g.BPF_CMPXCHG
).The text was updated successfully, but these errors were encountered: