RFC: Add remediation for BPF ABBA deadlock bug caused by hash map access #3144
An issue was recently discovered via a bpftrace script in which a kfunc was attached on the same code path used to access a BPF_MAP_TYPE_HASH map. This created an ABBA deadlock in the kernel and a subsequent crash, e.g. when the map @spinlock_start was being accessed by another BPF prog on a different CPU.
Example stack trace:
This issue is being addressed in a BPF kernel patch. However, because that patch is going into a later kernel release, we should consider adding a temporary fix for it in bpftrace.
One possible option, suggested by Alexei, is for a bpftrace script to create a per-CPU variable that is tested when a functional block is called. If it is already set, we exit early (and increment a missed counter?). If it is not yet set, we set it, run the functional block, and clear it at exit time. We could do this conditionally, only when we detect that the prog is accessing a non per-CPU map type.
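To make the guard logic concrete, here is a minimal userspace C simulation of the test/set/run/clear sequence described above. It is only a sketch of the idea, not bpftrace codegen or kernel code: the names `guard`, `run_guarded`, `NR_CPUS` and the two demo bodies are hypothetical, and a plain array indexed by CPU number stands in for a real per-CPU BPF variable. It also assumes the block cannot migrate between CPUs while it runs, which the simulation does not model.

```c
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical CPU count for this simulation */

/* One slot per CPU, standing in for a per-CPU BPF variable. */
struct guard_slot {
    int busy;             /* non-zero while a guarded block runs on this CPU */
    unsigned long missed; /* invocations skipped because of re-entry */
};

static struct guard_slot guard[NR_CPUS];

typedef void (*probe_body_fn)(int cpu, void *ctx);

/*
 * Run body on the given CPU unless a guarded block is already running there.
 * Returns 1 if body ran, 0 if it was skipped and counted as missed.
 * The caller is expected to pass a cpu in the range [0, NR_CPUS).
 */
static int run_guarded(int cpu, probe_body_fn body, void *ctx)
{
    struct guard_slot *g = &guard[cpu];

    if (g->busy) {      /* already set: exit early */
        g->missed++;
        return 0;
    }
    g->busy = 1;        /* not yet set: set it */
    body(cpu, ctx);     /* run the functional block */
    g->busy = 0;        /* clear it at exit time */
    return 1;
}

/* Demo body: counts how many times a block actually ran. */
static void count_body(int cpu, void *ctx)
{
    (void)cpu;
    ++*(int *)ctx;
}

/* Demo body: re-enters the guard on the same CPU, as a nested probe would. */
static void reentrant_body(int cpu, void *ctx)
{
    ++*(int *)ctx;
    run_guarded(cpu, count_body, ctx); /* skipped: guard is already set */
}
```

Calling `run_guarded(0, reentrant_body, &runs)` runs the outer block once, skips the nested call, and leaves `guard[0].missed` at 1. A later call on another CPU is unaffected because each CPU has its own slot.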
Another option would be to block the use of certain kfuncs/kprobes when bpftrace can detect that these progs access a non per-CPU map in a script where other progs access the same map (which may end up being more code).
This issue is meant to be a place to discuss possible solutions and the priority of this fix.