pmdabpf sets a too-low limit for rlimit #1915
For what it's worth, in 5.11+ kernels this changed to cgroup accounting, so the effect is only apparent on older kernels that also have pmdabpf built (i.e., this might be something only we are seeing on legacy OS distributions).
@jasonk000 What was the original goal of setting the MEMLOCK limit in pmdabpf? (Making sure we can lock down the space we need but nothing extremely large, I guess?) Another option might be to look at the physical memory size (sysconf(3) _SC_PHYS_PAGES or _SC_AVPHYS_PAGES) and then calculate a lock limit that's a fraction of the physical memory on the machine (e.g. no more than a tenth, some heuristic anyway, perhaps clamped to a minimum value of 100MB). Another option could be to say we trust ourselves: anyone configuring pmdabpf modules must already have root access, so remove all limits for the pmdabpf process (RLIM_INFINITY)?
Background context -- In the original design, before memory-using resources were accounted against the cgroup, they were tracked and limited via memlock. This is relevant for maps, since you can specify arbitrarily sized maps which can be very large (we have some maps defined as over 100MB). As a common approach, many libbpf users use setrlimit because it gives a clearer error scenario than failing with EPERM (since bpf normally runs as root, a setrlimit to a high enough value will nearly always work). With newer kernels (5.11+) this is no longer necessary, since the limiting is applied at the cgroup rather than the rlimit. I don't think there's a lot of value in picking some arbitrary value, because it will always be either too high or too low unless the operator knows the app they are running. Personally I can think of a couple of approaches; infinity works. What I am currently doing on a private build is using systemd to set the limit.
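The systemd route mentioned above could be done with a drop-in override like the one below. This is a sketch, not necessarily the commenter's exact configuration; the `LimitMEMLOCK=` directive is standard systemd, but the drop-in path and the `pmcd.service` unit name are assumptions for illustration.

```ini
# /etc/systemd/system/pmcd.service.d/memlock.conf  (path assumed)
[Service]
LimitMEMLOCK=infinity
```

After adding the drop-in, `systemctl daemon-reload` and a service restart would be needed for it to take effect.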
OK, should we just change bpf_setrlimit to unilaterally set RLIMIT_MEMLOCK rlim_cur/rlim_max to infinity then? That's simpler than the current code in there and means no systemd changes are needed for everyone else...
I think that works too, so long as we do not bail on error from the setrlimit call.
OK, removing it entirely works for me too, especially if we can make the systemd unit conditional on kernel version, or even put the responsibility of managing it onto the user entirely (maybe we could just document it in the man page?).
Making the same mistake as "640KB ought to be enough for anybody", I picked a 100MB rlimit setting during bpf.c setup. Now, when another (non-PCP) bpf application is loaded, we have a conflict.

Proposal:
Reproduce: with another program running that consumes ~350MB of maps and progs. In this case, bpftrace also shows that the bpf charge is failing, because 85322 pages (~350MB) are already counted against the user and the current limit is 25600 pages (100MB).