[CWS] rework mmap hook + rhel 9.3 fix #25518

Merged
merged 6 commits into from May 13, 2024

@paulcacheux paulcacheux commented May 10, 2024

What does this PR do?

This PR fixes the mmap hooking on RHEL 9.3 again. On some RHEL 9.3 kernels (bisection still in progress), the format file of the sys_enter_mmap tracepoint does not match the actual layout of the event.
Basically:

$ sudo cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_mmap/format
name: sys_enter_mmap
ID: 101
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;
	field:unsigned char common_preempt_lazy_count;	offset:8;	size:1;	signed:0;

	field:int __syscall_nr;	offset:12;	size:4;	signed:1;
	field:unsigned long addr;	offset:16;	size:8;	signed:0;
	field:unsigned long len;	offset:24;	size:8;	signed:0;
	field:unsigned long prot;	offset:32;	size:8;	signed:0;
	field:unsigned long flags;	offset:40;	size:8;	signed:0;
	field:unsigned long fd;	offset:48;	size:8;	signed:0;
	field:unsigned long off;	offset:56;	size:8;	signed:0;

print fmt: "addr: 0x%08lx, len: 0x%08lx, prot: 0x%08lx, flags: 0x%08lx, fd: 0x%08lx, off: 0x%08lx", ((unsigned long)(REC->addr)), ((unsigned long)(REC->len)), ((unsigned long)(REC->prot)), ((unsigned long)(REC->flags)), ((unsigned long)(REC->fd)), ((unsigned long)(REC->off))

but in reality addr and the following fields start at offset 24, not 16. Fun.
To fix this, this PR adds a small manual offset if the problematic kernel is detected. This only impacts mmap detection, because it is the only hook that relies on a tracepoint to start the event creation (mprotect is based on a kprobe, for example).
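
The mismatch can be illustrated with a minimal Python sketch, which is not the code from this PR. It parses the field offsets out of a format file and then applies a manual correction. The names `parse_format` and `arg_offsets` and the `extra_offset` parameter are hypothetical, and the 8-byte delta is simply derived from the 16 vs. 24 offsets quoted above. How the problematic kernel is detected is not shown, because the description does not specify it.

```python
import re

# Matches one "field:..." line of a tracepoint format file.
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);"
    r"\s*size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)

# Excerpt of the sys_enter_mmap format file quoted in this PR.
FORMAT_SAMPLE = """
field:int __syscall_nr; offset:12; size:4; signed:1;
field:unsigned long addr; offset:16; size:8; signed:0;
field:unsigned long len; offset:24; size:8; signed:0;
field:unsigned long prot; offset:32; size:8; signed:0;
field:unsigned long flags; offset:40; size:8; signed:0;
field:unsigned long fd; offset:48; size:8; signed:0;
field:unsigned long off; offset:56; size:8; signed:0;
"""

MMAP_ARGS = ("addr", "len", "prot", "flags", "fd", "off")


def parse_format(text):
    """Return {field_name: (offset, size)} as declared by the format file."""
    fields = {}
    for match in FIELD_RE.finditer(text):
        # The field name is the last token of the C declaration.
        name = match.group("decl").split()[-1]
        fields[name] = (int(match.group("offset")), int(match.group("size")))
    return fields


def arg_offsets(fields, arg_names, extra_offset=0):
    """Declared offsets, shifted by a manual correction (assumption: a
    constant delta applies to every syscall argument)."""
    return {name: fields[name][0] + extra_offset for name in arg_names}


if __name__ == "__main__":
    declared = parse_format(FORMAT_SAMPLE)
    print("declared:", arg_offsets(declared, MMAP_ARGS))
    print("corrected:", arg_offsets(declared, MMAP_ARGS, extra_offset=8))
```

With `extra_offset=8`, `addr` moves from the declared offset 16 to 24, which matches the layout observed on the affected kernels.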

During this investigation I discovered multiple oddities that this PR fixes as well:

  • use the correct u64 types for the mmap fields (instead of the types of the glibc wrapper)
  • use the sys_exit_mmap tracepoint instead of a kretprobe
  • use a kprobe on security_mmap_file instead of a kretprobe on fget (which was unlikely to be free from a performance point of view)
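
To make the first point concrete, here is a small hedged Python sketch (again, not the eBPF code of this PR) that decodes the six mmap arguments of a raw record as 8-byte unsigned values, which is how the format file declares them. The record bytes are fabricated for illustration, little-endian layout is assumed, and `decode_mmap_args` and `as_signed32` are hypothetical helpers. It also shows what reading at the declared offset 16 instead of the real offset 24 does to the decoded values.

```python
import struct

MMAP_ARGS = ("addr", "len", "prot", "flags", "fd", "off")


def decode_mmap_args(record, base_offset):
    """Read six consecutive u64 values starting at base_offset."""
    values = struct.unpack_from("<6Q", record, base_offset)
    return dict(zip(MMAP_ARGS, values))


def as_signed32(value):
    """Reinterpret the low 32 bits as a signed int (e.g. for an fd of -1)."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


# Fabricated record: 24 bytes of header, then the six arguments of an
# anonymous 4096-byte mapping (prot=3, flags=0x22, fd=-1, off=0).
record = bytes(24) + struct.pack(
    "<6Q", 0, 4096, 3, 0x22, 0xFFFFFFFFFFFFFFFF, 0
)

if __name__ == "__main__":
    good = decode_mmap_args(record, 24)
    bad = decode_mmap_args(record, 16)
    print("offset 24:", good, "fd as int:", as_signed32(good["fd"]))
    print("offset 16:", bad)
```

Decoding at offset 16 shifts every argument by one slot (the length ends up in `prot`, the flags in `fd`), which is the kind of corruption the manual offset avoids.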

Motivation

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes


pr-commenter bot commented May 10, 2024

Test changes on VM

Use this command from test-infra-definitions to manually test this PR changes on a VM:

inv create-vm --pipeline-id=34105882 --os-family=ubuntu


pr-commenter bot commented May 10, 2024

Regression Detector

Regression Detector Results

Run ID: 3fe78f7b-531d-42f1-b372-55f2dcb0fd1f
Baseline: 83418a3
Comparison: fecff09

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

| perf | experiment | goal | Δ mean % | Δ mean % CI |
|------|------------|------|----------|-------------|
|      | file_to_blackhole | % cpu utilization | -15.52 | [-19.94, -11.10] |

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI |
|------|------------|------|----------|-------------|
|      | file_tree | memory utilization | +0.15 | [+0.03, +0.28] |
|      | uds_dogstatsd_to_api | ingress throughput | +0.02 | [-0.19, +0.22] |
|      | trace_agent_msgpack | ingress throughput | +0.00 | [-0.00, +0.00] |
|      | trace_agent_json | ingress throughput | -0.00 | [-0.01, +0.01] |
|      | tcp_dd_logs_filter_exclude | ingress throughput | -0.01 | [-0.05, +0.02] |
|      | otel_to_otel_logs | ingress throughput | -0.09 | [-0.46, +0.27] |
|      | idle | memory utilization | -0.18 | [-0.21, -0.15] |
|      | pycheck_1000_100byte_tags | % cpu utilization | -0.30 | [-5.07, +4.47] |
|      | basic_py_check | % cpu utilization | -0.41 | [-3.05, +2.24] |
|      | uds_dogstatsd_to_api_cpu | % cpu utilization | -0.71 | [-3.50, +2.08] |
|      | tcp_syslog_to_blackhole | ingress throughput | -8.86 | [-28.78, +11.06] |
|      | file_to_blackhole | % cpu utilization | -15.52 | [-19.94, -11.10] |

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
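
The three criteria above can be written down as a short predicate. This is only a sketch of the stated rule in Python, not the Regression Detector's actual implementation; the function name and parameters are hypothetical, and the 5.00% tolerance is the one quoted in this report.

```python
def is_regression(delta_mean_pct, ci_low, ci_high, erratic=False, tolerance=5.0):
    """Apply the three criteria listed in the report to one experiment."""
    # 1. The estimated change is large enough to merit a closer look.
    big_enough = abs(delta_mean_pct) >= tolerance
    # 2. The confidence interval does not contain zero.
    ci_excludes_zero = not (ci_low <= 0.0 <= ci_high)
    # 3. The experiment is not marked erratic.
    return big_enough and ci_excludes_zero and not erratic


if __name__ == "__main__":
    # file_to_blackhole: large change, CI excludes zero, but marked erratic.
    print(is_regression(-15.52, -19.94, -11.10, erratic=True))
    # tcp_syslog_to_blackhole: large change, but the CI contains zero.
    print(is_regression(-8.86, -28.78, 11.06))
```

Applied to the tables above, file_to_blackhole meets the first two criteria but is ignored because it is marked erratic, and tcp_syslog_to_blackhole is not flagged because its interval contains zero.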

@paulcacheux paulcacheux force-pushed the paulcacheux/mmap-rework branch 6 times, most recently from b0b107f to a08a5ce May 12, 2024 10:16
@paulcacheux paulcacheux changed the title [CWS] mmap hooks: use security_mmap_file instead of fget [CWS] rework mmap hook + rhel 9.3 fix May 13, 2024
@paulcacheux paulcacheux marked this pull request as ready for review May 13, 2024 05:58
@paulcacheux paulcacheux requested a review from a team as a code owner May 13, 2024 05:58
@paulcacheux
Contributor Author

/merge


dd-devflow bot commented May 13, 2024

🚂 MergeQueue

Pull request added to the queue.

There are 2 builds ahead! (estimated merge in less than 1h)

Use /merge -c to cancel this operation!

@dd-mergequeue dd-mergequeue bot merged commit 5a48ec3 into main May 13, 2024
214 checks passed
@dd-mergequeue dd-mergequeue bot deleted the paulcacheux/mmap-rework branch May 13, 2024 10:27
@github-actions github-actions bot added this to the 7.55.0 milestone May 13, 2024