[CONTINT-4105] Support arbitrary container-ids to collect container metrics #25515
base: main
Regression Detector
Regression Detector Results
Run ID: 4271b1b2-c1cf-4110-86f5-09a44468e8b3
Performance changes are noted in the perf column of each table:
No significant changes in experiment optimization goals
Confidence level: 90.00%
There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +7.82 | [-13.76, +29.40] |
| ➖ | basic_py_check | % cpu utilization | +1.49 | [-0.94, +3.92] |
| ➖ | file_tree | memory utilization | +0.61 | [+0.52, +0.71] |
| ➖ | otel_to_otel_logs | ingress throughput | +0.35 | [-0.04, +0.73] |
| ➖ | idle | memory utilization | +0.03 | [+0.00, +0.07] |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.02 | [-0.19, +0.22] |
| ➖ | trace_agent_json | ingress throughput | -0.00 | [-0.01, +0.01] |
| ➖ | trace_agent_msgpack | ingress throughput | -0.01 | [-0.01, -0.00] |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.02 | [-0.05, +0.01] |
| ➖ | uds_dogstatsd_to_api_cpu | % cpu utilization | -1.30 | [-4.14, +1.54] |
| ➖ | pycheck_1000_100byte_tags | % cpu utilization | -2.82 | [-7.59, +1.94] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we flag a change in performance as a "regression" (a change worth investigating further) if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
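The three criteria can be read as a single predicate. The following is a minimal Go sketch, not the Regression Detector's actual code: the Experiment type and isRegression function are hypothetical, the 5.00% threshold comes from the first criterion, and the erratic flag is assumed false for the sample rows because the table does not show it.

```go
package main

import (
	"fmt"
	"math"
)

// Experiment is a hypothetical representation of one row of the results table.
type Experiment struct {
	Name      string
	DeltaMean float64 // estimated Δ mean %
	CILow     float64 // lower bound of the 90% CI on Δ mean %
	CIHigh    float64 // upper bound of the 90% CI on Δ mean %
	Erratic   bool    // assumed: whether the experiment's configuration marks it erratic
}

// minEffectSize is the |Δ mean %| threshold from the first criterion.
const minEffectSize = 5.0

// isRegression returns true only if the effect is large enough, the
// confidence interval excludes zero, and the experiment is not erratic.
func isRegression(e Experiment) bool {
	bigEnough := math.Abs(e.DeltaMean) >= minEffectSize
	ciExcludesZero := e.CILow > 0 || e.CIHigh < 0
	return bigEnough && ciExcludesZero && !e.Erratic
}

func main() {
	// Two rows copied from the table above.
	rows := []Experiment{
		{"tcp_syslog_to_blackhole", 7.82, -13.76, 29.40, false},
		{"file_tree", 0.61, 0.52, 0.71, false},
	}
	for _, r := range rows {
		fmt.Printf("%s: regression=%v\n", r.Name, isRegression(r))
	}
}
```

Neither sample row is flagged: tcp_syslog_to_blackhole has a large estimate but its CI straddles zero, while file_tree has a CI that excludes zero but an effect well below 5%.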
/trigger-ci --variable RUN_ALL_BUILDS=true --variable RUN_KITCHEN_TESTS=true --variable RUN_E2E_TESTS=on --variable RUN_UNIT_TESTS=on --variable RUN_KMT_TESTS=on
🚂 Gitlab pipeline started: pipeline #34122753
Test changes on VM
Use this command from test-infra-definitions to manually test this PR's changes on a VM:
inv create-vm --pipeline-id=34930504 --os-family=ubuntu
Approving with a minor suggestion to the release note
releasenotes/notes/support-arbitrary-container-id-cc0efdf7c156b7ad.yaml
…b7ad.yaml Co-authored-by: Bryce Eadie <bryce.eadie@datadoghq.com>
comp/core/workloadmeta/collectors/internal/containerd/container_builder.go
var w workloadmeta.Component
unwrapped, ok := wlm.Get()
if ok {
	w = unwrapped
If not ok, you're passing nil to newContainerFilter. Perhaps you should fail and return an error instead.
In this case it's intentional. The start function returns nothing, and the ContainerFilter will only call cgroups.ContainerFilter
		EventType: workloadmeta.EventTypeAll,
	},
))
defer cf.wlm.Unsubscribe(evBundle)
This is actually never called
I thought that the channel could be closed by workloadmeta on shutdown. Removed the Unsubscribe for now.
	return res, nil
}
cf.mutex.RLock()
res := cf.trie.Get(path)
As we have path matching, is the trie actually useful compared to a map?
As discussed, ContainerFilter is called with a full path, while the workloadmeta object stores suffixes, so we need to do suffix matching. I'll improve the doc and split the files.
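To illustrate why suffix matching favors a trie over a map keyed on the full path, here is a minimal Go sketch of a trie keyed on path segments in reverse order. The suffixTrie type, its methods, and the example cgroup paths are hypothetical; the PR's trie may be structured differently (for example, matching character by character rather than by segment).

```go
package main

import (
	"fmt"
	"strings"
)

// suffixTrie is a hypothetical trie keyed on path segments in reverse order,
// so a lookup with a full path can find any stored suffix of that path.
type suffixTrie struct {
	children map[string]*suffixTrie
	value    string // container ID stored at the end of a suffix
	hasValue bool
}

func newSuffixTrie() *suffixTrie {
	return &suffixTrie{children: map[string]*suffixTrie{}}
}

// segments splits a path on "/" and drops empty parts.
func segments(path string) []string {
	var out []string
	for _, s := range strings.Split(path, "/") {
		if s != "" {
			out = append(out, s)
		}
	}
	return out
}

// Insert stores value under the given path suffix, walking its segments backwards.
func (t *suffixTrie) Insert(suffix, value string) {
	node := t
	segs := segments(suffix)
	for i := len(segs) - 1; i >= 0; i-- {
		child, ok := node.children[segs[i]]
		if !ok {
			child = newSuffixTrie()
			node.children[segs[i]] = child
		}
		node = child
	}
	node.value, node.hasValue = value, true
}

// Get walks the full path from its last segment backwards and returns the
// value of the longest stored suffix that matches it.
func (t *suffixTrie) Get(fullPath string) (string, bool) {
	node := t
	best, found := "", false
	segs := segments(fullPath)
	for i := len(segs) - 1; i >= 0; i-- {
		child, ok := node.children[segs[i]]
		if !ok {
			break
		}
		node = child
		if node.hasValue {
			best, found = node.value, true
		}
	}
	return best, found
}

func main() {
	t := newSuffixTrie()
	t.Insert("default/redis", "redis")
	id, ok := t.Get("/sys/fs/cgroup/system.slice/default/redis")
	fmt.Println(id, ok)
}
```

A plain map could also work by probing every suffix of the incoming path, at the cost of one map lookup per segment; the trie finds the longest stored suffix in a single backwards walk.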
Note that the implementation is racy by nature, as we depend on the subscriber having done its work by the time ContainerFilter is called.
It should normally converge within a few seconds, but this is worth noting.
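The race can be reproduced in a minimal Go sketch where a subscriber goroutine fills a lookup table from an event channel under a sync.RWMutex: a lookup made before the subscriber has applied an event misses, and the same lookup succeeds afterwards. The containerEvent and filterStore types are hypothetical stand-ins rather than the workloadmeta API, and a plain map replaces the trie for brevity. The subscriber loop also exits on its own once the channel is closed.

```go
package main

import (
	"fmt"
	"sync"
)

// containerEvent is a hypothetical stand-in for a workloadmeta event.
type containerEvent struct {
	CgroupPath  string
	ContainerID string
	Deleted     bool
}

// filterStore is a hypothetical cache filled asynchronously by a subscriber.
type filterStore struct {
	mu     sync.RWMutex
	byPath map[string]string
}

// lookup reads the cache under a read lock, like ContainerFilter would.
func (s *filterStore) lookup(path string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	id, ok := s.byPath[path]
	return id, ok
}

// consume applies events until the channel is closed, then signals done.
func (s *filterStore) consume(events <-chan containerEvent, done chan<- struct{}) {
	for ev := range events {
		s.mu.Lock()
		if ev.Deleted {
			delete(s.byPath, ev.CgroupPath)
		} else {
			s.byPath[ev.CgroupPath] = ev.ContainerID
		}
		s.mu.Unlock()
	}
	close(done)
}

func main() {
	s := &filterStore{byPath: map[string]string{}}
	events := make(chan containerEvent, 1)
	done := make(chan struct{})

	// The event is queued but no subscriber has processed it yet: the lookup misses.
	events <- containerEvent{CgroupPath: "/default/redis", ContainerID: "redis"}
	_, ok := s.lookup("/default/redis")
	fmt.Println("before processing:", ok)

	go s.consume(events, done)
	close(events)
	<-done // wait until the subscriber has drained the channel
	id, ok := s.lookup("/default/redis")
	fmt.Println("after processing:", id, ok)
}
```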
Go Package Import Differences
Baseline: 5fa6bb3
…nly populate the Trie if the regex does not match
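The commit above points to a two-step lookup: try regex-based extraction of the container ID first, and only populate (and fall back to) the trie when the regex does not match. Per the earlier review thread, when workloadmeta is unavailable only the regex path is used. The following Go sketch illustrates that idea under stated assumptions: the 64-hex-character pattern, the containerFilter type, and its methods are hypothetical (not the agent's cgroups.ContainerFilter), and a plain map stands in for the suffix trie.

```go
package main

import (
	"fmt"
	"regexp"
)

// idRegexp is a hypothetical pattern for 64-hex-character container IDs at
// the end of a cgroup path, optionally followed by ".scope".
var idRegexp = regexp.MustCompile(`([0-9a-f]{64})(?:\.scope)?$`)

// regexID extracts a container ID from a cgroup path using idRegexp.
func regexID(path string) (string, bool) {
	m := idRegexp.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// containerFilter is hypothetical; fallback is nil when workloadmeta is unavailable.
type containerFilter struct {
	fallback map[string]string
}

// observe records a container only if the regex cannot recover its ID,
// which keeps the fallback structure small.
func (f *containerFilter) observe(path, id string) {
	if f.fallback == nil {
		return
	}
	if _, ok := regexID(path); ok {
		return
	}
	f.fallback[path] = id
}

// containerID tries the regex first, then the fallback structure.
func (f *containerFilter) containerID(path string) (string, bool) {
	if id, ok := regexID(path); ok {
		return id, true
	}
	id, ok := f.fallback[path] // reading a nil map is safe and reports false
	return id, ok
}

func main() {
	f := &containerFilter{fallback: map[string]string{}}
	f.observe("/default/redis", "redis")
	fmt.Println(f.containerID("/default/redis"))
	fmt.Println(len(f.fallback))
}
```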
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             main   #25515       +/-   ##
===========================================
+ Coverage   45.17%   48.74%    +3.57%
===========================================
  Files        2314     1760      -554
  Lines      266564   165312   -101252
===========================================
- Hits       120430    80589    -39841
+ Misses     136547    79658    -56889
+ Partials     9587     5065     -4522
Flags with carried forward coverage won't be shown.
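As a sanity check on the Codecov numbers: each line is counted as a hit, a miss, or a partial (so Lines = Hits + Misses + Partials), and Codecov defines coverage as hits divided by that total. The reported percentages match this, truncated to two decimals, which is consistent with Codecov's default rounding mode of rounding down.

```latex
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}

\text{main: } \frac{120430}{120430 + 136547 + 9587} = \frac{120430}{266564} \approx 45.1786\% \;\rightarrow\; 45.17\%

\text{\#25515: } \frac{80589}{80589 + 79658 + 5065} = \frac{80589}{165312} \approx 48.7496\% \;\rightarrow\; 48.74\%

\Delta = 48.74\% - 45.17\% = +3.57\%
```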
What does this PR do?
This PR adds support for arbitrary container IDs for containerd, so we can now collect metrics and tags for these containers.
Motivation
Reduce the number of false negatives by using a more robust way to retrieve container IDs.
Additional Notes
RFC
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
1. Deploy the agent on a kind cluster.
2. Pull an image on the node with:
   docker exec <node-id> ctr i pull docker.io/library/redis:latest
3. Start a redis container on the node with:
   docker exec <node-id> ctr run docker.io/library/redis:latest redis
4. Make sure container metrics can be found with the right container ID (redis).
Notebook.