Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Support Snapshot] added logging for bad payloads #6534

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nayeem-kamal
Copy link
Contributor

What Does This Do

Motivation

Additional Notes

Jira ticket: [PROJ-IDENT]

@pr-commenter
Copy link

pr-commenter bot commented Jan 22, 2024

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master nayeem-kamal/log-bad-payloads
git_commit_date 1705942140 1705957867
git_commit_sha 9430bbc 260332d
release_version 1.29.0-SNAPSHOT~9430bbc87c 1.28.0-SNAPSHOT~260332d3ca
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1705960646 1705960646
ci_job_id 415246196 415246196
ci_pipeline_id 26951026 26951026
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
module Agent Agent
parent None None
variant iast iast

Summary

Found 1 performance improvements and 0 performance regressions! Performance is the same for 43 metrics, 10 unstable metrics.

scenario Δ mean execution_time candidate mean execution_time baseline mean execution_time
scenario:startup:insecure-bank:iast_TELEMETRY_OFF:Remote Config better
[-86.402µs; -28.046µs] or [-13.624%; -4.422%]
576.954µs 634.179µs
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.28.0-SNAPSHOT~260332d3ca, baseline=1.29.0-SNAPSHOT~9430bbc87c

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.053 s) : 0, 1052962
Total [baseline] (9.382 s) : 0, 9382348
Agent [candidate] (1.055 s) : 0, 1054636
Total [candidate] (9.36 s) : 0, 9359648
section appsec
Agent [baseline] (1.155 s) : 0, 1155486
Total [baseline] (9.478 s) : 0, 9477912
Agent [candidate] (1.153 s) : 0, 1153298
Total [candidate] (9.491 s) : 0, 9490523
section iast
Agent [baseline] (1.174 s) : 0, 1173988
Total [baseline] (9.622 s) : 0, 9621505
Agent [candidate] (1.178 s) : 0, 1177701
Total [candidate] (9.705 s) : 0, 9704718
section profiling
Agent [baseline] (1.283 s) : 0, 1283418
Total [baseline] (9.579 s) : 0, 9578523
Agent [candidate] (1.279 s) : 0, 1279432
Total [candidate] (9.572 s) : 0, 9571979
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.053 s -
Agent appsec 1.155 s 102.523 ms (9.7%)
Agent iast 1.174 s 121.026 ms (11.5%)
Agent profiling 1.283 s 230.456 ms (21.9%)
Total tracing 9.382 s -
Total appsec 9.478 s 95.564 ms (1.0%)
Total iast 9.622 s 239.157 ms (2.5%)
Total profiling 9.579 s 196.174 ms (2.1%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.055 s -
Agent appsec 1.153 s 98.662 ms (9.4%)
Agent iast 1.178 s 123.065 ms (11.7%)
Agent profiling 1.279 s 224.796 ms (21.3%)
Total tracing 9.36 s -
Total appsec 9.491 s 130.875 ms (1.4%)
Total iast 9.705 s 345.07 ms (3.7%)
Total profiling 9.572 s 212.331 ms (2.3%)
gantt
    title petclinic - break down per module: candidate=1.28.0-SNAPSHOT~260332d3ca, baseline=1.29.0-SNAPSHOT~9430bbc87c

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (664.83 ms) : 0, 664830
BytebuddyAgent [candidate] (665.594 ms) : 0, 665594
GlobalTracer [baseline] (295.541 ms) : 0, 295541
GlobalTracer [candidate] (296.343 ms) : 0, 296343
AppSec [baseline] (50.465 ms) : 0, 50465
AppSec [candidate] (50.501 ms) : 0, 50501
Remote Config [baseline] (653.469 µs) : 0, 653
Remote Config [candidate] (650.503 µs) : 0, 651
Telemetry [baseline] (7.228 ms) : 0, 7228
Telemetry [candidate] (7.247 ms) : 0, 7247
section appsec
BytebuddyAgent [baseline] (668.634 ms) : 0, 668634
BytebuddyAgent [candidate] (666.305 ms) : 0, 666305
GlobalTracer [baseline] (296.386 ms) : 0, 296386
GlobalTracer [candidate] (296.423 ms) : 0, 296423
AppSec [baseline] (148.424 ms) : 0, 148424
AppSec [candidate] (148.795 ms) : 0, 148795
Remote Config [baseline] (650.167 µs) : 0, 650
Remote Config [candidate] (645.441 µs) : 0, 645
Telemetry [baseline] (6.899 ms) : 0, 6899
Telemetry [candidate] (6.886 ms) : 0, 6886
section iast
BytebuddyAgent [baseline] (773.943 ms) : 0, 773943
BytebuddyAgent [candidate] (775.114 ms) : 0, 775114
GlobalTracer [baseline] (285.469 ms) : 0, 285469
GlobalTracer [candidate] (287.529 ms) : 0, 287529
AppSec [baseline] (52.134 ms) : 0, 52134
AppSec [candidate] (53.75 ms) : 0, 53750
Remote Config [baseline] (612.505 µs) : 0, 613
Remote Config [candidate] (591.061 µs) : 0, 591
Telemetry [baseline] (6.539 ms) : 0, 6539
Telemetry [candidate] (6.591 ms) : 0, 6591
IAST [baseline] (20.996 ms) : 0, 20996
IAST [candidate] (19.768 ms) : 0, 19768
section profiling
ProfilingAgent [baseline] (124.731 ms) : 0, 124731
ProfilingAgent [candidate] (123.897 ms) : 0, 123897
BytebuddyAgent [baseline] (666.53 ms) : 0, 666530
BytebuddyAgent [candidate] (664.352 ms) : 0, 664352
GlobalTracer [baseline] (378.172 ms) : 0, 378172
GlobalTracer [candidate] (377.642 ms) : 0, 377642
AppSec [baseline] (51.139 ms) : 0, 51139
AppSec [candidate] (50.986 ms) : 0, 50986
Remote Config [baseline] (985.071 µs) : 0, 985
Remote Config [candidate] (977.475 µs) : 0, 977
Telemetry [baseline] (7.31 ms) : 0, 7310
Telemetry [candidate] (7.207 ms) : 0, 7207
Profiling [baseline] (124.756 ms) : 0, 124756
Profiling [candidate] (123.924 ms) : 0, 123924
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.28.0-SNAPSHOT~260332d3ca, baseline=1.29.0-SNAPSHOT~9430bbc87c

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.061 s) : 0, 1061346
Total [baseline] (8.765 s) : 0, 8764870
Agent [candidate] (1.056 s) : 0, 1055824
Total [candidate] (8.74 s) : 0, 8740000
section iast
Agent [baseline] (1.172 s) : 0, 1172371
Total [baseline] (9.26 s) : 0, 9260066
Agent [candidate] (1.188 s) : 0, 1187972
Total [candidate] (9.361 s) : 0, 9360635
section iast_TELEMETRY_OFF
Agent [baseline] (1.174 s) : 0, 1174351
Total [baseline] (9.288 s) : 0, 9288346
Agent [candidate] (1.168 s) : 0, 1167859
Total [candidate] (9.274 s) : 0, 9273649
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.061 s -
Agent iast 1.172 s 111.025 ms (10.5%)
Agent iast_TELEMETRY_OFF 1.174 s 113.006 ms (10.6%)
Total tracing 8.765 s -
Total iast 9.26 s 495.196 ms (5.6%)
Total iast_TELEMETRY_OFF 9.288 s 523.476 ms (6.0%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.056 s -
Agent iast 1.188 s 132.148 ms (12.5%)
Agent iast_TELEMETRY_OFF 1.168 s 112.035 ms (10.6%)
Total tracing 8.74 s -
Total iast 9.361 s 620.636 ms (7.1%)
Total iast_TELEMETRY_OFF 9.274 s 533.649 ms (6.1%)
gantt
    title insecure-bank - break down per module: candidate=1.28.0-SNAPSHOT~260332d3ca, baseline=1.29.0-SNAPSHOT~9430bbc87c

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (670.19 ms) : 0, 670190
BytebuddyAgent [candidate] (666.609 ms) : 0, 666609
GlobalTracer [baseline] (297.611 ms) : 0, 297611
GlobalTracer [candidate] (296.378 ms) : 0, 296378
AppSec [baseline] (50.952 ms) : 0, 50952
AppSec [candidate] (50.534 ms) : 0, 50534
Remote Config [baseline] (686.606 µs) : 0, 687
Remote Config [candidate] (658.572 µs) : 0, 659
Telemetry [baseline] (7.37 ms) : 0, 7370
Telemetry [candidate] (7.246 ms) : 0, 7246
section iast
BytebuddyAgent [baseline] (772.856 ms) : 0, 772856
BytebuddyAgent [candidate] (782.152 ms) : 0, 782152
GlobalTracer [baseline] (285.232 ms) : 0, 285232
GlobalTracer [candidate] (289.632 ms) : 0, 289632
AppSec [baseline] (53.476 ms) : 0, 53476
AppSec [candidate] (55.134 ms) : 0, 55134
Remote Config [baseline] (599.357 µs) : 0, 599
Remote Config [candidate] (604.178 µs) : 0, 604
Telemetry [baseline] (7.363 ms) : 0, 7363
Telemetry [candidate] (6.682 ms) : 0, 6682
IAST [baseline] (18.507 ms) : 0, 18507
IAST [candidate] (19.163 ms) : 0, 19163
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (771.811 ms) : 0, 771811
BytebuddyAgent [candidate] (766.873 ms) : 0, 766873
GlobalTracer [baseline] (287.234 ms) : 0, 287234
GlobalTracer [candidate] (286.873 ms) : 0, 286873
AppSec [baseline] (52.962 ms) : 0, 52962
AppSec [candidate] (53.575 ms) : 0, 53575
Remote Config [baseline] (634.179 µs) : 0, 634
Remote Config [candidate] (576.954 µs) : 0, 577
Telemetry [baseline] (7.214 ms) : 0, 7214
Telemetry [candidate] (6.416 ms) : 0, 6416
IAST [baseline] (20.023 ms) : 0, 20023
IAST [candidate] (19.298 ms) : 0, 19298

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
end_time 2024-01-22T21:36:35 2024-01-22T21:53:09
git_branch master nayeem-kamal/log-bad-payloads
git_commit_date 1705942140 1705957867
git_commit_sha 9430bbc 260332d
release_version 1.29.0-SNAPSHOT~9430bbc87c 1.28.0-SNAPSHOT~260332d3ca
start_time 2024-01-22T21:36:22 2024-01-22T21:52:56
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1705960646 1705960646
ci_job_id 415246196 415246196
ci_pipeline_id 26951026 26951026
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 12 unstable metrics.

Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.28.0-SNAPSHOT~260332d3ca, baseline=1.29.0-SNAPSHOT~9430bbc87c
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.353 ms) : 1334, 1373
.   : milestone, 1353,
appsec (1.784 ms) : 1758, 1809
.   : milestone, 1784,
iast (1.541 ms) : 1517, 1566
.   : milestone, 1541,
profiling (1.548 ms) : 1521, 1575
.   : milestone, 1548,
tracing (1.489 ms) : 1464, 1514
.   : milestone, 1489,
section candidate
no_agent (1.36 ms) : 1341, 1380
.   : milestone, 1360,
appsec (1.802 ms) : 1777, 1827
.   : milestone, 1802,
iast (1.537 ms) : 1513, 1562
.   : milestone, 1537,
profiling (1.606 ms) : 1579, 1632
.   : milestone, 1606,
tracing (1.509 ms) : 1483, 1535
.   : milestone, 1509,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.353 ms [1.334 ms, 1.373 ms] -
appsec 1.784 ms [1.758 ms, 1.809 ms] 430.217 µs (31.8%)
iast 1.541 ms [1.517 ms, 1.566 ms] 187.764 µs (13.9%)
profiling 1.548 ms [1.521 ms, 1.575 ms] 194.372 µs (14.4%)
tracing 1.489 ms [1.464 ms, 1.514 ms] 135.639 µs (10.0%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.36 ms [1.341 ms, 1.38 ms] -
appsec 1.802 ms [1.777 ms, 1.827 ms] 441.492 µs (32.5%)
iast 1.537 ms [1.513 ms, 1.562 ms] 176.993 µs (13.0%)
profiling 1.606 ms [1.579 ms, 1.632 ms] 245.315 µs (18.0%)
tracing 1.509 ms [1.483 ms, 1.535 ms] 148.639 µs (10.9%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.28.0-SNAPSHOT~260332d3ca, baseline=1.29.0-SNAPSHOT~9430bbc87c
    dateFormat X
    axisFormat %s
section baseline
no_agent (367.337 µs) : 347, 388
.   : milestone, 367,
iast (474.847 µs) : 454, 495
.   : milestone, 475,
iast_FULL (541.685 µs) : 521, 562
.   : milestone, 542,
iast_INACTIVE (448.284 µs) : 427, 469
.   : milestone, 448,
iast_TELEMETRY_OFF (475.66 µs) : 455, 496
.   : milestone, 476,
tracing (452.556 µs) : 431, 474
.   : milestone, 453,
section candidate
no_agent (374.425 µs) : 354, 395
.   : milestone, 374,
iast (481.743 µs) : 461, 502
.   : milestone, 482,
iast_FULL (548.849 µs) : 528, 569
.   : milestone, 549,
iast_INACTIVE (453.084 µs) : 432, 474
.   : milestone, 453,
iast_TELEMETRY_OFF (477.596 µs) : 457, 498
.   : milestone, 478,
tracing (455.924 µs) : 436, 476
.   : milestone, 456,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 367.337 µs [347.039 µs, 387.634 µs] -
iast 474.847 µs [454.408 µs, 495.286 µs] 107.51 µs (29.3%)
iast_FULL 541.685 µs [521.214 µs, 562.156 µs] 174.348 µs (47.5%)
iast_INACTIVE 448.284 µs [427.455 µs, 469.113 µs] 80.947 µs (22.0%)
iast_TELEMETRY_OFF 475.66 µs [455.175 µs, 496.144 µs] 108.323 µs (29.5%)
tracing 452.556 µs [431.49 µs, 473.622 µs] 85.219 µs (23.2%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 374.425 µs [353.783 µs, 395.066 µs] -
iast 481.743 µs [461.332 µs, 502.153 µs] 107.318 µs (28.7%)
iast_FULL 548.849 µs [528.336 µs, 569.362 µs] 174.424 µs (46.6%)
iast_INACTIVE 453.084 µs [432.426 µs, 473.742 µs] 78.659 µs (21.0%)
iast_TELEMETRY_OFF 477.596 µs [457.018 µs, 498.173 µs] 103.171 µs (27.6%)
tracing 455.924 µs [435.696 µs, 476.151 µs] 81.499 µs (21.8%)

@@ -46,6 +47,8 @@ public class DDAgentApi extends RemoteApi {

private final Recording sendPayloadTimer;
private final Counter agentErrorCounter;
private int failureCount = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should determine a better way to limit the number of payloads we collect to ensure logfiles don't get too large. Maybe sample 25% of the failed payloads into the logs?

@@ -46,6 +47,8 @@ public class DDAgentApi extends RemoteApi {

private final Recording sendPayloadTimer;
private final Counter agentErrorCounter;
private int failureCount = 0;
private okio.Buffer failureBuffer = new okio.Buffer();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only create the buffer if the config for logging failed payloads is enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final? Maybe not, if you are making this lazy.

@@ -110,12 +114,22 @@ public Response sendSerializedTraces(final Payload payload) {
metricsEnabled && featuresDiscovery.supportsMetrics() ? "true" : "")
.put(payload.toRequest())
.build();
if (failureCount < 10) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. this may not be representative of all of the payloads they are sending so maybe we should reevaluate this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this behind the config.

this.totalTraces += payload.traceCount();
this.receivedTraces += payload.traceCount();
try (final Recording recording = sendPayloadTimer.start();
final okhttp3.Response response = httpClient.newCall(request).execute()) {
handleAgentChange(response.header(DATADOG_AGENT_STATE));
if (response.code() != 200) {
if (failureCount < 10) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this behind config

this.totalTraces += payload.traceCount();
this.receivedTraces += payload.traceCount();
try (final Recording recording = sendPayloadTimer.start();
final okhttp3.Response response = httpClient.newCall(request).execute()) {
handleAgentChange(response.header(DATADOG_AGENT_STATE));
if (response.code() != 200) {
if (failureCount < 10) {
log.debug(
"isTraceAgentV05Enabled:{} \nFailed Payload: {}",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will currently log the entire payload as hex bytes. Probably sufficient for now but not very user friendly(for support)

@@ -254,8 +254,9 @@ public void writeTo(WritableByteChannel channel) throws IOException {
while (header.hasRemaining()) {
channel.write(header);
}
while (body.hasRemaining()) {
channel.write(body);
ByteBuffer copyBody = body.duplicate();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to duplicate the buffer it should be behind the config to avoid the increased memory overhead when called.


StringBuffer hex = new StringBuffer();
for (byte b : bytes) {
hex.append(String.format("%02x", b));
Copy link
Contributor

@dougqh dougqh Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should specify the locale here. I learned the hard way that "x" format behaves differently in other locales.

public String okioBufferToHex(okio.Buffer buffer) {
byte[] bytes = buffer.readByteArray();

StringBuffer hex = new StringBuffer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringBuffer -> StringBuilder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants