Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add queue percentage to libbeat metrics #39205

Merged
merged 14 commits into from Apr 30, 2024

Conversation

fearful-symmetry
Copy link
Contributor

@fearful-symmetry fearful-symmetry commented Apr 24, 2024

Proposed commit message

Part of #38708

This adds a queue.full metric that reports the percentage of queue usage in libbeat.

I'm not sure if this is the correct way to measure queue usage in percentage, but this was such a small change I figured it would be faster to just put in a PR and ask, rather than ask and wait.

Testing

You can test this by starting metricbeat with --httpprof localhost:9898 and then checking the metrics with curl localhost:9898/debug/vars | grep libbeat

The metric will also appear in the last 30s metrics.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@fearful-symmetry fearful-symmetry added the Team:Elastic-Agent Label for the Agent team label Apr 24, 2024
@fearful-symmetry fearful-symmetry self-assigned this Apr 24, 2024
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner April 24, 2024 22:31
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Apr 24, 2024
Copy link
Contributor

mergify bot commented Apr 24, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @fearful-symmetry? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

elasticmachine commented Apr 24, 2024

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 113 min 21 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@fearful-symmetry fearful-symmetry marked this pull request as draft April 25, 2024 00:27
Copy link
Contributor

@belimawr belimawr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Now that we can know the queue size in bytes, I'm just wondering if we should make it more clear that the percentage is calculated from the number of events.

@fearful-symmetry fearful-symmetry marked this pull request as ready for review April 25, 2024 16:02
@fearful-symmetry
Copy link
Contributor Author

@cmacknz if I understand #38708, all the rest of the the implementation of this is in integrations/dashboards. Is someone on the integrations side doing that, or are we doing the dashboards too?

@cmacknz
Copy link
Member

cmacknz commented Apr 26, 2024

I would update the integration as part of this work. If you can't for some reason, create a separate issue to do it so it isn't lost.

@cmacknz
Copy link
Member

cmacknz commented Apr 26, 2024

Now that we can know the queue size in bytes, I'm just wondering if we should make it more clear that the percentage is calculated from the number of events.

The queue size limit is still configured in units of events and the metric here should match the units of the maximum size limit.

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Apr 28, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@fearful-symmetry
Copy link
Contributor Author

fearful-symmetry commented Apr 29, 2024

Huh, it's possible for activeEvents > max_events:

{"clients":2,"events":{"active":3201,"published":99200,"total":99200},"queue":{"acked":99200,"filled":{"pct":{"events":1.0003125}},"max_events":3200}}

This happens occasionally, in the metrics, and it's always max_queue+1. I assume it's just a result of the queue itself not being completely synced up with the metrics reporter. That, or an off-by-one. @faec you seen this before?

@faec
Copy link
Contributor

faec commented Apr 29, 2024

This happens occasionally, in the metrics, and it's always max_queue+1. I assume it's just a result of the queue itself not being completely synced up with the metrics reporter. That, or an off-by-one. @faec you seen this before?

Yeah, I've seen this, it's just an artifact of the metrics receiver getting notified of acks slightly after the queue unblocks so it sees the new event in flight before decrementing the old ones. I've never seen the pipeline report something that's actually wrong, just slightly behind on acks, though I did see recently that the output subtree sometimes has a number that's just flat-out wrong #39146

@fearful-symmetry
Copy link
Contributor Author

Alright, added rounding. I decided to keep the percent calculations "raw" (0.0-1.0, not 0.0-100.0), since that's what we do for the system metrics percentages, and I figure we might as well keep it consistent?

@fearful-symmetry fearful-symmetry merged commit 562e48e into elastic:main Apr 30, 2024
183 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants