User Request: Dashboard in Datadog #850

Closed
xpdable opened this issue Mar 25, 2024 · 4 comments

Comments

@xpdable

xpdable commented Mar 25, 2024

Does anyone have experience building a Datadog dashboard that combines the chaos.* metrics with application metrics,
so that SREs can easily monitor chaos injections and compare them against steady state?
Thanks in advance.

@ptnapoleon
Contributor

Hi. Yes, we can probably share the queries we're using for specific widgets. Is there anything in particular you'd like to visualize that you're having trouble with?

@xpdable
Author

xpdable commented Mar 26, 2024

> Hi. Yes, we can probably share the queries we're using for specific widgets. Is there anything in particular you'd like to visualize that you're having trouble with?

Hi Philip, my idea is quite simple for now, as a pilot showcase.

  1. One widget shows the application status, i.e. the distribution of HTTP response codes (2xx vs. 4xx and above), as a bar chart over time.

  2. Another widget shows metrics from the chaos controller over time, indicating when my DisruptionCron/Disruption resources are injected.
    I then put both widgets in one dashboard, which gives a clear view of steady state vs. turbulence.

Thanks,
Xiaopeng

@xpdable
Author

xpdable commented May 13, 2024

@ptnapoleon Do you have any ideas on this? Thanks

@ptnapoleon
Contributor

Hi, so sorry about the delay, I forgot to get back to you.

I can't help with the first point: it's outside the scope of the project, and I'm not an expert on the best practices. For the second point,
we have the chaos.controller.validation.created metric, which you can filter by namespace and target to see when disruptions are created.
chaos.controller.disruptions.gauge, with similar filtering, can show you an ongoing count of disruptions.
chaos.controller.pods.gauge will show you the live injector pods for any given disruption.

These will all work for disruptions created directly or via disruptionCron.
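
As a rough sketch of how the three metrics above could be turned into timeseries widgets, the Python below builds a dashboard definition in the JSON shape accepted by Datadog's dashboard import. The helper names (`metric_query`, `timeseries_widget`, `build_dashboard`), the `sum` aggregator, the widget titles, and the `namespace` tag key are assumptions for illustration; check the metrics doc for the exact tag names and adjust the queries to your setup.

```python
import json


def metric_query(metric, tags, aggregator="sum"):
    """Build a Datadog metric query string, e.g. 'sum:my.metric{env:prod}'."""
    # An empty tag filter falls back to '*', which matches every series.
    scope = ",".join(f"{key}:{value}" for key, value in sorted(tags.items())) or "*"
    return f"{aggregator}:{metric}{{{scope}}}"


def timeseries_widget(title, query, display_type="line"):
    """Return one timeseries widget definition for a dashboard."""
    return {
        "definition": {
            "type": "timeseries",
            "title": title,
            "requests": [{"q": query, "display_type": display_type}],
        }
    }


def build_dashboard(namespace):
    """Assemble a dashboard with one widget per chaos-controller metric."""
    # Assumption: the metrics carry a 'namespace' tag; verify the real tag key.
    tags = {"namespace": namespace}
    widgets = [
        timeseries_widget(
            "Disruptions created",
            metric_query("chaos.controller.validation.created", tags),
            display_type="bars",
        ),
        timeseries_widget(
            "Ongoing disruptions",
            metric_query("chaos.controller.disruptions.gauge", tags),
        ),
        timeseries_widget(
            "Live injector pods",
            metric_query("chaos.controller.pods.gauge", tags),
        ),
    ]
    return {
        "title": f"Chaos vs. steady state ({namespace})",
        "layout_type": "ordered",
        "widgets": widgets,
    }


if __name__ == "__main__":
    # Print the JSON so it can be reviewed before importing it into Datadog.
    print(json.dumps(build_dashboard("demo"), indent=2))
```

An application-side widget (for example, HTTP response codes) could be appended to the same `widgets` list with whatever metric your service already reports, so both views share one time axis.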

The full list of metrics you can use is here: https://github.com/DataDog/chaos-controller/blob/main/docs/metrics_events.md

For specific help with the Datadog dashboard product, you can check out the Datadog docs at https://docs.datadoghq.com/ , the public Slack at https://chat.datadoghq.com/ , or contact support.
