Increase visibility of scaling decision when in dry-run mode #624

Open
lgfa29 opened this issue Mar 10, 2023 · 2 comments
Labels
stage/accepted theme/policy-eval Policy broker, workers and evaluation type/enhancement

Comments

Contributor

lgfa29 commented Mar 10, 2023

Enabling dry-run is a good way to preview how a policy behaves, giving operators the chance to adjust it before going live.

But the Autoscaler currently logs most of its internal operations at the TRACE level, which hides some of the information you would expect to see in dry-run mode.

Nomad has the nomad monitor command, which lets operators subscribe to logs and raise or lower the log level dynamically. Doing the same here would require exposing logs via an API endpoint, which is currently unprotected for the Autoscaler.

Another option could be implementing a custom hclog.Logger that wraps a regular logger but upgrades messages when dry-run is enabled. This avoids having to sprinkle if clauses everywhere. The logger can be passed down to different components transparently.
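A minimal sketch of what such a wrapper could look like. To stay self-contained it does not import go-hclog: the Logger interface below is a stand-in for only the Trace, Debug and Info methods of hclog.Logger, and recordingLogger, dryRunLogger and NewDryRunLogger are hypothetical names, not existing Autoscaler code. A real wrapper would have to forward the rest of the hclog.Logger interface as well.

```go
package main

import "fmt"

// Logger is a stand-in for the leveled subset of hclog.Logger used here.
// The real interface has more methods that a full wrapper would forward.
type Logger interface {
	Trace(msg string, args ...interface{})
	Debug(msg string, args ...interface{})
	Info(msg string, args ...interface{})
}

type level int

const (
	levelTrace level = iota
	levelDebug
	levelInfo
)

// recordingLogger is a toy base logger (an assumption for this sketch).
// It keeps messages at or above min and drops the rest, the way a logger
// configured at that level would.
type recordingLogger struct {
	min   level
	lines []string
}

func (l *recordingLogger) log(lvl level, name, msg string, args []interface{}) {
	if lvl < l.min {
		return
	}
	line := "[" + name + "] " + msg
	// args are treated as alternating key/value pairs.
	for i := 0; i+1 < len(args); i += 2 {
		line += fmt.Sprintf(" %v=%v", args[i], args[i+1])
	}
	l.lines = append(l.lines, line)
}

func (l *recordingLogger) Trace(msg string, args ...interface{}) {
	l.log(levelTrace, "TRACE", msg, args)
}

func (l *recordingLogger) Debug(msg string, args ...interface{}) {
	l.log(levelDebug, "DEBUG", msg, args)
}

func (l *recordingLogger) Info(msg string, args ...interface{}) {
	l.log(levelInfo, "INFO", msg, args)
}

// dryRunLogger embeds the wrapped Logger, so Info is forwarded unchanged,
// and overrides Trace and Debug to upgrade them to Info in dry-run mode.
type dryRunLogger struct {
	Logger
	dryRun bool
}

// NewDryRunLogger returns the wrapper as a plain Logger, so components
// receiving it do not need to know whether dry-run is enabled.
func NewDryRunLogger(base Logger, dryRun bool) Logger {
	return &dryRunLogger{Logger: base, dryRun: dryRun}
}

func (l *dryRunLogger) Trace(msg string, args ...interface{}) {
	if l.dryRun {
		l.Logger.Info(msg, args...)
		return
	}
	l.Logger.Trace(msg, args...)
}

func (l *dryRunLogger) Debug(msg string, args ...interface{}) {
	if l.dryRun {
		l.Logger.Info(msg, args...)
		return
	}
	l.Logger.Debug(msg, args...)
}

func main() {
	base := &recordingLogger{min: levelInfo}
	log := NewDryRunLogger(base, true)
	// Logged at TRACE by the caller, but recorded at INFO because of dry-run.
	log.Trace("calculated scaling action", "group", "web", "count", 5)
	for _, line := range base.lines {
		fmt.Println(line)
	}
}
```

The call sites keep calling Trace as they do today; only the place that constructs the logger needs to know about dry-run.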

A third option could be integrating OpenTelemetry so the Autoscaler actions are exposed as traces and spans instead of text-based log messages.


erulabs commented Mar 14, 2023

+1 for OpenTelemetry - that would be amazing! A metric per group with the new resultant count would be great!

If this issue sits long enough for me to find some free time, I may try to implement this myself! ❤️

Contributor Author

lgfa29 commented Mar 17, 2023

Nice! I have been fiddling with OpenTelemetry for a while but have not been able to put much into practice yet. Starting with the Autoscaler could be a useful learning experiment 🙂

These are some of my previous explorations:
https://github.com/hashicorp/nomad/compare/luiz-wip-otel
hashicorp/nomad@main...task-otel-resource-attrs-env-var
https://github.com/hashicorp/nomad/compare/wip-otel

This is probably the main function where we would want to add more telemetry:
https://github.com/hashicorp/nomad-autoscaler/blob/main/policyeval/base_worker.go#L93-L94
