Increase visibility of scaling decision when in dry-run mode #624

lgfa29 · 2023-03-10T20:36:36Z

Enabling dry-run is a good way to preview how a policy behaves, giving operators the chance to adjust it before going live.

But the Autoscaler currently logs most of its internal operations at the TRACE level, which hides some of the information you would expect from dry-run mode.

Nomad has the nomad monitor command, which lets operators subscribe to an agent's logs and raise or lower the log level dynamically. Doing the same here would require exposing logs via an API endpoint, and the Autoscaler's API is currently unprotected.

Another option could be implementing a custom hclog.Logger that wraps a regular logger but upgrades messages when dry-run is enabled. This avoids having to sprinkle if clauses everywhere. The logger can be passed down to different components transparently.
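A minimal sketch of the wrapping idea, using the standard library's log/slog as a stand-in for hclog so the example has no external dependencies. The dryRunHandler type, the dry_run attribute, and the render helper are hypothetical names for illustration and are not part of the Autoscaler.

```go
package main

import (
	"bytes"
	"context"
	"log/slog"
)

// dryRunHandler wraps another slog.Handler. When dryRun is true, records
// below Info are promoted to Info so they show up at the default log level.
// (Hypothetical type; stand-in for a wrapper around hclog.Logger.)
type dryRunHandler struct {
	inner  slog.Handler
	dryRun bool
}

// Enabled reports a low-level record as enabled if it would be after promotion.
func (h dryRunHandler) Enabled(ctx context.Context, level slog.Level) bool {
	if h.dryRun && level < slog.LevelInfo {
		level = slog.LevelInfo
	}
	return h.inner.Enabled(ctx, level)
}

// Handle promotes low-level records and tags them before delegating.
func (h dryRunHandler) Handle(ctx context.Context, r slog.Record) error {
	if h.dryRun && r.Level < slog.LevelInfo {
		r = r.Clone()
		r.Level = slog.LevelInfo
		r.AddAttrs(slog.Bool("dry_run", true))
	}
	return h.inner.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the derived handler so that loggers
// passed down to other components keep the dry-run behavior.
func (h dryRunHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return dryRunHandler{inner: h.inner.WithAttrs(attrs), dryRun: h.dryRun}
}

func (h dryRunHandler) WithGroup(name string) slog.Handler {
	return dryRunHandler{inner: h.inner.WithGroup(name), dryRun: h.dryRun}
}

// render logs one debug-level message through the wrapper, with the
// underlying handler set to Info, and returns whatever was written.
func render(dryRun bool) string {
	var buf bytes.Buffer
	base := slog.NewTextHandler(&buf, &slog.HandlerOptions{Level: slog.LevelInfo})
	logger := slog.New(dryRunHandler{inner: base, dryRun: dryRun})
	logger.Debug("scaling decision", "group", "web", "count", 5)
	return buf.String()
}
```

With dryRun set to false the debug message is filtered out as usual; with dryRun set to true the same call site produces an Info-level line, without any if clause at the call site. An hclog version would presumably follow the same shape, but it would also need to re-wrap the loggers returned by methods that derive new loggers, which is what WithAttrs and WithGroup do here.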

A third option could be integrating OpenTelemetry so the Autoscaler actions are exposed as traces and spans instead of text-based log messages.


erulabs · 2023-03-14T16:15:24Z

+1 for OpenTelemetry - that would be amazing! A metric per group with the new resultant count would be great!

If this issue sits long enough for me to find some free time, I may try to implement this myself! ❤️

lgfa29 · 2023-03-17T23:34:43Z

Nice! I have been fiddling with OpenTelemetry for a while but have not been able to put much into practice yet. Starting with the Autoscaler could be a useful learning experiment 🙂

These are some of my previous explorations:
https://github.com/hashicorp/nomad/compare/luiz-wip-otel
hashicorp/nomad@main...task-otel-resource-attrs-env-var
https://github.com/hashicorp/nomad/compare/wip-otel

This is probably the main function where we want to add more telemetry:
https://github.com/hashicorp/nomad-autoscaler/blob/main/policyeval/base_worker.go#L93-L94

lgfa29 added the stage/accepted, type/enhancement, and theme/policy-eval (Policy broker, workers and evaluation) labels on Mar 10, 2023

lgfa29 mentioned this issue Mar 10, 2023

log something useful when dry-running cluster scaling policies #621

Closed
