How can one add a weekly maintenance window into the calculations for SLO's with sloth? #529

golodhrim · 2023-12-13T18:00:05Z

I just found out about the Sloth project today, and after reading a lot of the docs I think it is exactly what I am looking for. The one question I still have is how to add a weekly recurring maintenance window to the SLO calculations, so that an outage during this time window is not counted against the SLO at all.
Greetings

tokheim · 2024-01-24T00:40:02Z

First of all, you need Prometheus to record maintenance windows. Either some system reports them as a metric or, if the window is at a fixed time, you could build a recording rule with the day_of_week and hour functions.
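For the fixed-time case, such a recording rule could look roughly like the sketch below. The window (Tuesdays, 02:00 to 04:00) and the group name are made up for illustration; maintenance_period is the rule name used further down. Note that day_of_week() and hour() work in UTC, and that day_of_week() counts from 0 = Sunday.

```yaml
groups:
  - name: maintenance            # hypothetical group name
    rules:
      - record: maintenance_period
        # 1 while inside the (assumed) Tuesday 02:00-04:00 UTC window, 0 otherwise.
        expr: |
          (
            vector(1)
            and on() (day_of_week() == 2)
            and on() (hour() >= 2)
            and on() (hour() < 4)
          )
          or on() vector(0)
```

The result is a single series without labels, which is what scalar() needs if the rule is later used to mask other queries.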

Then I'd probably cut my losses and just define an inhibit rule to avoid sending alerts during the maintenance period (maybe with a buffer around it). The SLO calculations and dashboards would still take errors during the maintenance period into account. I would see that as an advantage, though, since I'd encourage limiting the impact of maintenance periods anyway.
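A minimal sketch of that inhibit setup, assuming a maintenance_period series that is 1 during maintenance: Alertmanager inhibits on alerts, not on metrics, so a marker alert is needed first. The alert name MaintenanceWindow and its severity label are hypothetical, and the sloth_severity matcher assumes your Sloth-generated alerts carry that label with page/ticket values, so check the labels on your own alerts.

```yaml
# Prometheus rule file: a marker alert that fires while maintenance is active.
groups:
  - name: maintenance-alerts     # hypothetical group name
    rules:
      - alert: MaintenanceWindow # hypothetical alert name
        expr: maintenance_period == 1
        labels:
          severity: none         # assumption: route this to a receiver that notifies nobody
```

```yaml
# Alertmanager config fragment: mute SLO alerts while the marker alert fires.
inhibit_rules:
  - source_matchers:
      - alertname = "MaintenanceWindow"
    target_matchers:
      - sloth_severity =~ "page|ticket"   # assumed label on Sloth alerts
```

To get a buffer around the period, the marker alert could use a slightly wider time window than the one used for any SLO masking.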

Still, if you really need the calculations to exclude maintenance periods, the approach would likely depend on your query. Assuming a maintenance_period recording rule that reports 1 during maintenance and 0 otherwise, queries like these might do the trick:

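  # Errors: 30s rates are zeroed during maintenance, then summed over the SLO window.
  error_query: |
    sum_over_time((
        sum(rate(<error_counter>[30s]))
        * scalar(1-maintenance_period)
    )[{{.window}}:])
  # Total: same masking; falls back to 1 when the masked sum is 0 or missing.
  total_query: |
    (sum_over_time((
        sum(rate(<total_counter>[30s]))
        * scalar(1-maintenance_period)
    )[{{.window}}:]) > 0) or vector(1)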

Basically, if you take the rate over the full window, you can't tell which errors happened during the maintenance period. Masking short 30s rates and adding them up with sum_over_time should still give a good approximation of the error ratio over the entire window. The > 0 or vector(1) part is important to include, as the error ratio would otherwise have a 0 denominator for any window that lies entirely inside a maintenance period.
