Question: Scale up only if two checks are true #784

Open
jaltavilla opened this issue Nov 29, 2023 · 2 comments
Labels: stage/waiting-reply, theme/policy, type/question

Comments

@jaltavilla

jaltavilla commented Nov 29, 2023

We have some services that consume a limited external resource, and there is a metric for how much of that resource remains. I would like to write a scaling policy that both scales on service CPU and prevents scaling up beyond the remaining capacity. However, my understanding is that the scale-up decision is a logical OR of all checks and uses the maximum value suggested.

I think the only way to accomplish this goal currently is to set the max count. This is fragile as a change to the external resource needs to also be reflected in the service's max count. It's also more complicated if multiple services all consume the same resource.

Is there some way to accomplish this that I'm missing?
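
A minimal Python sketch of the behaviour described above, assuming each check proposes a count and the largest proposal wins. The function and check names are hypothetical, and this models the stated understanding rather than the autoscaler's actual code.

```python
def combine_checks(proposals: dict[str, int]) -> int:
    """Pick the final count as the comment above assumes: the max over all checks."""
    return max(proposals.values())


# The CPU check wants 12 instances, but the external resource can only serve 8.
proposals = {"cpu": 12, "remaining_capacity": 8}
final_count = combine_checks(proposals)
print(final_count)  # 12: the capacity check cannot veto the scale-up
```

Under this model a second check can only raise the result, never lower it, which is why a capacity limit has to be enforced somewhere else (the max count or the query itself).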

jaltavilla changed the title from "Feature Request: Scale up using an and of checks" to "Question: Scale up only if two checks are true" on Dec 1, 2023
@lgfa29
Contributor

lgfa29 commented Dec 22, 2023

Hi @jaltavilla 👋

The max value sounds like what you need 🤔

Since you have a metric for your resource you may be able to adjust the query instead. For example, with Prometheus you could use the clamp_max function to limit the value of another query result:

query = "clamp_max(actual_query_you_want_to_scale, current_resource_metric_value)"

But I haven't tested this and I'm not sure if it works for your use case 😅
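
For a single sample, the effect of clamp_max can be sketched in Python as a plain minimum. The numbers below are made up for illustration. Note that in PromQL the second argument of clamp_max is a scalar, so a metric that comes back as an instant vector would probably need wrapping in scalar(); that detail is untested here.

```python
def clamp_max(value: float, upper: float) -> float:
    """Mimic PromQL's clamp_max for one sample: cap value at upper."""
    return min(value, upper)


remaining_capacity = 120.0  # hypothetical result of current_resource_metric_value

# A scaling metric above the remaining capacity is cut down to the capacity.
print(clamp_max(180.0, remaining_capacity))  # 120.0
# A scaling metric below the capacity passes through unchanged.
print(clamp_max(90.0, remaining_capacity))   # 90.0
```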

lgfa29 added the stage/waiting-reply, theme/policy, and type/question labels on Dec 22, 2023
lgfa29 self-assigned this on Dec 22, 2023
@jaltavilla
Author

jaltavilla commented Jan 30, 2024

Thanks, that's a pretty clever idea!

We are using the target strategy, and sadly I couldn't quite figure out how to make it work with that. I suspect it would be easier with the threshold strategy. My understanding of the target strategy is that current_resource_metric_value can't return an arbitrary limit. When the resource is exhausted, it needs to be in the range [0, target], so that the service can scale down only when actual_query_you_want_to_scale indicates it could. Similarly, when the resource isn't exhausted, the range has to be [0, some large number > target].

We already clamp_max our query to 200 to limit scaling up on spikes. So we need the upper bound of the clamp to be target when resources are exhausted and 200 otherwise. That reduces to needing an expression that returns 0 or 1:
query = clamp_max(actual_query_you_want_to_scale, target * one_if_exhausted + 200 * one_if_not_exhausted)

But I couldn't figure out how to write an expression that returns 0 or 1 with the Datadog functions available, so I ended up auditing our services and fixing their max counts.
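
For reference, the intended effect of that query can be sketched in Python. This assumes the target strategy scales the count by metric / target and rounds up, which is a simplified model and not checked against the plugin; TARGET and the sample values are hypothetical.

```python
import math

TARGET = 70.0      # hypothetical target for the target strategy
SPIKE_CAP = 200.0  # the existing clamp_max limit on spikes


def clamped_metric(metric: float, exhausted: bool) -> float:
    """Cap the metric at TARGET when exhausted, at SPIKE_CAP otherwise."""
    one_if_exhausted = 1.0 if exhausted else 0.0
    one_if_not_exhausted = 1.0 - one_if_exhausted
    upper = TARGET * one_if_exhausted + SPIKE_CAP * one_if_not_exhausted
    return min(metric, upper)


def desired_count(current: int, metric: float) -> int:
    """Assumed target-strategy model: scale the count by metric / TARGET."""
    return math.ceil(current * metric / TARGET)


# Resource available: a high metric is allowed to scale the service up.
print(desired_count(10, clamped_metric(140.0, exhausted=False)))  # 20
# Resource exhausted: the same metric is capped at TARGET, so the count holds.
print(desired_count(10, clamped_metric(140.0, exhausted=True)))   # 10
# Resource exhausted but load is low: scaling down still works.
print(desired_count(10, clamped_metric(35.0, exhausted=True)))    # 5
```

The missing piece remains the 0/1 indicator itself, which has to be expressible in the metrics backend's query language.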
