
Allow scale to zero to work when min-scale is greater than 0 #15154

Open
daraghlowe opened this issue Apr 22, 2024 · 2 comments
Labels
kind/feature Well-understood/specified features, ready for coding.

Comments

@daraghlowe

What feature do you want?

We want to be able to set min-scale to 2 while a service is active, so that our active Knative services are highly available, while still allowing scale to zero so we're not wasting compute resources when our Knative services are not receiving requests.

Describe the feature

We're currently running our Knative services without min-scale set, allowing them to scale down to zero when they're not actively receiving requests. This saves us a lot of compute and is a feature of Knative that we want to continue using.

In addition to scale to zero, we also want our Knative services to be highly available while they are active and receiving requests. Specifically, during node pool upgrades, any service running a single replica experiences downtime while its pod is evicted and migrated to a new node.

The rationale behind wanting both scale to zero and high availability while active is that the apps running in these services are controlled by our customers, so we can't easily know which services are pre-production and unimportant and which are critical and must be highly available.

The solution that we would like to implement is:

  • Add pod disruption budgets for our Knative services with minAvailable: 1
  • Set min-scale: 2 so that our Knative services have a minimum of 2 replicas when they're running
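Concretely, the intended setup would look roughly like the following sketch (resource names are illustrative; the `serving.knative.dev/service` label is the one Knative puts on a service's pods):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-ksvc-pdb            # illustrative name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      serving.knative.dev/service: my-ksvc
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-ksvc                # illustrative name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "2"
```

As described next, the min-scale annotation alone does not give the desired behaviour.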

Unfortunately, when we set min-scale: 2, all of our Knative services scale up to a minimum of 2 pods, including those that had been scaled down to zero.

We did some testing with activation-scale, but it doesn't solve the problem: the service can scale down to 1 replica while it's active if it doesn't get enough request concurrency after initially activating and scaling up to 2. The description of the merged PR (#13136) seems to indicate that it should work the way we want, but it doesn't.

Would it be possible to add another annotation, such as min-scale-while-active, that fulfils the description of #13136 rather than the current activation-scale behaviour?
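For illustration, the proposed annotation (hypothetical; it does not exist in Knative today) might sit in the revision template alongside the existing autoscaling annotations:

```yaml
spec:
  template:
    metadata:
      annotations:
        # Hypothetical annotation proposed in this issue, not implemented in Knative:
        # enforce a floor of 2 replicas only while the revision is active,
        # while still permitting scale to zero when idle.
        autoscaling.knative.dev/min-scale-while-active: "2"
```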

As an alternative, we're currently thinking of building a controller that will temporarily increase the min-scale of our active services to 2 while an upgrade is occurring. Curious if there's another solution or workaround you could recommend instead of this approach?

@daraghlowe daraghlowe added the kind/feature Well-understood/specified features, ready for coding. label Apr 22, 2024
@skonto
Contributor

skonto commented May 14, 2024

Hi @daraghlowe, I will take a look at what you report about activation-scale and will get back to you.

@skonto
Contributor

skonto commented May 16, 2024

Hi @daraghlowe

We did some testing with using activation-scale but it doesn't solve the problem as the service can scale down to 1 replica when its active if it doesn't get enough request concurrency after initially activating and scaling up to 2. The description of the PR that was merged seems to indicate that it should work like we want it however, but it doesn't.

According to the docs the behavior is:

This value controls the minimum number of replicas that will be created when the Revision scales up from zero. After the Revision has reached this scale one time, this value is ignored. This means that the Revision will scale down after the activation scale is reached if the actual traffic received needs a smaller scale.

Also in the PR: "This annotation will not impact initial-scale values, as it will only apply on subsequent scales from zero."
Now in the code we have:

if a.deciderSpec.ActivationScale > 1 {
	logger.Debug("Considering Activation Scale")
	if dspc > 0 && a.deciderSpec.ActivationScale > desiredStablePodCount {
	...

If dspc is > 0 due to traffic coming in, and the revision is active, I am wondering why you don't see two pods.
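The condition above can be sketched as follows (simplified; the function name and shape are mine, not the actual Knative source):

```go
package main

import "fmt"

// Sketch of the quoted decider logic: whenever the computed desired
// stable pod count (dspc) is positive, an activation scale greater
// than it acts as a floor on the scale.
func applyActivationScale(activationScale, dspc int32) int32 {
	if activationScale > 1 && dspc > 0 && activationScale > dspc {
		return activationScale
	}
	return dspc
}

func main() {
	fmt.Println(applyActivationScale(2, 1)) // floor applies: 2
	fmt.Println(applyActivationScale(2, 0)) // no traffic: stays 0
	fmt.Println(applyActivationScale(2, 3)) // already above the floor: 3
}
```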
I tried it and didn't see fewer pods, using:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Target 10 in-flight-requests per pod.
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/activation-scale: "2"
        autoscaling.knative.dev/target-burst-capacity: "10"
    spec:
      containers:
      - image: ghcr.io/knative/autoscale-go:latest

Could you enable debug logging for the autoscaler and paste the output, and also provide more details such as the ksvc you used?
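For reference, autoscaler log verbosity is controlled by the config-logging ConfigMap in the knative-serving namespace; a fragment along these lines should work (verify the exact keys against the Knative observability docs for your version):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-logging
  namespace: knative-serving
data:
  loglevel.autoscaler: "debug"
```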

cc @psschwei @dprotaso if they have more ideas.
