
Allow scale to zero to work when min-scale is greater than 0 #15154

Open
daraghlowe opened this issue Apr 22, 2024 · 2 comments
Labels
kind/feature Well-understood/specified features, ready for coding.

Comments

@daraghlowe

What feature do you want?

We want to be able to set min-scale to 2 while a service is active, so that our active Knative services are highly available, while still allowing scale to zero so we're not wasting compute resources when our Knative services are not receiving requests.

Describe the feature

We're currently running our Knative services without min-scale set, allowing them to scale down to zero when they're not actively receiving requests. This saves us a lot of compute and is a feature of Knative that we want to continue using.

In addition to scale to zero, we also want our Knative services to be highly available while they are active and receiving requests. Specifically, during node pool upgrades, any service running a single replica experiences downtime while its pod is evicted and migrated to a new node.

The rationale behind wanting both scale to zero and high availability while active is that the apps running in these services are controlled by our customers, so we can't easily know which services are pre-production and unimportant and which are critical and must be highly available.

The solution that we would like to implement is:

  • Add pod disruption budgets for our Knative services with minAvailable: 1
  • Set min-scale: 2 so that our Knative services have a minimum of 2 replicas when they're running
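Concretely, the intended setup would look roughly like the following sketch (resource names are illustrative; the `serving.knative.dev/service` label is the one Knative puts on a service's pods):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-ksvc-pdb            # illustrative name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      serving.knative.dev/service: my-ksvc
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-ksvc                # illustrative name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "2"
```

As described next, the min-scale annotation alone does not give the desired behaviour.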

Unfortunately, when we set min-scale: 2, all of our Knative services scale up to a minimum of 2 pods, including those that had been scaled down to zero.

We did some testing with activation-scale, but it doesn't solve the problem: the service can scale down to 1 replica while it's active if it doesn't get enough request concurrency after initially activating and scaling up to 2. The description of the merged PR (#13136) seems to indicate that it should work the way we want, but it doesn't.

Would it be possible to add another annotation, such as min-scale-while-active, that fulfils the description of #13136 rather than the current activation-scale behaviour?
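For illustration, the proposed annotation (hypothetical; it does not exist in Knative today) might sit in the revision template alongside the existing autoscaling annotations:

```yaml
spec:
  template:
    metadata:
      annotations:
        # Hypothetical annotation proposed in this issue, not implemented in Knative:
        # enforce a floor of 2 replicas only while the revision is active,
        # while still permitting scale to zero when idle.
        autoscaling.knative.dev/min-scale-while-active: "2"
```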

As an alternative, we're currently thinking of building a controller that will temporarily increase the min-scale of our active services to 2 while an upgrade is occurring. Curious if there's another solution or workaround you could recommend instead of this approach?

@daraghlowe daraghlowe added the kind/feature Well-understood/specified features, ready for coding. label Apr 22, 2024
@skonto
Contributor

skonto commented May 14, 2024

Hi @daraghlowe, I will take a look at what you report about activation-scale and will get back to you.

@skonto
Contributor

skonto commented May 16, 2024

Hi @daraghlowe

We did some testing with using activation-scale but it doesn't solve the problem as the service can scale down to 1 replica when its active if it doesn't get enough request concurrency after initially activating and scaling up to 2. The description of the PR that was merged seems to indicate that it should work like we want it however, but it doesn't.

According to the docs the behavior is:

This value controls the minimum number of replicas that will be created when the Revision scales up from zero. After the Revision has reached this scale one time, this value is ignored. This means that the Revision will scale down after the activation scale is reached if the actual traffic received needs a smaller scale.

Also in the PR: "This annotation will not impact initial-scale values, as it will only apply on subsequent scales from zero."
Now in the code we have:

if a.deciderSpec.ActivationScale > 1 {
	logger.Debug("Considering Activation Scale")
	if dspc > 0 && a.deciderSpec.ActivationScale > desiredStablePodCount {
	...

If dspc is > 0 due to traffic coming in, and the revision is active, I am wondering why you don't see two pods.
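The condition above can be sketched as follows (simplified; the function name and shape are mine, not the actual Knative source):

```go
package main

import "fmt"

// Sketch of the quoted decider logic: whenever the computed desired
// stable pod count (dspc) is positive, an activation scale greater
// than it acts as a floor on the scale.
func applyActivationScale(activationScale, dspc int32) int32 {
	if activationScale > 1 && dspc > 0 && activationScale > dspc {
		return activationScale
	}
	return dspc
}

func main() {
	fmt.Println(applyActivationScale(2, 1)) // floor applies: 2
	fmt.Println(applyActivationScale(2, 0)) // no traffic: stays 0
	fmt.Println(applyActivationScale(2, 3)) // already above the floor: 3
}
```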
I tried it and didn't see fewer pods, using:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Target 10 in-flight-requests per pod.
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/activation-scale: "2"
        autoscaling.knative.dev/target-burst-capacity: "10"
    spec:
      containers:
      - image: ghcr.io/knative/autoscale-go:latest

Could you enable debug logging for the autoscaler and paste the output, and also provide more details such as the ksvc you used?
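For reference, autoscaler log verbosity is controlled by the config-logging ConfigMap in the knative-serving namespace; a fragment along these lines should work (verify the exact keys against the Knative observability docs for your version):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-logging
  namespace: knative-serving
data:
  loglevel.autoscaler: "debug"
```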

cc @psschwei @dprotaso if they have more ideas.
