What feature do you want?
We want to be able to set min-scale to 2 only while a service is active, so that our active Knative services are highly available, while still allowing scale to zero so that we don't waste compute resources when our Knative services are not receiving requests.
Describe the feature
We currently run our Knative services without min-scale set, and we allow them to scale down to zero when they're not actively receiving requests. This saves us a lot of compute, and it is a feature of Knative that we want to continue utilising.
In addition to scale to zero, we also want our Knative services to be highly available while they are active and receiving requests. Specifically, when our node pool upgrades happen, any service running a single replica experiences downtime while its pod is evicted and rescheduled onto the new nodes.
The rationale for wanting both scale to zero and high availability while active is that the type of app running in each service is controlled by our customers, so we can't easily tell which services are pre-production and unimportant and which are critical and must be highly available.
The solution that we would like to implement is:
- Add pod disruption budgets (PDBs) for our Knative services with minAvailable: 1.
- Set min-scale: 2 so that our Knative services have a minimum of 2 replicas when they're running.
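To see why the two steps are meant to go together, here is a minimal sketch of the PodDisruptionBudget arithmetic. It assumes a budget with an integer minAvailable and ignores percentage values and unhealthy pods; `allowedDisruptions` is a hypothetical helper, not a Kubernetes API.

```go
package main

import "fmt"

// allowedDisruptions approximates how Kubernetes derives a PodDisruptionBudget's
// status.disruptionsAllowed for an integer minAvailable: healthy pods minus
// the required minimum, never below zero. (Hypothetical helper.)
func allowedDisruptions(healthyPods, minAvailable int) int {
	if d := healthyPods - minAvailable; d > 0 {
		return d
	}
	return 0
}

func main() {
	const minAvailable = 1 // from the proposed PodDisruptionBudget
	for _, replicas := range []int{0, 1, 2} {
		fmt.Printf("replicas=%d disruptionsAllowed=%d\n",
			replicas, allowedDisruptions(replicas, minAvailable))
	}
}
```

With one replica the budget permits no voluntary evictions, so a node drain waits on that pod instead of evicting it. With two, one pod can be evicted at a time while the other keeps serving. A revision scaled to zero has nothing to evict.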
Unfortunately, when we set min-scale: 2, all of our Knative services scale up to a minimum of 2 pods, including those that had been scaled down to zero.
We did some testing with activation-scale, but it doesn't solve the problem: after initially activating and scaling up to 2, the service can scale back down to 1 replica while it's still active if it doesn't get enough request concurrency. The description of the merged PR seems to indicate that it should work the way we want, but it doesn't. #13136
Would it be possible to add another annotation, such as min-scale-while-active, that fulfils the description of #13136, rather than relying on activation-scale?
As an alternative, we're currently thinking of building a controller that will temporarily increase the min-scale of our active services to 2 while an upgrade is occurring. Is there another solution or workaround you would recommend instead of this approach?
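A minimal sketch of the difference between the existing min-scale and the requested behaviour. Here `trafficDesired` is the replica count the autoscaler would choose from traffic alone, a simplification that ignores scale-down delay, max-scale and the panic window. `withMinScale` models the existing annotation; `withMinScaleWhileActive` models the proposed min-scale-while-active, which is hypothetical and does not exist in Knative.

```go
package main

import "fmt"

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// withMinScale models the existing min-scale annotation (simplified): the
// floor applies whether or not there is traffic, so an idle revision is
// held at minScale instead of scaling to zero.
func withMinScale(trafficDesired, minScale int) int {
	return maxInt(trafficDesired, minScale)
}

// withMinScaleWhileActive models the hypothetical min-scale-while-active
// annotation: the floor only applies while the revision has traffic, so an
// idle revision can still scale to zero.
func withMinScaleWhileActive(trafficDesired, minWhileActive int) int {
	if trafficDesired == 0 {
		return 0
	}
	return maxInt(trafficDesired, minWhileActive)
}

func main() {
	for _, desired := range []int{0, 1, 3} {
		fmt.Printf("traffic-desired=%d min-scale=%d min-scale-while-active=%d\n",
			desired, withMinScale(desired, 2), withMinScaleWhileActive(desired, 2))
	}
}
```

The two only differ at zero traffic: min-scale keeps 2 pods running, while the proposed annotation lets the revision go to 0 and still holds it at 2 once it is active.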
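The decision logic of such a controller could be sketched as below. `ServiceState` and `planUpgradeBump` are hypothetical names, and fetching services and applying the patch are left out. One caveat worth checking: min-scale is set as an annotation on the revision template, and in Knative any change to the template creates a new Revision, so bumping it this way rolls out new revisions.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// minScaleKey is the Knative revision annotation the controller would set.
const minScaleKey = "autoscaling.knative.dev/min-scale"

// ServiceState is a hypothetical, simplified view of one Knative Service:
// its revision-template annotations and how many pods are currently ready.
type ServiceState struct {
	Name          string
	Annotations   map[string]string
	ReadyReplicas int
}

// planUpgradeBump decides which services get a temporary min-scale before a
// node pool upgrade. Only active services (at least one ready pod) without an
// explicit min-scale are bumped, so services scaled to zero stay at zero.
func planUpgradeBump(services []ServiceState, upgradeMin int) map[string]string {
	plan := make(map[string]string)
	for _, s := range services {
		if s.ReadyReplicas == 0 {
			continue // scaled to zero: nothing to protect
		}
		if _, pinned := s.Annotations[minScaleKey]; pinned {
			continue // already pinned by the owner; leave it untouched
		}
		plan[s.Name] = strconv.Itoa(upgradeMin)
	}
	return plan
}

func main() {
	services := []ServiceState{
		{Name: "idle", ReadyReplicas: 0},
		{Name: "busy", ReadyReplicas: 1},
		{Name: "pinned", ReadyReplicas: 3, Annotations: map[string]string{minScaleKey: "3"}},
	}
	plan := planUpgradeBump(services, 2)

	// Print in a stable order; Go map iteration order is randomized.
	names := make([]string, 0, len(plan))
	for name := range plan {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: set %s=%q\n", name, minScaleKey, plan[name])
	}
}
```

Services that already set min-scale are skipped here; a real controller would also have to decide whether to raise an explicit value of 1 and remove the annotations it added once the upgrade finishes.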
"We did some testing with activation-scale, but it doesn't solve the problem, as the service can scale down to 1 replica when it's active if it doesn't get enough request concurrency after initially activating and scaling up to 2. The description of the PR that was merged seems to indicate that it should work the way we want, but it doesn't."
The docs for activation-scale say: "This value controls the minimum number of replicas that will be created when the Revision scales up from zero. After the Revision has reached this scale one time, this value is ignored. This means that the Revision will scale down after the activation scale is reached if the actual traffic received needs a smaller scale."
Also in the PR: "This annotation will not impact initial-scale values, as it will only apply on subsequent scales from zero."
Now in the code we have:
```go
if a.deciderSpec.ActivationScale > 1 {
	logger.Debug("Considering Activation Scale")
	if dspc > 0 && a.deciderSpec.ActivationScale > desiredStablePodCount {
		...
```
If dspc is > 0 because traffic is coming in and the revision is active, I'm wondering why you don't see two pods.
I tried it and don't see fewer pods, using:
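The reading above can be made concrete with a sketch that models only the quoted condition. The body of the branch is elided in the excerpt, so raising the count to the activation scale is an assumption (it is what expecting two pods implies). `applyActivationScale` is a hypothetical name, and `dspc` and `desiredStablePodCount` are simplified to plain ints.

```go
package main

import "fmt"

// applyActivationScale models only the quoted condition. Assumption: the
// elided body raises the desired stable pod count to activationScale.
// dspc stands for the traffic-based estimate and desiredStablePodCount for
// the count the autoscaler is about to use.
func applyActivationScale(dspc, desiredStablePodCount, activationScale int) int {
	if activationScale > 1 {
		if dspc > 0 && activationScale > desiredStablePodCount {
			return activationScale // assumed effect of the elided branch
		}
	}
	return desiredStablePodCount
}

func main() {
	// Light traffic: the estimate is 1 pod, so the branch lifts it to 2.
	fmt.Println(applyActivationScale(1, 1, 2))
	// No traffic: dspc is 0, so the branch does not fire and the count stays 0.
	fmt.Println(applyActivationScale(0, 0, 2))
}
```

If the real branch behaves like this, any decision made while dspc > 0 would be held at 2, which is why a drop to 1 pod while active needs the autoscaler's debug output to explain.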
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Target 10 in-flight requests per pod.
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/activation-scale: "2"
        autoscaling.knative.dev/target-burst-capacity: "10"
    spec:
      containers:
        - image: ghcr.io/knative/autoscale-go:latest
```
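For a sense of when this ksvc would compute only one pod, here is a simplified sketch of the Knative Pod Autoscaler's concurrency rule. It assumes the default target utilization of 70% (the ksvc above sets `target` but not a utilization) and ignores the panic window, target-burst-capacity and scale rate limits; `desiredPods` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// desiredPods is a simplified form of the KPA concurrency rule: observed
// average concurrency divided by the effective per-pod target
// (target * utilization%), rounded up.
func desiredPods(observedConcurrency, target, utilizationPct float64) int {
	if observedConcurrency <= 0 {
		return 0
	}
	effectiveTarget := target * utilizationPct / 100
	return int(math.Ceil(observedConcurrency / effectiveTarget))
}

func main() {
	const target = 10         // autoscaling.knative.dev/target from the ksvc above
	const utilizationPct = 70 // assumed default target utilization
	for _, c := range []float64{0, 3, 8} {
		fmt.Printf("concurrency=%.0f desired pods=%d\n", c, desiredPods(c, target, utilizationPct))
	}
}
```

With an effective target of 7, a concurrency of 3 yields 1 pod. That is the low-traffic case the issue describes, and the value activation-scale would or would not lift to 2.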
Could you enable debug logging for the autoscaler, paste the output, and provide more details, such as the ksvc you used?