
dynamic locality loadbalancer behaviour with hpa #50727

Open · ramaraochavali opened this issue Apr 29, 2024 · 15 comments

@ramaraochavali (Contributor) commented Apr 29, 2024

Describe the feature request
When the locality load balancer is used with HPA and a particular zone is flooded with requests, HPA kicks in and brings up a new pod. Since HPA is not aware of zone affinity, this new pod can land in any zone. The problem is that the requests in the hot zone keep being handled by the same set of pods that existed before HPA added the new pod. If that zone continues to receive the same high volume of requests for some period of time, this turns into a loop: new pods get scheduled in other zones (where they sit almost idle) while the hot zone continues to suffer latency issues and request failures. Traffic does not go to other zones because the hot zone's pods, however few they are, are still healthy.

Ideally, when this situation happens, it would be better to go cross-zone and serve those requests, similar to how we spill over to other zones when the current zone's pods are unhealthy.
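
For context, a minimal sketch of how the existing locality failover behaviour referenced above is typically enabled today (the service and host names are made up for illustration); locality failover relies on outlierDetection being configured so that unhealthy endpoints are ejected and traffic spills over to the next locality:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true            # keep traffic in the local zone while its endpoints are healthy
    outlierDetection:            # needed so unhealthy endpoints are ejected and traffic fails over
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```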

The proposal is to introduce two new flags in the locality load balancer settings that automatically switch off the locality load balancer when there is a skew in the pod distribution across zones (a rough sketch follows the list below):

disable_if_skewed - Defaults to false. If true, disables the locality load balancer when it detects a skew in pod distribution across zones.
skew_factor - Defaults to zero. If specified, it determines how/when disable_if_skewed is triggered. Skew is the difference in the number of pods between the most populated and least populated Availability Zone. For example, if skew_factor is 2, the difference between the most populated AZ and the least populated AZ must reach two pods.
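
As a rough illustration only, the proposed fields might look something like this in a DestinationRule; disableIfSkewed and skewFactor are hypothetical names for the flags proposed in this issue and do not exist in the Istio API today:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        disableIfSkewed: true   # proposed (hypothetical): switch off locality LB when pod distribution is skewed
        skewFactor: 2           # proposed (hypothetical): threshold that defines "skewed"
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
```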

Describe alternatives you've considered

There are no current alternatives, except for spinning up a separate deployment per zone, which complicates the setup.

Affected product area (please put an X in all that apply)

[ ] Ambient
[ ] Docs
[ ] Dual Stack
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Affected features (please put an X in all that apply)

[ ] Multi Cluster
[ ] Virtual Machine
[ ] Multi Control Plane

Additional context

@ramaraochavali (Contributor, Author)

Discussed offline very briefly with @howardjohn .

@hzxuzhonghu WDYT?

@howardjohn (Member)

How is "skewed" determined?

Say I have 3 zones. If I have 3/3/3 pods in each, is it "skewed"? I am only sending to 1/3 of the pods otherwise.

Is it 1/4/4? 1/1/7?

What if I am 3/3/3 but all the clients happen to be in one zone?

@ramaraochavali (Contributor, Author) commented Apr 29, 2024

How is "skewed" determined?

It is determined based on skew_factor. If it is set to 2, and any zone has more than 2x the pods of any other zone, it is considered skewed.
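
(For illustration, under this reading with skew_factor = 2: 1/4/4 and 1/1/7 would both count as skewed, since 4 > 2 × 1 and 7 > 2 × 1, while 3/3/3 would not, since 3 ≤ 2 × 3.)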

What if I am 3/3/3 but all the clients happen to be in one zone?

If your 3 pods can handle all client requests happily, there is no change. If they cannot handle the load and HPA is triggered, and that results in a skew, we get into disable-locality mode.

@howardjohn (Member)

The thing that is confusing to me:

Why is 3/3/3 OK, but at 3/6/0 we should start to change?

From the local zone it's the same: 3 pods (33%) get the traffic.

@ramaraochavali (Contributor, Author)

Why is 3/3/3 ok, but 3/6/0 we should start change?

That is a good point. The fact that it went from an even distribution (3/3/3), assuming the original distribution was good, to 3/6/0 is a "hint" that something changed via HPA. I do not know if we can compare the previous vs. current distribution to identify the skew. I know it is not perfect, but my idea in creating this issue is to discuss it and see if we can solve this.

@howardjohn (Member)

I feel like this is not the responsibility of a load balancer TBH, but of the scheduler to schedule pods where they are required

@hzxuzhonghu (Member)

I agree with John's point here. It is the scheduler, not the load balancer, that should be in charge of this. The LB is working as expected. Yes, sure, it lacks the capability to be aware of the server load. There is an issue tracking this in Envoy, I think.

@ramaraochavali (Contributor, Author)

I agree with the point that it is the scheduler's responsibility. But given how HPA works, I am trying to see if the load balancer can be intelligent enough to handle this case.

Yes, sure it lacks the capability to be aware of the server load.

It is not just server load but a combination of server load and the zone in which the pod is scheduled. So, similar to how we fall back to other regions when all endpoints are unhealthy, I think it would be good to have a mechanism in the load balancer that spills over to other zones if the current zone's pods are overloaded (not just unhealthy). Of course, the solution I proposed just uses "skew" as an indicator of overload, which is not correct in all cases.

Can you point me to the Envoy issue if you have it handy?

@howardjohn (Member)

The concern I have is that I cannot come up with a reasonable algorithm for when we should start spilling over due to skew that solves your use case, isn't stateful, and isn't just "round robin".

@ramaraochavali (Contributor, Author)

I do not think switching to round robin would help unless we disable the locality load balancer, though maybe I am missing something in your proposal.

@hzxuzhonghu (Member)

@ramaraochavali envoyproxy/envoy#6614. If you want a more intelligent LB, this is the right requirement to track.

@ramaraochavali (Contributor, Author)

I do not think it will work when the locality load balancer is enabled. When the locality load balancer is enabled, we pick nodes based on priority first, and once a priority is picked we apply the load balancing algorithm (least request, round robin, cost aggregated, etc.). Doesn't it pick nodes in the same zone and then apply this cost logic?

@hzxuzhonghu (Member)

Sure, the locality load balancer, combined with some other not-yet-implemented algorithms, could solve this case.

@ramaraochavali (Contributor, Author)

I think the key is to disable the locality load balancer, or make it behave in a way that spills traffic over to other zones when we detect some abnormality in pod scheduling/load while it is enabled. How we can correctly detect that is the question.

@ramaraochavali (Contributor, Author)

BTW, the skew above is similar to how k8s evaluates maxSkew in pod topology spread constraints. This is an interesting article on how even topology spread constraints can result in skew, especially during scale-down: https://medium.com/wise-engineering/avoiding-kubernetes-pod-topology-spread-constraint-pitfalls-d369bb04689e
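
For reference, a minimal sketch of the Kubernetes topology spread constraint being compared to here (the deployment name and labels are made up); maxSkew plays roughly the role skew_factor plays in the proposal above, but it is enforced at scheduling time rather than at load-balancing time:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 9
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                          # max allowed pod-count difference between zones
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway   # soft constraint; DoNotSchedule would make it hard
          labelSelector:
            matchLabels:
              app: my-service
      containers:
        - name: app
          image: my-service:latest
```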
