Try spread pods in HA to different nodes on single zone deployments #7666
base: master
Conversation
Signed-off-by: Filinto Duran <filinto@diagrid.io>
Force-pushed from d501c04 to c1370dc
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           master    #7666      +/-   ##
==========================================
+ Coverage   57.04%   61.83%   +4.79%
==========================================
  Files         480      245     -235
  Lines       25982    22418    -3564
==========================================
- Hits        14822    13863     -959
+ Misses       9982     7393    -2589
+ Partials     1178     1162      -16
==========================================
```

View full report in Codecov by Sentry.
@filintod, the kube-scheduler already scores based on node locality for a ReplicaSet. Setting this value seems to me like we are just adding more processing time to the scheduling process. Same with the existing zone affinity rule: do we actually need it?
@JoshVanL I was seeing some pods from the same Dapr system (i.e. dapr-sentry) running on the same nodes at some customers. It could also be that there was a single node, but it seems strange for prod systems. Regarding the zone anti-affinity, that was already there; it is a common best practice for high availability, but more for stateful systems where you care about a whole zone going down, so it is probably not needed for all Dapr systems (i.e. the stateless ones). On the other hand, if your load is spread across zones you might also get lower latency talking to your in-zone application; inter-zone latency should not be that big, but it is another consideration.
@filintod I think we should do some experimenting with removing the zone affinity rules as well. The scheduler also takes this into account by default. By adding custom rules we are changing the default scoring, which might have unintended consequences and fight sane defaults. I would have thought better pod priority is the thing to focus on to ensure uptime of the control plane.
One thing is that you cannot claim HA while things are running on the same node, and in some ways the same goes for multi-zone. I need to look again at how the scoring weighs each factor nowadays.
I said uptime, not HA 🙂 I see these as two separate things: uptime is a subset property of achieving HA, but HA also incorporates the idea of replication. The Dapr control plane is not sensitive to churn the way an application serving business traffic and needing network failover would be. A single-replica Dapr control plane can handle plenty; it just needs to be up, somewhere, and needs to have higher priority of being up than the consuming Dapr apps.
Yes, I meant uptime. It is for sure important, and we should find ways to raise the priority of these services so that when something has to be evicted they are among the last on the list. But that is probably separate from this PR.
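The pod-priority idea mentioned above could be sketched with a Kubernetes `PriorityClass`. The class name, value, and label below are hypothetical illustrations, not something this PR or Dapr's charts actually ship:

```yaml
# Hypothetical sketch: a PriorityClass that control-plane Deployments could
# reference so the scheduler preempts/evicts them last under node pressure.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: dapr-control-plane-critical   # assumed name, not from the PR
value: 1000000                        # higher value = preempted/evicted later
globalDefault: false
description: "Keeps Dapr control-plane pods up over consuming apps."
```

A Deployment would opt in via `spec.template.spec.priorityClassName: dapr-control-plane-critical`.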
By the way, I actually thought I had seen the ReplicaSet node spread locality you mentioned, but it turned out to be something different: https://kubernetes.io/docs/reference/scheduling/config/ . So let me know if you can point me to where you see it. With the defaults you could get balance if all nodes have somewhat balanced allocatable load, but over time that is usually not feasible without some sort of rebalancing (there are tools around for that, like the descheduler). If that is the case, we do need anti-affinity to tilt the balance toward not having all pods on the same node where possible.
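For reference, the default node/zone spreading being debated comes from the scheduler's `PodTopologySpread` plugin. When no cluster-level constraints are configured, recent Kubernetes releases apply built-in "System" defaults, which the scheduler configuration docs describe as equivalent to the commented constraints below (worth verifying against the Kubernetes version in use):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - pluginConfig:
      - name: PodTopologySpread
        args:
          # "System" selects the built-in defaults, which behave like:
          #   - maxSkew: 3
          #     topologyKey: kubernetes.io/hostname
          #     whenUnsatisfiable: ScheduleAnyway
          #   - maxSkew: 5
          #     topologyKey: topology.kubernetes.io/zone
          #     whenUnsatisfiable: ScheduleAnyway
          defaultingType: System
```

Note that `ScheduleAnyway` makes these soft scoring preferences only, which is why a loaded cluster can still end up with co-located replicas.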
Yeah, it cannot be done by pod affinity. Maybe we can add a check in the sidecar's readiness endpoint to ensure daprd is running.
Description
Add a soft anti-affinity rule with topology key `kubernetes.io/hostname` and a weight of 50, so that zone anti-affinity still takes priority when multiple zones are available.

Issue reference
Please reference the issue this PR will close: #7665
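In a rendered pod spec, the rule described above would look roughly like the following. The label selector and the weight of 100 on the pre-existing zone rule are illustrative assumptions; the PR's actual Helm template may differ:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # Soft rule added by this PR: prefer spreading replicas across nodes.
      - weight: 50
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: dapr-sentry        # illustrative label, not from the PR diff
      # Pre-existing zone rule keeps higher priority (weight assumed here)
      # when multiple zones are available.
      - weight: 100
        podAffinityTerm:
          topologyKey: topology.kubernetes.io/zone
          labelSelector:
            matchLabels:
              app: dapr-sentry
```

Because both rules are `preferred` rather than `required`, scheduling still succeeds on a single-node or single-zone cluster; the weights only bias the scheduler's scoring.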
Checklist
Please make sure you've completed the relevant tasks for this PR, out of the following list: