Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement locality/priority based load balancing #5610

Merged
merged 13 commits into from
Jun 10, 2024

Conversation

jrhee17
Copy link
Contributor

@jrhee17 jrhee17 commented Apr 15, 2024

Motivation:

This pull request attempts to implement the default EdfLoadBalancerBase which is the default load balancer used in envoy. Most load balancer implementations in envoy derive from this base class.

The basic load balancer supports locality weighted load balancing and priority levels.

Note that not all functionality (namely zone aware load balancing, etc..) have been implemented for simplicity.
Additionally, SubsetLoadBalancer has not been implemented in this PR for easier reviewing. The previous SubsetLoadBalancer will be replaced in the next (and final) pull request.

This pull request mostly focuses on updating the LoadBalancer state when a ClusterSnapshot is updated.
The flow is as follows:

  1. ClusterEntry#accept is called, which indicates a ClusterSnapshot has been updated
  2. A PriorityStateManager computes endpoints per priority and locality.
  1. A PrioritySet is created. This PrioritySet contains a map of Integer -> HostSet where a HostSet contains host information for each health/locality.
  1. A DefaultLbStateFactory is created which creates a LbState for the DefaultLoadBalancer. A LbState is a convenience object used to avoid potential race conditions. The LbState is replaced atomically from the perspective of DefaultLoadBalancer.
  1. Lastly, the load balancer chooses a HostSource, and subsequently a host from the selected HostSet

ref: #5450

Modifications:

  • ClusterEntry now creates a PrioritySet for each ClusterSnapshot update
    • In order to create a PrioritySet, utility classes PriorityStateManager, PriorityState, HostSet, UpdateHostsParam have been created.
    • Utility classes EndpointUtil, EndpointGroupUtil have been created
  • DefaultLoadBalancer has been introduced. The functionality of this LoadBalancer is equivalent to EdfLoadBalancer.
  • DefaultLbStateFactory has been introduced to facilitate creating a state for DefaultLoadBalancer. The LbState created by DefaultLbStateFactory is intended to be updated atomically from the perspective of DefaultLoadBalancer.

Result:

  • Priority and locality based load balancing is now available for XdsEndpointGroup

@jrhee17 jrhee17 added this to the 1.29.0 milestone Apr 15, 2024
@jrhee17 jrhee17 marked this pull request as ready for review April 15, 2024 07:39
Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good! 👍👍

return null;
}
if (!prioritySet.hostSets().containsKey(hostsSource.priority)) {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) With the current implementation, does hostsSource.priority always exist so we can't reach here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, let me just throw an exception here to make this clearer 👍


static int hash(ClientRequestContext ctx) {
if (ctx.hasAttr(XdsAttributesKeys.SELECTION_HASH)) {
return ctx.attr(XdsAttributesKeys.SELECTION_HASH);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question)

  • Could we ensure that the value set with XdsAttributesKeys.SELECTION_HASH is a non-negative number?
  • Where does the value of SELECTION_HASH come from? I checked that it is used for testing in this PR. Will it be fetched from a control plane later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the value of SELECTION_HASH come from? I checked that it is used for testing in this PR. Will it be fetched from a control plane later?

I believe this is only used from tests in upstream as well. I think it's fine to remove this attribute if it is too confusing. Let me know what you think

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excellent. Thanks a lot!
I left a few suggestions, mostly nits and style issues.

I also noticed that there are no logs at all. If everything goes well, it shouldn't be a problem. However, if something goes wrong, troubleshooting will be very difficult. Could you add trace or debug logs where you think they are necessary?

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excellent. Thanks a lot!
I left a few suggestions, mostly nits and style issues.

I also noticed that there are no logs at all. If everything goes well, it shouldn't be a problem. However, if something goes wrong, troubleshooting will be very difficult. Could you add trace or debug logs where you think they are necessary?

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excellent. Thanks a lot!
I left a few suggestions, mostly nits and style issues.

I also noticed that there are no logs at all. If everything goes well, it shouldn't be a problem. However, if something goes wrong, troubleshooting will be very difficult. Could you add trace or debug logs where you think they are necessary?

@jrhee17
Copy link
Contributor Author

jrhee17 commented Jun 4, 2024

Could you add trace or debug logs where you think they are necessary?

Good point. Added logs in the PrioritySet update path, and added more information to the exceptions being thrown when selecting an endpoint.

Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments. Looks good!

Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments. Looks good!

Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way to go, @jrhee17! 👍

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @jrhee17! Thanks a lot. 👍 👍 👍

@minwoox minwoox merged commit b47d6c9 into line:main Jun 10, 2024
14 of 15 checks passed
@minwoox
Copy link
Member

minwoox commented Jun 10, 2024

👍 👍 👍 👍 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants