Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

D2 client side implementation of server-reported health #617

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

rachelhanhan
Copy link
Contributor

Implemented the following components in the D2 client side:

  1. Read the server-reported load from the TrackerClient through response wire attributes. If it's not populated, we use -1 as the default.
  2. Added 2 relative threshold for the server load.
  3. Added one client-side config to enabled the client-side load balancing using server reported load.

Copy link
Contributor

@rickzx rickzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress! Did a first pass review. I think we can discuss more on up/downStep model vs. P2C. Sticky routing is currently best effort. It's not guaranteed to reach the same host when the pointsMap changes anyways. I still think that 5s maybe too long for server reported load to be effective.

@@ -322,22 +327,38 @@ public void record()
@Override
public void endCall()
{
endCall(false, null);
endCall(Collections.emptyMap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to just pass the server reported score to endCall() instead of the entire wireAttributes map? It seems that only private void endCall(boolean hasError, ErrorType errorType, int serverLoadScore) uses wireAttributes. Also it prevents creating an empty map for each endCall() invocation.


/**
* If the latest server reported load score is under this specified factor of the cluster average,
* we will increase the health score by upStep.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can have more discussion on up/downStep model vs. P2C. Sticky routing is currently best effort. It's not guaranteed to reach the same host when the pointsMap changes. I still think that 5s maybe too long for server reported load to be effective.

@@ -192,11 +192,11 @@ public void onResponse(TransportResponse<RestResponse> response)
if (response.hasError())
{
Throwable throwable = response.getError();
handleError(_callCompletion, throwable);
handleError(_callCompletion, throwable, response.getWireAttributes());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to parse server load from wireAttributes when enableServerReportedLoad is set to false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 3 places that we can potentially control the behavior on the client side:

  1. TrackerClient - whether to pass wire attributes
  2. CallTracker - whether to update the server-reported load in each interval
  3. load balancer strategy - whether to take the server-reported load into load balancing decision.

When I designed this I also debated where I should put the control, in the end I only put one control in load balancer strategy to make the control logic all in one place.

Now TrackerClient and CallTracker just update the server reported load whenever it's ready, and in the load balancing strategy there are 2 cases we need to watch out:

  1. Server enables reporting, but client did not enable, which we respect the client side config flag
  2. Server does not enable yet, but client enables, which means the reported load will always be -1. Either case, we control the RLB to not consider server reported load.

There is definitely a way to add control in TrackerClient, where we can always report -1 as the score. There will be more code we need to cleanup later if we add a control here. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed this logic in my latest commit. I guarded the code with the config on both TrackerClient and the hash ring selector logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants