
Make loadbalancer avoid sending requests to unhealthy nodes #2363

Draft
RomanZavodskikh wants to merge 2 commits into master from admissionControlUnhealthyNodes

Conversation

@RomanZavodskikh (Member) commented Jun 1, 2023

Updates #2346

proxy/proxy.go Outdated
@@ -870,6 +870,11 @@ func (p *Proxy) makeBackendRequest(ctx *context, requestContext stdlibcontext.Co
req = injectClientTrace(req, ctx.proxySpan)

response, err := roundTripper.RoundTrip(req)
if response.StatusCode >= 400 {
@szuecs (Member) commented Jun 1, 2023

We can't do this on >= 400. We need to check for errors that indicate timeouts and connection errors.
Everything that is a valid response we accept as a "success".
So a 500 could be a "success". I think you could start with err != nil as "fail".

Contributor:

I agree that error code >= 400 is not what we want, because 4xx is a valid response, it's a client-side error. But a 5xx, even being a valid response, means something is up with the backend; 5xx are always server-side errors. So maybe something like err != nil || response.StatusCode >= 500 would be better?
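
For illustration, a minimal sketch of the failure classification being discussed in this thread; isBackendFailure is an invented helper name, not part of the PR:

package proxy

import "net/http"

// isBackendFailure reports whether a round-trip result should count against
// endpoint health, following the rule proposed above: transport errors and
// 5xx responses are failures, everything else (including 4xx) is a success.
func isBackendFailure(resp *http.Response, err error) bool {
	if err != nil {
		return true // timeout, connection error, etc.
	}
	return resp.StatusCode >= 500
}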

defer m.m.Unlock()

m.failedRequests++
m.totalRequests++
@szuecs (Member) commented Jun 1, 2023

why not

atomic.AddInt64(&m.totalRequests, 1)
atomic.AddInt64(&m.failedRequests, 1)

Member Author:

Because in this section of code m.m is locked, we can rely on no other goroutine using those fields.
So both plain and atomic increments are correct.
However, the atomic ones are slower, because they add mfence instructions, as discussed here: https://stackoverflow.com/questions/51846894/what-is-the-performance-of-stdatomic-vs-non-atomic-variables

@lucastt (Contributor) commented Jun 2, 2023

I'm not sure this post covers the same case we are seeing here. They are explicitly comparing a normal operation vs an atomic one, but our performance constraints also involve the mutex lock. Atomic instructions busy-wait, while a mutex frees up the CPU for other threads. I can't tell which one is more efficient for our use case, but our use case seems different from the question in your link. Or I might be imagining things; either way it's a valid discussion. As food for thought: https://stackoverflow.com/questions/15056237/which-is-more-efficient-basic-mutex-lock-or-atomic-integer

Member:

I swapped the lines, because you want to add to total before failed, so that you never count failed ahead of total (which could lead to failed being bigger than total).
In general I am sure that the lock will be slower, as @lucastt summarized. If you would like to check it, create a micro benchmark. Do not trust random posts by people; even if they come from a super-hero-popular-10x-coder, they can be wrong.
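
A rough sketch of such a micro benchmark (counter types and names are invented for illustration, not taken from the PR); it would live in a _test.go file and run with go test -bench=.:

package loadbalancer

import (
	"sync"
	"sync/atomic"
	"testing"
)

// mutexCounter increments under a mutex, like the current PR code.
type mutexCounter struct {
	mu            sync.Mutex
	totalRequests int64
}

func (c *mutexCounter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.totalRequests++
}

// atomicCounter increments with sync/atomic, as suggested in the review.
type atomicCounter struct {
	totalRequests int64
}

func (c *atomicCounter) inc() {
	atomic.AddInt64(&c.totalRequests, 1)
}

func BenchmarkMutexIncrement(b *testing.B) {
	c := &mutexCounter{}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			c.inc()
		}
	})
}

func BenchmarkAtomicIncrement(b *testing.B) {
	c := &atomicCounter{}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			c.inc()
		}
	})
}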

Member Author:

My intuition tells me that atomics are faster in this case. :-)
In general, I would like to write a benchmark to prove it; if I have time, I will do this.
If I don't, I will treat atomics as the baseline.

Member:

Make sure you increase the total requests first and the failed ones after that.

m.m.Lock()
defer m.m.Unlock()

m.totalRequests++
Member:

why not

atomic.AddInt64(&m.totalRequests, 1)

Member Author:

Because in this section of code m.m is locked, we can rely on no other goroutine using those fields.
So both plain and atomic increments are correct.
However, the atomic ones are slower, because they add mfence instructions, as discussed here: https://stackoverflow.com/questions/51846894/what-is-the-performance-of-stdatomic-vs-non-atomic-variables

Member:

I guess we discuss this in the comment above: #2363 (comment)

@@ -208,7 +226,9 @@ func (r *random) Apply(ctx *routing.LBContext) routing.LBEndpoint {
return ctx.Route.LBEndpoints[i]
}

return withFadeIn(r.rand, ctx, r.notFadingIndexes, i, r)
i = withFadeIn(r.rand, ctx, r.notFadingIndexes, i, r)
i = withHealthCheck(r.rand, ctx, r.notFadingIndexes, i, r)
@lucastt (Contributor) commented Jun 2, 2023

I wonder if it would not be better to have a new LB algorithm instead of modifying random. Since you avoid unhealthy nodes, it is not exactly random anymore, which might be a bit misleading to a user configuring it.

@lucastt (Contributor) commented Jun 2, 2023

The same thing could be applied to round robin as well. Instead of a whole new LB algorithm, this could just be an option for both algorithms, configurable through some flag or parameter.

now := time.Now()
failedReqs, totalReqs := ctx.Route.LBEndpoints[choice].Metrics.GetFailedRequests()

if totalReqs == 0 || float64(failedReqs)/float64(totalReqs) < 0.95 {
@lucastt (Contributor) commented Jun 2, 2023

This behavior is OK, but it looks more like a CB (circuit breaker) than an admission-control behavior. I might be reading this wrong, but what I understand here is: once a backend reaches a 95% error rate, we just stop sending requests to it until there are no active requests to this backend, then we try again, and if the backend shows apparent recovery we keep sending requests to it; if it reaches a 95% error rate again we stop, and so on.

First, I don't have big problems with this behavior, but from the issue description what I would expect is load being drained from this backend until it reaches an equilibrium and is able to serve the requests sent to it. This is a much less aggressive behavior too, and it is an admission-control-like behavior.

Have you thought about the pros and cons of these two approaches? If yes, would you mind describing them a bit so I can understand the reasoning?

From a quick look, my thinking is: in case of a network partition the CB behavior would be better because it acts much faster, but in case of overload admission control seems to fare better, since it does not just dump the load of an overloaded backend onto other backends that might themselves be struggling. Basically, in the overload case the CB behavior could spread the failure to other backends.

If this behavior could be a choice for the user, it might make sense to make it configurable through some flag or parameter.

Member Author:

Thanks for the feedback, Lucas! The very first thing to do from my PoV is to split the withFadeIn and withHealthCheck parts so they are not so coupled together.
But at first glance I would prefer AC (not CB) here, because I would like some requests to keep going to an unhealthy node, just to find out the moment when it becomes healthy again.

@szuecs (Member) commented Jun 2, 2023

We need to have a strategy pattern somewhere.
I think what we want is (see the sketch after this list):

  • probability-based drops (or better, re-schedules), similar to the admissionControl reject calculation, to allow a small amount of traffic to broken endpoints so PHC (passive health check) can be observed
  • exponential backoff (CB style)
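
A rough sketch of the probability-based re-scheduling idea under discussion (names and constants are illustrative, not from the PR): the worse an endpoint's failure ratio, the less likely it is to be kept, while a small floor of traffic still reaches it so passive health checking can observe recovery.

package loadbalancer

import "math/rand"

// minObservationProbability keeps a small share of traffic flowing to an
// unhealthy endpoint so its recovery can still be observed (invented value).
const minObservationProbability = 0.05

// keepEndpoint decides whether a chosen endpoint is kept or should be
// re-scheduled, based on its observed failure ratio.
func keepEndpoint(failed, total int64, rnd *rand.Rand) bool {
	if total == 0 {
		return true
	}
	// keep with probability proportional to the success ratio,
	// but never drop below the observation floor
	p := 1 - float64(failed)/float64(total)
	if p < minObservationProbability {
		p = minObservationProbability
	}
	return rnd.Float64() < p
}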

@RomanZavodskikh RomanZavodskikh force-pushed the admissionControlUnhealthyNodes branch 6 times, most recently from e096d3f to 63c9ea0 on June 5, 2023 15:53
RomanZavodskikh pushed a commit that referenced this pull request Jun 5, 2023
Updates #2363

Signed-off-by: Roman Zavodskikh <roman.zavodskikh@zalando.de>
@RomanZavodskikh RomanZavodskikh force-pushed the admissionControlUnhealthyNodes branch 10 times, most recently from de7929c to 171360a on June 8, 2023 09:17
loadbalancer/algorithm.go Outdated (conversation resolved)

if ctx.Route.LBFadeInDuration <= 0 {
return ctx.Route.LBEndpoints[r.index]
endpoints := []routing.LBEndpoint{}
Member Author:

Can I avoid this repetition (it looks the same as in random and almost the same as in consistent-hash)?

}
goodHealthyEp := totalReqs <= 10 || float64(failedReqs)/float64(totalReqs) < 0.95 || rnd.Float64() < 0.05
Member Author:

Such constants look like good baseline values to me. How can I make them configurable in a good way?

@szuecs (Member) commented Jun 8, 2023

I guess we would have to store them in a new type, so you would not have a "pure" func but a "method".
Then you would need to create a constructor, called from proxy or skipper, and pass the config through.
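
A minimal sketch of that suggestion, using invented type and field names (not the actual PR code): the thresholds move into an options struct with a constructor so proxy/skipper setup can pass them in from configuration.

package loadbalancer

import "math/rand"

// healthCheckOptions holds the currently hard-coded thresholds; names and
// defaults here are illustrative only.
type healthCheckOptions struct {
	MinRequests        int64   // below this, an endpoint is always considered healthy (e.g. 10)
	MaxFailureRatio    float64 // e.g. 0.95
	ObserveProbability float64 // share of traffic still sent to unhealthy endpoints, e.g. 0.05
}

type healthChecker struct {
	opts healthCheckOptions
	rnd  *rand.Rand
}

// newHealthChecker would be called from proxy or skipper setup with values
// coming from configuration flags.
func newHealthChecker(opts healthCheckOptions, rnd *rand.Rand) *healthChecker {
	return &healthChecker{opts: opts, rnd: rnd}
}

// goodHealthyEndpoint mirrors the goodHealthyEp expression above, but reads
// the thresholds from the configured options.
func (h *healthChecker) goodHealthyEndpoint(failedReqs, totalReqs int64) bool {
	return totalReqs <= h.opts.MinRequests ||
		float64(failedReqs)/float64(totalReqs) < h.opts.MaxFailureRatio ||
		h.rnd.Float64() < h.opts.ObserveProbability
}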

if _, fadingIn := fadeInState(now, ctx.Route.LBFadeInDuration, ep[i].Detected); !fadingIn {
notFadingIndexes = append(notFadingIndexes, i)
}
failedReqs, totalReqs := (int64)(0), (int64)(0)
Member:

why not use the defaults:

var failedReqs, totalReqs int64

if ctx.Route.LBFadeInDuration <= 0 {
return ctx.Route.LBEndpoints[r.index]
endpoints := []routing.LBEndpoint{}
for _, e := range ctx.Route.LBEndpoints {
Member:

Apply is in the hot path, and it feels like instead of an index access into the slice we now have a loop over all endpoints, so we go from O(1) to O(n). That does not sound very good. Am I missing something?

@lucastt (Contributor) commented Jun 8, 2023

so from O(1) to O(n)

Indeed.

I feel like there are two options here: one is to take this off the hot path and update it independently through some goroutine; the other is to access the slice the same way as before, but if the endpoint is not healthy, shift to the next one and do the same evaluation. The second option needs a stop condition to avoid unending loops. I personally think it would be better to have a separate goroutine that updates this slice every x seconds (roughly as in the sketch below).
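
A rough sketch of that background-refresh option, with invented types (the real code would work on routing.LBEndpoint rather than strings): the healthy subset is recomputed periodically, so Apply can keep an O(1) pick from a prebuilt slice.

package loadbalancer

import (
	"sync"
	"time"
)

// healthyEndpoints holds the periodically refreshed healthy subset.
type healthyEndpoints struct {
	mu      sync.RWMutex
	healthy []string // endpoint addresses, illustrative only
}

// snapshot returns the current healthy subset for use on the hot path.
func (h *healthyEndpoints) snapshot() []string {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.healthy
}

// refreshEvery recomputes the healthy subset on a fixed interval until quit
// is closed; isHealthy stands in for the failure-ratio check from the PR.
func (h *healthyEndpoints) refreshEvery(d time.Duration, all []string, isHealthy func(string) bool, quit <-chan struct{}) {
	ticker := time.NewTicker(d)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			next := make([]string, 0, len(all))
			for _, e := range all {
				if isHealthy(e) {
					next = append(next, e)
				}
			}
			h.mu.Lock()
			h.healthy = next
			h.mu.Unlock()
		case <-quit:
			return
		}
	}
}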

endpoints = append(endpoints, e)
}
}
if len(endpoints) == 0 {
Contributor:

What will happen if only a very small number of endpoints is healthy? Shouldn't we have some kind of guardrail for this? In this case there are two possibilities:

  1. This algorithm sheds some load;
  2. This algorithm does not mind if skipper overloads some healthy backends.

The first option kind of breaks the separation of concerns between these features; I don't think an LB algorithm should shed load, it seems a bit out of place, and it is not what the author chose to do, which I agree with.

But there is some concern that if very few endpoints are healthy, this could make the system collapse. So maybe this algorithm should only be available for routes with some form of load-shedding strategy configured. I don't have a well-formed opinion about it; what do you think?

Well, at least we should document this risk very well.

t.Log("CSV " + fmt.Sprintf("%d,", i) + strings.Join(showStats, ","))
}
})
}

// For each endpoint, return a hash key which will make the consistent hash algorithm select it.
Contributor:

Why is this test not valid anymore?

Member Author:

After random keys are used here
#2363 (comment)
this function is not called anywhere anymore.

func() {
for {
ctx.Params[ConsistentHashKey] = hashKeys[len(stats)%len(hashKeys)]
ctx.Params[ConsistentHashKey] = strconv.Itoa(rnd.Intn(100000))
Contributor:

I also don't understand this one: why do we have to change this behavior from iterative to random? Shouldn't it be interchangeable?

Member Author:

The old version of the test code creates a limited number of keys: one key per host. So in the case of 6 hosts (3 "old" ones: a, b, c and 3 "fading-in" ones: d, e, f), this code sends requests with the following key values in a loop:

h_a
h_b
h_c
h_d
h_e
h_f
h_a
h_b
... (~100k reqs for a single test)

At the beginning of this test, nodes d, e, f process very few requests (like here: #2363 (comment)); instead, such requests are redirected to the corresponding host in the consistent hash circle (https://www.toptal.com/big-data/consistent-hashing).
For example, requests to d may be redirected to e, requests to e to f, and requests to f to a.
In the old version this correspondence is constant, because the same six keys are used, so there is a high chance that host d will receive very few requests.

So, the solution I propose here is to use different keys, so that the aforementioned redirection is not constant, but random instead.

Member Author:

This PR tends to become too huge; I moved this part to another PR: #2380

@@ -12,7 +12,7 @@ import (
)

const (
fadeInDuration = 100 * time.Millisecond
fadeInDuration = 500 * time.Millisecond
Contributor:

This is a rather specific change; why is 500ms better than 100ms here? Is this for visualization reasons? Because to me it seems it should work independently of the fadeInDuration here.

Member Author:

No, visualization works fine regardless of fadeInDuration. Such changes are needed for tests like TestFadeIn/consistent-hash,_5 (in loadbalancer/fadein_test.go), because at the beginning of that test some hosts are being "warmed up" and therefore receive very few requests (ones or tens), so there is a good chance that, for some random reason, the number of requests goes down by some small amount (ones or tens).

Basically, this test checks that the number of requests sent to every host is monotonic (decreasing for old hosts, increasing for fading-in hosts). So for fading-in hosts the condition looks like "for every period of time, the number of requests in the next period should be no less than 0.6 * the number of requests in the current period". When the number of requests is small, some random cause (in the Go scheduler or random generator, for example) may easily make this condition false even for a correctly implemented load balancer.

That's why I increased fadeInDuration: to enlarge the time window during which requests are collected, which increases the number of requests at every specific moment of time and therefore decreases the influence of random events here.
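
For illustration, a rough sketch of the per-window check described above (the helper name is invented; the actual test code differs):

package loadbalancer

// roughlyIncreasing checks the fading-in condition described above: each
// time window must receive at least 0.6x the requests of the previous one.
func roughlyIncreasing(requestsPerWindow []int) bool {
	for i := 1; i < len(requestsPerWindow); i++ {
		if float64(requestsPerWindow[i]) < 0.6*float64(requestsPerWindow[i-1]) {
			return false
		}
	}
	return true
}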

Member Author:

This PR tends to become too huge; I moved this part to another PR: #2380

Roman Zavodskikh added 2 commits June 8, 2023 18:15
Updates #2346

Signed-off-by: Roman Zavodskikh <roman.zavodskikh@zalando.de>
Updates #2346

Signed-off-by: Roman Zavodskikh <roman.zavodskikh@zalando.de>
3 participants