Add max connection limit #50413

Open
daimaxiaxie wants to merge 19 commits into master

Conversation

@daimaxiaxie commented Apr 12, 2024

Please provide a description of this PR:

Control-plane load imbalance (caused, for example, by instance failure or network disconnection) can make a single instance take a burst of connections and become overloaded, ending in an avalanche where all instances restart and then crash. #50412

This PR therefore adds an upper limit on the number of connections a single control-plane instance will accept, to prevent overload.

The env var PILOT_MAX_CONNECTION limits the number of incoming ADS connections. If set to 0 or unset, the feature is disabled.
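A minimal sketch of how such a cap could be wired, assembled from the diff fragments quoted later in this thread. PILOT_MAX_CONNECTION is the env var described above; the interceptor wiring and the remaining names are illustrative, not the exact code in this PR:

```go
package main

import (
	"os"
	"strconv"
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	grpcstatus "google.golang.org/grpc/status"
)

var (
	// 0 or unset disables the limit, matching the described semantics.
	connectionLimit, _ = strconv.ParseInt(os.Getenv("PILOT_MAX_CONNECTION"), 10, 64)
	connectionCounter  atomic.Int64
)

// limitConnections rejects new ADS streams with ResourceExhausted once the
// cap is reached, and releases the slot when the stream ends.
func limitConnections(srv any, ss grpc.ServerStream,
	info *grpc.StreamServerInfo, handler grpc.StreamHandler,
) error {
	current := connectionCounter.Add(1)
	defer connectionCounter.Add(-1)
	if connectionLimit > 0 && connectionLimit < current {
		return grpcstatus.Errorf(codes.ResourceExhausted, "connection limit exceeded")
	}
	return handler(srv, ss)
}

func main() {
	srv := grpc.NewServer(grpc.ChainStreamInterceptor(limitConnections))
	_ = srv // register the XDS services and call Serve() in real code
}
```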

@istio-policy-bot

😊 Welcome @daimaxiaxie! This is either your first contribution to the istio/istio repo, or it's been
a while since you've been here.

You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@istio-testing added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Apr 12, 2024
@istio-testing (Collaborator)

Hi @daimaxiaxie. Thanks for your PR.

I'm waiting for an Istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@zirain (Member) commented Apr 12, 2024

/ok-to-test

@istio-testing added the ok-to-test label (Set this label to allow normal testing to take place for a PR not submitted by an Istio org member.) and removed the needs-ok-to-test label Apr 12, 2024
@wulianglongrd (Member)

I don't think there is much value in limiting the total number of connections. If there are too many connections, it will only affect the speed of configuration push.

If you just want to prevent the impact of a sudden increase in traffic, you can use PILOT_MAX_REQUESTS_PER_SECOND to limit the request rate.
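For contrast with the hard cap this PR proposes, here is a minimal sketch of what a request-rate limit does, using golang.org/x/time/rate (the limiter value is illustrative, and this is not Istio's actual PILOT_MAX_REQUESTS_PER_SECOND wiring): it smooths bursts but never bounds the steady-state connection total.

```go
package main

import (
	"context"

	"golang.org/x/time/rate"
)

func main() {
	// Admit at most ~25 new connections per second (illustrative value).
	limiter := rate.NewLimiter(rate.Limit(25), 1)

	for i := 0; i < 1000; i++ {
		// Each incoming connection waits for a token: bursts are smoothed,
		// but all 1000 connections are eventually admitted, so the total
		// held by one instance is never bounded.
		_ = limiter.Wait(context.Background())
	}
}
```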

@wulianglongrd (Member)

Also, can we use DiscoveryServer.adsClients to count the number of connections?

```diff
@@ -49,10 +49,17 @@ var (
		"pilot_info",
		"Pilot version and build information.",
	)

	connectionTotal = monitoring.NewGauge(
```
Member:

duplicate with

```go
xdsClients = monitoring.NewGauge(
```

Author:

I think their meanings are different: xdsClients represents the number of XDS streams, while connectionTotal represents the number of gRPC sessions.

What do you think?
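A sketch of that distinction, following the gauge shape visible in this diff (the connectionTotal name and description are this PR's addition as quoted above; treat the exact strings as illustrative):

```go
package xds

import "istio.io/istio/pkg/monitoring"

var (
	// One per ADS stream: updated inside the stream handler once the
	// stream is established.
	xdsClients = monitoring.NewGauge(
		"pilot_xds",
		"Number of endpoints connected to this pilot using XDS.",
	)

	// One per gRPC session: updated at connection level, so it also counts
	// sessions that have not (yet) opened an ADS stream.
	connectionTotal = monitoring.NewGauge(
		"pilot_total_connections",
		"Total number of gRPC sessions connected to this pilot.",
	)
)
```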

@hzxuzhonghu (Member)

It makes some sense: imbalanced connections could leave one pilot overloaded, costing much more memory/CPU than the others.

```go
defer func() {
	connectionTotal.RecordInt(s.connectionCounter.Add(-1))
}()
if features.ConnectionLimit > 0 && int64(features.ConnectionLimit) < current {
```
Member:

Use DiscoveryServer.adsClientCount for that.

Author:

DiscoveryServer.adsClients changes in the stream handler (initConnection), so it is not synchronized with gRPC's dial. I think this will cause some problems.

Member:

Do we need such a precise limit? gRPC's dial time should be very short, and PILOT_MAX_REQUESTS_PER_SECOND can limit burst connections. This PR is just meant to limit the continued increase in connections, right?

Author:

> This PR is just to limit the continued increase in connections, right?

Yes. There is a gap between initConnection and gRPC's dial; if we use DiscoveryServer.adsClients, this feature can be bypassed in extreme cases. I feel it's better to be more precise. What do you think about that? Thanks.
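One known way to sidestep the gap discussed here is to enforce the cap at accept time, before gRPC ever sees the connection. A minimal sketch using golang.org/x/net/netutil (not what this PR does; the port and limit are illustrative). The trade-off: LimitListener blocks excess Accepts until a slot frees up rather than rejecting them with a status code.

```go
package main

import (
	"net"

	"golang.org/x/net/netutil"
	"google.golang.org/grpc"
)

func main() {
	lis, err := net.Listen("tcp", ":15010") // illustrative XDS port
	if err != nil {
		panic(err)
	}
	// Cap concurrent connections at 700; Accept blocks beyond that, so
	// there is no race window between dial and the stream handler.
	limited := netutil.LimitListener(lis, 700)

	srv := grpc.NewServer() // register the ADS services here in real code
	_ = srv.Serve(limited)
}
```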

@daimaxiaxie (Author)

> I don't think there is much value in limiting the total number of connections. If there are too many connections, it will only affect the speed of configuration push.

Too many connections lead to a heavy memory load, even OOM. @wulianglongrd

And this is different from limiting a sudden increase in traffic: it sets an overall service cap for a control-plane instance.

@wulianglongrd (Member)

Makes sense. Addendum: there is also an existing counter here, which backs the pilot_xds metric.

```go
xdsClientTracker = make(map[string]float64)
```

@daimaxiaxie (Author)

> Also, can we use DiscoveryServer.adsClients to count the number of connections?

No. DiscoveryServer.adsClients changes in the stream handler; it is not synchronized with gRPC's dial time.

```go
	connectionTotal.RecordInt(s.connectionCounter.Add(-1))
}()
if features.ConnectionLimit > 0 && int64(features.ConnectionLimit) < current {
	return grpcstatus.Errorf(codes.ResourceExhausted, "connection limit exceeded")
```
Member:

This is incredibly unsafe. Unless you have a fixed number of istiod pods and clients, having hard limits on connections without autoscaling tied to this limit is a recipe for an outage where all connections are denied. It also makes a DoS attack trivial.

Contributor:

+1. Curious how much memory you have allocated for Istiod, how many total clients you have, and how many clients connected to a single Istiod caused it to get OOMKilled?

Author:

In our test cluster, each instance is limited to 16 CPU / 40 GB (10+ replicas, a fixed number). Each instance usually maintains more than 700 connections, and memory usage is 22-26 GB. When the cluster is thrashing (during some failures), a single instance can easily reach the memory limit. Our production cluster has many more clients.

So I think: when the resources of an instance are fixed, the upper limit of clients it can serve is also fixed.
This feature (PR) is disabled by default.
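Back-of-envelope from the figures above (illustrative arithmetic, not from the PR): 24 GB over 700 connections is roughly 34 MB per connection, so a 40 GB instance saturates around 40 GB / 34 MB ≈ 1,200 connections; a PILOT_MAX_CONNECTION value somewhat below that would leave headroom.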

@istio-policy-bot

🧭 This issue or pull request has been automatically marked as stale because it has not had activity from an Istio team member since 2024-04-12. It will be closed on 2024-05-27 unless an Istio team member takes action. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@istio-policy-bot added the lifecycle/stale label (Indicates a PR or issue hasn't been manipulated by an Istio team member for a while) May 13, 2024
Labels
- area/user experience
- lifecycle/stale: Indicates a PR or issue hasn't been manipulated by an Istio team member for a while
- ok-to-test: Set this label to allow normal testing to take place for a PR not submitted by an Istio org member.
- size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

8 participants