Add max connection limit #50413
Changes from 9 commits
```diff
@@ -29,11 +29,14 @@ import (
 	"time"

 	"github.com/fsnotify/fsnotify"
+	middleware "github.com/grpc-ecosystem/go-grpc-middleware"
 	grpcprom "github.com/grpc-ecosystem/go-grpc-prometheus"
 	"golang.org/x/net/http2"
 	"google.golang.org/grpc"
+	"google.golang.org/grpc/codes"
 	"google.golang.org/grpc/credentials"
 	"google.golang.org/grpc/reflection"
+	grpcstatus "google.golang.org/grpc/status"
 	corev1 "k8s.io/api/core/v1"
 	"k8s.io/client-go/rest"

```
```diff
@@ -171,6 +174,8 @@ type Server struct {
 	statusManager *status.Manager
 	// RWConfigStore is the configstore which allows updates, particularly for status.
 	RWConfigStore model.ConfigStoreController
+
+	connectionCounter atomic.Int64
 }

 type readinessFlags struct {
```
```diff
@@ -731,6 +736,7 @@ func (s *Server) initGrpcServer(options *istiokeepalive.Options) {
 		grpcprom.UnaryServerInterceptor,
 	}
 	grpcOptions := istiogrpc.ServerOptions(options, interceptors...)
+	grpcOptions = append(grpcOptions, grpc.StreamInterceptor(middleware.ChainStreamServer(s.StreamServerOverloadInterceptor)))
 	s.grpcServer = grpc.NewServer(grpcOptions...)
 	s.XDSServer.Register(s.grpcServer)
 	reflection.Register(s.grpcServer)
```
```diff
@@ -780,6 +786,7 @@ func (s *Server) initSecureDiscoveryService(args *PilotArgs) error {
 	}
 	opts := istiogrpc.ServerOptions(args.KeepaliveOptions, interceptors...)
 	opts = append(opts, grpc.Creds(tlsCreds))
+	opts = append(opts, grpc.StreamInterceptor(middleware.ChainStreamServer(s.StreamServerOverloadInterceptor)))

 	s.secureGrpcServer = grpc.NewServer(opts...)
 	s.XDSServer.Register(s.secureGrpcServer)
```
```diff
@@ -1348,3 +1355,16 @@ func (s *Server) initReadinessProbes() {
 		s.addReadinessProbe(name, probe)
 	}
 }
+
+func (s *Server) StreamServerOverloadInterceptor(srv interface{}, ss grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
+	current := s.connectionCounter.Add(1)
+	connectionTotal.RecordInt(current)
+	defer func() {
+		connectionTotal.RecordInt(s.connectionCounter.Add(-1))
+	}()
+	if features.ConnectionLimit > 0 && int64(features.ConnectionLimit) < current {
+		return grpcstatus.Errorf(codes.ResourceExhausted, "connection limit exceeded")
+	}
+	err := handler(srv, ss)
+	return err
+}
```

Review comments on the limit check (`features.ConnectionLimit`):

> use that DiscoveryServer.adsClientCount

> Do we need such precise limits? The grpc's dial time should be very short, and

Author reply:

> Yes.

Review comments on the `ResourceExhausted` return:

> This is incredibly unsafe. Unless you have a fixed number of istiod pods and clients, having hard limits around connections without having autoscaling tied to this limit is a recipe to trigger an outage where all connections are denied. It also makes a DoS attack trivial.

> +1. Curious how much memory you have allocated for Istiod, how many total clients you have, and how many clients connected to a single Istiod caused it to get OOMKilled?

Author reply:

> In our test cluster, each instance is limited to 16c40g (replicas 10+, a fixed number). Each instance usually maintains more than 700 connections, and memory usage is 22g-26g. When the cluster is thrashing (during some failures), it is easy for a single instance to reach the memory limit. Our production cluster has many more clients. So I think: when the resources of an instance are fixed, the upper limit of clients it can serve is also fixed.
Review comment on `connectionTotal`:

> duplicate with istio/pilot/pkg/xds/monitoring.go (line 75 in 3e427fc)

Author reply:

> I think their meanings are different. xdsClients represents how many xds streams; connectionTotal represents the number of grpc sessions. What do you think?
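The counting logic in `StreamServerOverloadInterceptor` has one subtlety worth noting: the counter is incremented before the limit check, and the deferred decrement runs on every path, including rejected streams, so a rejection still releases its slot. The logic can be exercised standalone; this is a minimal stdlib-only sketch with hypothetical names (`connLimiter`, `Acquire`, `Release`), not Istio code:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// connLimiter mirrors the counting in the PR's interceptor, stripped of
// gRPC and metrics types so it runs standalone.
type connLimiter struct {
	limit   int64        // <= 0 disables the limit, as with features.ConnectionLimit
	current atomic.Int64 // live connection count
}

var errExhausted = errors.New("connection limit exceeded")

// Acquire increments first and only then checks the limit, matching the
// interceptor. The caller must Release even when Acquire fails, because
// the PR's deferred decrement runs on the rejection path too.
func (l *connLimiter) Acquire() error {
	cur := l.current.Add(1)
	if l.limit > 0 && cur > l.limit {
		return errExhausted
	}
	return nil
}

func (l *connLimiter) Release() {
	l.current.Add(-1)
}

func main() {
	l := &connLimiter{limit: 2}
	for i := 1; i <= 3; i++ {
		fmt.Printf("conn %d: err=%v\n", i, l.Acquire())
	}
	// The third Acquire failed, but its slot is still released,
	// exactly like the defer in the interceptor.
	l.Release()
	fmt.Println("current:", l.current.Load()) // current: 2
}
```

Because increment happens before the check, `current` can briefly overshoot the limit under concurrent dials; the atomic `Add` keeps the count itself consistent, and each over-limit stream is rejected and decremented.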