-
Notifications
You must be signed in to change notification settings - Fork 797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: decrease healthcheck goroutine #2345
base: issue#2336
Are you sure you want to change the base?
Conversation
初始化协程池的逻辑我想的是放到 mosn 的启动逻辑里面去初始化,协程池的配置放到 mosn 的启动配置文件里面去 |
Codecov ReportAll modified and coverable lines are covered by tests ✅ 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, please continue
@@ -207,7 +207,7 @@ func (hc *healthChecker) startCheck(host types.Host) { | |||
log.DefaultLogger.Alertf("healthcheck.session", "[upstream] [health check] Create Health Check Session Error, Remote Address = %s", addr) | |||
return | |||
} | |||
c := newChecker(s, host, hc) | |||
c := newChecker(s, host, hc, GetWorkPool()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这没有解决我们的问题,实际上因为212-214依旧会为每个host创建一个生命周期与host不绑定的新的goroutine,所以goroutine泄漏或控制数量的问题依旧没有解决
mosn/pkg/upstream/healthcheck/healthchecker.go
Lines 202 to 220 in 31a55ee
func (hc *healthChecker) startCheck(host types.Host) { | |
addr := host.AddressString() | |
if _, ok := hc.checkers[addr]; !ok { | |
s := hc.sessionFactory.NewSession(hc.sessionConfig, host) | |
if s == nil { | |
log.DefaultLogger.Alertf("healthcheck.session", "[upstream] [health check] Create Health Check Session Error, Remote Address = %s", addr) | |
return | |
} | |
c := newChecker(s, host, hc, GetWorkPool()) | |
hc.checkers[addr] = c | |
utils.GoWithRecover(func() { | |
c.Start() | |
}, nil) | |
atomic.AddInt64(&hc.localProcessHealthy, 1) // default host is healthy | |
if log.DefaultLogger.GetLogLevel() >= log.INFO { | |
log.DefaultLogger.Infof("[upstream] [health check] create a health check session for %s", addr) | |
} | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为了和之前的功能保持一致,默认情况下就是每次就创建一个协程,
至于泄露问题,只要关闭 ssesionCheck 的时候,遗留的协程能释放就行,你是哪儿看出来现在还能泄露的呢?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
也是你说的那个场景,在2.3w个host时候依旧是有2.3w个goroutine,我们要控制goroutine总数的目标并没有达到,现在还额外的增加了复杂度
我不认为在Go中使用协程池是一个好主意,在我们的场景中实际上只需要解决两个问题
所以我们只需要一个生命周期与cluster相同的goroutine作为scheduler,在healthchecker时创建request级别的goroutine,通过一个concurrency配置控制并行度即可 |
假设不用池化协程的方式,还有其他更好的方式吗?
每次检查都创建一个新的协程,应该没有池化性能好,我觉得你这种方式更复杂,反而没池化简单, 当然我觉得很多人都用上池化的健康检查,因为他们的微服务规模都很小,所以默认情况下如果不配置协程池配置。那么就是不使用池化,和之前的功能尽可能保持一致,并简化代码的复杂度。 |
@doujiang24 @nejisama PTAL |
如上,这里协程总数并没有变化,由于go本身非阻塞io的特点,在goroutine内发起io请求是没有影响的,现在的实现依旧是在host维度创建了goroutine做scheduler,然后再提交到一个协程池发起实际请求,这就没有必要了。 当然,如果是在cluster维度配置协程池用来做并发控制是没有问题的,不过现在的协程池使用方式增加了实现的复杂度但是没有看到可解释的收益 |
此收益看得见的就是减小很内存,看不见的收益就是协程数减小,可以减少调度开销,当然需要在大规模的 host 的场景下,从性能角度来说,每个 cluster 就一个协程池,还不如全局公用一个协程池, 没太懂你协程数没有变化是什么意思,配置上这个配置后也没有变化? |
我的理解,这里主要原因是 我觉得有两个地方需要限制:
我的想法:
1 和 2,不管是那种方案,最终都是可能丢弃的,想要不丢任务,应该是很难的。需要做的是,丢任务的时候,打点日志 |
我觉得可以默认设置成异步提交,或者不允许配置成同步提交
ants 这个库可以配置队列大小(如果配置了队列好像就不能异步提交了,这个地方我再看看),也可以配置是否阻塞提交, |
针对你说的方案1: 配置里面有个可配置参数: 方案2:
|
|
PTAL @doujiang24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the worker queue is disabled by default, lgtm
Issues associated with this PR
#2336
初始化协程池的逻辑还没来得及写,等有空了再补充
Solutions
You should show your solutions about the issues in your PR, including the overall solutions, details and the changes. At this time, Chinese is allowed to describe these.
UT result
Unit Test is needed if the code is changed, your unit test should cover boundary cases, corner cases, and some exceptional cases. And you need to show the UT result.
Benchmark
If your code involves the processing of every request, you should give the Benchmark Result.
Code Style
Goimports
has runGolint
result