
Modifying the Prometheus config triggers the coordinator to rebuild Prometheus #61

Open
like-inspur opened this issue Jun 17, 2021 · 4 comments

Comments

@like-inspur
Contributor

I wanted to test whether the coordinator notices a change to the Prometheus config and syncs it to Prometheus. After modifying the config, I found that Prometheus was completely rebuilt (the pods were deleted first and then recreated). The coordinator logs are as follows:
time="2021-06-16T00:37:20Z" level=warning msg="Statefulset prometheus UpdatedReplicas != Replicas, skipped" component="shard manager"
level=info ts=2021-06-16T00:37:27.302Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
W0616 00:37:27.302653 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
level=info ts=2021-06-16T00:37:27.304Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-06-16T00:37:27.306Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
time="2021-06-16T00:37:30Z" level=info msg="need space 57074" component=coordinator
time="2021-06-16T00:37:30Z" level=info msg="change scale to 1" component="shard manager" sts=prometheus
time="2021-06-16T00:37:40Z" level=error msg="get targets status info from prometheus-0 failed, url = http://100.101.245.226:8080: http get: Get "http://100.101.245.226:8080/api/v1/shard/targets/\": dial tcp 100.101.245.226:8080: connect: connection refused" component=coordinator
time="2021-06-16T00:37:40Z" level=error msg="get runtime info from prometheus-0 failed : http get: Get "http://100.101.245.226:8080/api/v1/shard/runtimeinfo/\": dial tcp 100.101.245.226:8080: connect: connection refused" component=coordinator
time="2021-06-16T00:37:40Z" level=info msg="need space 57074" component=coordinator
time="2021-06-16T00:37:40Z" level=warning msg="shard group prometheus-0 is unHealth, skip apply change" component=coordinator

@RayHuangCN
Member

What do you mean by "completely rebuilt"? The current version adds a mechanism that pauses coordination while Prometheus is in the middle of a rolling update. Could the pods have been rebuilt because you modified the StatefulSet?
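For reference, the "Statefulset prometheus UpdatedReplicas != Replicas, skipped" warning in the first log excerpt looks like exactly that guard firing. Conceptually the check has roughly this shape (a minimal sketch using client-go's StatefulSet status fields, not the actual kvass source):

```go
package coordinator // hypothetical package name, for illustration only

import appsv1 "k8s.io/api/apps/v1"

// rolloutInProgress sketches the idea behind the
// "UpdatedReplicas != Replicas, skipped" warning: while a StatefulSet is
// still rolling out, the shard manager skips applying coordination changes
// so that it does not fight the rollout.
func rolloutInProgress(sts *appsv1.StatefulSet) bool {
	return sts.Status.UpdatedReplicas != sts.Status.Replicas ||
		sts.Status.CurrentRevision != sts.Status.UpdateRevision
}
```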

@like-inspur
Contributor Author

No, I only modified the Prometheus config. I tested again and it still reproduces; the coordinator log also shows the shards being allocated a second time:

level=info ts=2021-06-24T11:10:55.967Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-06-24T11:10:55.969Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-06-24T11:10:55.971Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-06-24T11:10:55.973Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
time="2021-06-24T11:11:52Z" level=info msg="need space 65006" component=coordinator
time="2021-06-24T11:11:52Z" level=info msg="change scale to 2" component="shard manager" sts=prometheus
time="2021-06-24T11:12:52Z" level=info msg="need space 50907" component=coordinator
time="2021-06-24T11:12:52Z" level=info msg="prometheus-1 need update targets" component="shard manager" shard=prometheus-1 sts=prometheus
time="2021-06-24T11:12:52Z" level=info msg="change scale to 4" component="shard manager" sts=prometheus
time="2021-06-24T11:13:52Z" level=info msg="prometheus-3 need update targets" component="shard manager" shard=prometheus-3 sts=prometheus
time="2021-06-24T11:13:52Z" level=info msg="prometheus-2 need update targets" component="shard manager" shard=prometheus-2 sts=prometheus
time="2021-06-24T11:13:52Z" level=info msg="prometheus-1 need update targets" component="shard manager" shard=prometheus-1 sts=prometheus
level=info ts=2021-06-24T11:31:40.197Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-06-24T11:31:40.199Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-06-24T11:31:40.201Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
time="2021-06-24T11:31:53Z" level=info msg="need space 58203" component=coordinator
time="2021-06-24T11:31:53Z" level=info msg="change scale to 2" component="shard manager" sts=prometheus
time="2021-06-24T11:32:53Z" level=info msg="need space 58203" component=coordinator
time="2021-06-24T11:32:53Z" level=info msg="change scale to 4" component="shard manager" sts=prometheus
time="2021-06-24T11:33:54Z" level=info msg="need space 25247" component=coordinator
time="2021-06-24T11:33:54Z" level=info msg="prometheus-3 need update targets" component="shard manager" shard=prometheus-3 sts=prometheus
time="2021-06-24T11:33:54Z" level=info msg="change scale to 5" component="shard manager" sts=prometheus
time="2021-06-24T11:34:54Z" level=info msg="prometheus-4 need update targets" component="shard manager" shard=prometheus-4 sts=prometheus

@like-inspur
Contributor Author

I also found that after deleting one of the Prometheus job configs, the total number of series decreased, yet one more Prometheus instance was created than before the deletion (4 before, 5 now). Looking at some of the Prometheus instances, they have no targets at all. The coordinator seems to still have problems allocating targets.
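One way to confirm the uneven allocation is to ask each shard's Prometheus directly how many active targets it has, via the standard /api/v1/targets endpoint (nothing kvass-specific). A rough sketch, assuming the five shards are reachable at placeholder addresses prometheus-0 … prometheus-4 on port 9090 (adjust to the real pod IPs or port-forwarded endpoints in your cluster):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// targetsReply is the minimal part of Prometheus's /api/v1/targets response
// needed here: just the list of active targets.
type targetsReply struct {
	Data struct {
		ActiveTargets []json.RawMessage `json:"activeTargets"`
	} `json:"data"`
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// Placeholder addresses, one per shard.
	shards := []string{
		"http://prometheus-0:9090",
		"http://prometheus-1:9090",
		"http://prometheus-2:9090",
		"http://prometheus-3:9090",
		"http://prometheus-4:9090",
	}
	for _, s := range shards {
		resp, err := client.Get(s + "/api/v1/targets")
		if err != nil {
			fmt.Printf("%s unreachable: %v\n", s, err)
			continue
		}
		var tr targetsReply
		if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil {
			fmt.Printf("%s: could not decode response: %v\n", s, err)
		} else {
			// A shard reporting 0 active targets matches the observation above.
			fmt.Printf("%s: %d active targets\n", s, len(tr.Data.ActiveTargets))
		}
		resp.Body.Close()
	}
}
```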

@Michael754267513

Marking this; I ran into it too. I only modified the Prometheus config, but the Prometheus containers were rebuilt, which caused monitoring data loss.
