Coordinator does not create a new prometheus to take over targets after a prometheus instance goes down #62

Open
like-inspur opened this issue Jun 25, 2021 · 6 comments

Comments

@like-inspur
Contributor

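In testing I found that when an existing prometheus goes down, the coordinator detects it but does not create a new prometheus and move the targets to it. The coordinator log is as follows: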

time="2021-06-25T00:20:40Z" level=warning msg="Statefulset prometheus UpdatedReplicas != Replicas, skipped" component="shard manager"
time="2021-06-25T00:21:40Z" level=warning msg="Statefulset prometheus is not ready, try wait 2m" component="shard manager"
time="2021-06-25T00:21:40Z" level=warning msg="Statefulset prometheus is not ready, still waiting" component="shard manager"
time="2021-06-25T00:22:40Z" level=warning msg="Statefulset prometheus is not ready, still waiting" component="shard manager"
time="2021-06-25T00:23:40Z" level=info msg="prometheus-1 is not ready" component=coordinator
time="2021-06-25T00:23:40Z" level=info msg="prometheus-0 is not ready" component=coordinator
time="2021-06-25T00:23:40Z" level=info msg="need space 58203" component=coordinator
time="2021-06-25T00:23:40Z" level=warning msg="shard group prometheus-0 is unHealth, skip apply change" component=coordinator
time="2021-06-25T00:23:40Z" level=warning msg="shard group prometheus-1 is unHealth, skip apply change" component=coordinator
@RayHuangCN
Member

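This is expected. Under the current strategy, targets are not transferred when a shard goes down; the Coordinator waits for the shard to recover. Otherwise the following could happen:

  1. A shard goes down (OOM), and the Coordinator moves that shard's targets to another shard or creates a new shard.
  2. The crashed shard restarts and runs for a while. The Coordinator now finds two shards scraping the same target, so it deletes the target from the other shard.
  3. The shard goes down again (OOM), and the Coordinator again moves its targets to another shard or creates a new shard. This can lead to unbounded scale-out.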
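
The loop in the three steps above can be illustrated with a toy simulation. This is only a sketch under assumptions, not Kvass code: `simulate_flapping`, the shard names, and the policy of always creating a fresh shard on each crash are hypothetical simplifications.

```python
from __future__ import annotations


def simulate_flapping(cycles: int) -> list[int]:
    """Return the number of shards that exist after each OOM/restart cycle.

    Hypothetical model: one heavy target makes shard-0 OOM repeatedly, and
    the coordinator reacts to every crash by creating a new shard.
    """
    shards = {"shard-0": {"heavy-target"}}
    history = []
    for _ in range(cycles):
        # Step 1: shard-0 OOMs; the coordinator creates a new shard
        # and assigns the target to it.
        new_name = f"shard-{len(shards)}"
        shards[new_name] = {"heavy-target"}
        # Step 2: shard-0 restarts and scrapes the target again; the
        # duplicate is removed from the new shard, which stays running.
        shards[new_name].discard("heavy-target")
        history.append(len(shards))
        # Step 3 is the next iteration: shard-0 OOMs again.
    return history


print(simulate_flapping(3))  # [2, 3, 4]
```

In this model the shard count grows by one on every crash/restart cycle, which is the unbounded scale-out that waiting for recovery avoids.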

@like-inspur
Contributor Author

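We could record the correct shard for each target. Then, when two shards are scraping the same target, the target is deleted from the redundant shard, which guarantees that it stays on the correct shard.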
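
A minimal sketch of that idea, assuming the coordinator keeps a map from each target to its recorded owner shard. The function name `find_redundant` and both data structures are hypothetical and are not taken from the Kvass code base.

```python
from __future__ import annotations


def find_redundant(
    owner: dict[str, str],
    scraping: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return, per shard, the targets that should be removed from it.

    owner:    target -> shard recorded as the correct one (assumed bookkeeping)
    scraping: shard  -> targets the shard is currently scraping
    """
    to_remove: dict[str, set[str]] = {}
    for shard, targets in scraping.items():
        # A target is redundant on this shard if another shard is its
        # recorded owner. Targets without a recorded owner are left alone.
        extra = {t for t in targets if owner.get(t, shard) != shard}
        if extra:
            to_remove[shard] = extra
    return to_remove


# prometheus-0 came back after its target t1 had been moved to prometheus-2.
owner = {"t1": "prometheus-2", "t2": "prometheus-0"}
scraping = {"prometheus-0": {"t1", "t2"}, "prometheus-2": {"t1"}}
print(find_redundant(owner, scraping))  # {'prometheus-0': {'t1'}}
```

Because the recorded owner decides which copy survives, a shard that restarts after its targets were moved loses the duplicate instead of pulling the target back.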

@RayHuangCN
Member

Transferring targets is now supported.

@RayHuangCN
Member

If the shard recovers, one of the duplicate scrapes is removed automatically.

@like-inspur
Contributor Author

If the shard recovers, one of the duplicate scrapes is removed automatically.

Will a new version be released soon?
