Thanos, Prometheus and Golang version used:
Thanos 0.34.0
Prometheus 2.51.0
Object Storage Provider: Linode
What happened:
I have Grafana deployed to my k8s cluster as part of the kube-prometheus-stack Helm chart.
It is connected to my Thanos querier as its main datasource (which is connected to various Thanos sidecars).
One of our performance engineers brought to my attention an issue that shows up in Grafana specifically, with the following query (note this uses custom metrics from our apps):
sum (irate(starlord_http_requests_total{container="starlord-cyber-feed",namespace="app", cluster="qa-1"}[1m])) by (cluster)
The problem is:
In the local Prometheus UI, or in the Thanos Querier UI, this query runs with no problems at all.
But in Grafana (whether in a dashboard panel or in Explore), as soon as we increase the time range beyond 12h, the graph flattens down to 0.
Since the query works just fine in both Prometheus and Thanos Querier, I am led to believe the issue must be with Grafana
(Thanos Querier is its datasource, so why would it return a different response?)
Some example screenshots:
Here is the query set to 1 hour, in both Grafana and Thanos querier, looks all good:
Now, here it is in both, set to 24 hours:
I've tried debugging this and haven't found much. What I did try:
Tried using "rate" instead of "irate": same issue
Tried changing the datasource's "scrape interval" to 30s (from the default 15s): same issue
Tried updating Prometheus + Grafana + Thanos to the latest versions
The only lead I did find is this log line, with HTTP status 400, matching my query:
logger=context userId=3 orgId=1 uname=<my-email> t=2024-04-01T16:07:47.268619112Z level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=10.2.1.129 time_ms=16 duration=16.278802ms size=13513 referer="https://<my-domain>/explore?orgId=1&panes=%7B%22r5m%22%3A%7B%22datasource%22%3A%22P5DCFC7561CCDE821%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22sum+%28rate%28starlord_http_requests_total%7Bcontainer%3D%5C%22starlord-cyber-feed%5C%22%2Cnamespace%3D%5C%22app%5C%22%2C+cluster%3D%5C%22qa-1%5C%22%7D%5B1m%5D%29%29+by+%28cluster%29%22%2C%22range%22%3Atrue%2C%22instant%22%3Atrue%2C%22datasource%22%3A%7B%22type%22%3A%22prometheus%22%2C%22uid%22%3A%22P5DCFC7561CCDE821%22%7D%2C%22editorMode%22%3A%22code%22%2C%22legendFormat%22%3A%22__auto%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%22now-24h%22%2C%22to%22%3A%22now%22%7D%7D%7D&schemaVersion=1" handler=/api/ds/query status_source=downstream
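Decoding the percent-encoded `panes` parameter from that referer URL shows exactly what the Explore pane asked Grafana to run. A small sketch (the encoded string is copied verbatim from the log line above):

```python
import json
from urllib.parse import unquote_plus

# The "panes" query parameter, copied as-is from the referer in the log line.
panes_raw = (
    "%7B%22r5m%22%3A%7B%22datasource%22%3A%22P5DCFC7561CCDE821%22%2C%22queries"
    "%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22sum+%28rate%28"
    "starlord_http_requests_total%7Bcontainer%3D%5C%22starlord-cyber-feed%5C%22"
    "%2Cnamespace%3D%5C%22app%5C%22%2C+cluster%3D%5C%22qa-1%5C%22%7D%5B1m%5D%29"
    "%29+by+%28cluster%29%22%2C%22range%22%3Atrue%2C%22instant%22%3Atrue%2C%22"
    "datasource%22%3A%7B%22type%22%3A%22prometheus%22%2C%22uid%22%3A%22"
    "P5DCFC7561CCDE821%22%7D%2C%22editorMode%22%3A%22code%22%2C%22legendFormat"
    "%22%3A%22__auto%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%22now-24h%22%2C"
    "%22to%22%3A%22now%22%7D%7D%7D"
)

pane = json.loads(unquote_plus(panes_raw))["r5m"]
query = pane["queries"][0]
print(query["expr"])                     # the PromQL expression Grafana sent
print(query["range"], query["instant"])  # prints: True True
print(pane["range"])                     # prints: {'from': 'now-24h', 'to': 'now'}
```

Notably, the decoded payload has both `"range": true` and `"instant": true`, so Grafana Explore fires a range query and an instant query for the same expression; the 400 could be coming from either of them.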
So, possibly Thanos Querier is failing to handle the query from Grafana for some reason? Thanos itself isn't showing anything in its logs about this...
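One way to narrow this down would be to replay the same expression against the Thanos Querier HTTP API directly and diff the result against what Grafana renders. A minimal sketch that just builds the `/api/v1/query_range` request (the querier URL and the 30s step are assumptions for illustration, not values from my setup):

```python
import time
from urllib.parse import urlencode

# Hypothetical in-cluster querier address; substitute your own service URL.
THANOS_URL = "http://thanos-querier:9090"

expr = ('sum (rate(starlord_http_requests_total{container="starlord-cyber-feed",'
        'namespace="app", cluster="qa-1"}[1m])) by (cluster)')

end = int(time.time())
start = end - 24 * 3600  # the failing 24h window
params = urlencode({
    "query": expr,
    "start": start,
    "end": end,
    "step": "30s",  # assumed; Grafana computes its own step from the panel width
})
url = f"{THANOS_URL}/api/v1/query_range?{params}"
print(url)
# e.g.: curl -s "$url" | jq '.data.result'  to compare against Grafana's response
```

If this returns data over 24h while the Grafana panel stays flat at 0, the discrepancy is in how Grafana builds its request (time range, step, or the instant query), not in the querier itself.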
What you expected to happen:
I expected queries run in the Thanos Querier UI and in Grafana (which queries Thanos Querier) to return the same results.
How to reproduce it (as minimally and precisely as possible):
Not sure how to reproduce this without our specific metrics, but the general setup is kube-prometheus-stack + Thanos Querier + Thanos sidecar(s).