
ElasticSearch 7.x too_many_buckets_exception #17327

Closed
bhozar opened this issue May 28, 2019 · 56 comments
Assignees
Labels
datasource/Elasticsearch prio/high Must be staffed and worked on either currently, or very soon, ideally in time for the next release. type/feature-request

Comments

bhozar commented May 28, 2019

What happened:
Upgraded to ES 7.x and Grafana 6.2.x. Some panels relying on the ES datasource were showing "Unknown elastic error response" in the top left corner.

Query inspector displayed this error:

caused_by:Object
type:"too_many_buckets_exception"
reason:"Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting."
max_buckets:10000

What you expected to happen:
Graph to display 3 hours of data from front end proxy logs stored in ElasticSearch 7.x.

How to reproduce it (as minimally and precisely as possible):
Query a lot of data

Environment:

  • Grafana version: 6.2.1
  • Data source type & version: ES 7.0
  • OS Grafana is installed on: Ubuntu 18.04
  • User OS & Browser: Win10/Chrome
marefr (Member) commented May 28, 2019

As the error message from Elasticsearch says, "This limit can be set by changing the [search.max_buckets] cluster level setting." I don't see what Grafana can do to resolve this.

To minimize the number of buckets, either raise the min time interval at the datasource or panel level, or set min doc count on the date histogram to 1.
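As a hedged sketch, the two suggested settings look roughly like this in the query body sent to Elasticsearch (the agg name and field are illustrative, not the exact query Grafana generates):

```python
import json

# Illustrative date histogram body: a raised interval plus min_doc_count=1,
# so empty buckets are dropped from the response. Not Grafana's actual query.
def date_histogram_agg(field="@timestamp", interval="1m", min_doc_count=1):
    return {
        "size": 0,
        "aggs": {
            "2": {
                "date_histogram": {
                    "field": field,
                    "fixed_interval": interval,   # the "min time interval"
                    "min_doc_count": min_doc_count,
                }
            }
        },
    }

body = date_histogram_agg(interval="10m")
print(json.dumps(body["aggs"]["2"]["date_histogram"], sort_keys=True))
```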

@marefr marefr closed this as completed May 28, 2019
rnd-ash commented May 29, 2019

Surely Grafana can do something here.

I've noticed that since Elasticsearch 7.x, the terms aggregation now counts towards the bucket limit, rather than just the date histogram. Kibana prevents this error by automatically widening the date histogram resolution when selecting a larger time interval. I found Kibana does this for the visual builder:

Panel time range -> Date histogram resolution
15 minutes -> 10 seconds
30 minutes -> 15 seconds
1 hour -> 30 seconds
4 hours -> 1 minute
12 hours -> 1 minute
24 hours -> 5 minutes
48 hours -> 10 minutes
7 days -> 1 hour

It appears that although Grafana can automatically widen the date histogram interval, it is still making Elasticsearch return too many buckets.

Maybe there could be a way for us to specify time resolutions based on our date picker's time range?
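The widening table above can be sketched as a simple lookup (illustrative only; these thresholds are the ones observed above, not an official Kibana or Grafana algorithm):

```python
from datetime import timedelta

# Kibana-style interval widening, per the observed table above (illustrative).
WIDENING = [
    (timedelta(minutes=15), "10s"),
    (timedelta(minutes=30), "15s"),
    (timedelta(hours=1), "30s"),
    (timedelta(hours=4), "1m"),
    (timedelta(hours=12), "1m"),
    (timedelta(hours=24), "5m"),
    (timedelta(hours=48), "10m"),
    (timedelta(days=7), "1h"),
]

def pick_interval(time_range: timedelta) -> str:
    """Return the first interval whose panel-range threshold covers the request."""
    for limit, interval in WIDENING:
        if time_range <= limit:
            return interval
    return "1d"  # fall back to daily buckets for very large ranges

print(pick_interval(timedelta(hours=3)))  # within the 4h threshold -> "1m"
```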

bhozar (Author) commented Jun 5, 2019

I'm guessing I'm one of very few either experiencing this issue, or not many are running ES 7 yet.

Changing the min doc count to something much higher has little effect, and changing the minimum time interval works fine if you are only looking at an hour of data, but it fails again as you expand the time range. I also changed the ES setting to 100k, but Grafana is still requesting too fine a time grain.

If there was an option to map not only the minimum time value but the full time range to a histogram resolution, it would probably work.
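A rough back-of-the-envelope estimate shows why expanding the time range fails: buckets grow with the time range divided by the interval, multiplied by the cardinality of any terms sub-aggregation (a hedged sketch, not Elasticsearch's exact accounting):

```python
import math

def estimated_buckets(range_seconds: float, interval_seconds: float,
                      terms_cardinality: int = 1) -> int:
    """Rough bucket estimate: one per histogram interval, multiplied by the
    number of terms in any terms sub-aggregation. Illustrative only."""
    return math.ceil(range_seconds / interval_seconds) * terms_cardinality

# 3 hours of data at a 1-second grain, split by 10 terms:
print(estimated_buckets(3 * 3600, 1, 10))  # 108000, far over the 10000 limit
```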

bh9 commented Jun 5, 2019

Grafana should be using Elasticsearch's scroll API (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html) for this. Increasing search.max_buckets above 10000 has no effect because Elasticsearch hard-caps it at 10000.

Ivan-Strahovsky commented:
I'm surprised how underrated this issue is. I'm facing the same problem; changing the interval in the panel or data source helps. But usually we look at metrics daily and want to see them with small granularity, and we also want to look at metrics weekly/monthly, etc. To achieve this I have to change the min interval in the datasource/panel or keep different dashboards with different intervals set, which is not convenient.

marefr (Member) commented Jun 13, 2019

More and more people seem to be hitting this problem, so I'm reopening the issue.

marefr (Member) commented Jun 13, 2019

I'm not exactly sure, though, that it's as simple as extending the automatic intervals. As far as I understand, this also depends on how many terms aggregations and buckets you get in total, so it's not easy to solve in Grafana.

Some context to why they added the search.max_bucket setting: https://discuss.elastic.co/t/requesting-background-info-on-search-max-buckets-change/130334

To me it sounds like you should still be able to configure search.max_buckets to -1 in ES7, similar to how it behaved by default in ES6, but I haven't had time to confirm this. Please try this out and let me know the result.

Looking at Kibana seems like they still have similar problems in at least some parts: elastic/kibana#36892

One of the commenters suggests:

> Run the aggregation via a composite aggregation in order to be able to paginate through results.

Kibana has this related issue open regarding composite aggregations: elastic/kibana#36358

I have never used composite aggregations and I currently know too little about them and why they would be a better alternative to the regular aggregations. It also seems composite aggregations are only supported from ES 6.1 onwards.
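For reference, a composite aggregation wrapping a date histogram looks roughly like this (a hedged sketch based on the Elasticsearch docs; field names, interval, and page size are illustrative, not what Grafana would generate):

```python
# Illustrative composite aggregation body; paginated via the "after" key
# that Elasticsearch returns with each page.
def composite_histogram(after_key=None, size=1000):
    composite = {
        "size": size,  # buckets per page instead of one huge response
        "sources": [{
            "time": {"date_histogram": {"field": "@timestamp",
                                        "fixed_interval": "30s"}}
        }],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume from the previous page
    return {"size": 0, "aggs": {"histo": {"composite": composite}}}

first_page = composite_histogram()
next_page = composite_histogram(after_key={"time": 1560000000000})
```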

marefr (Member) commented Jun 26, 2019

Just to verify, does changing the max concurrent shard request setting to 5 make this better?

DenKn commented Jul 2, 2019

It seems I have the same issue, "Unknown elastic error response".
I have events in ES from 27.05 until now.
If I set the quick range to Last 90 days (in Grafana) I get the error, but if I set Last 30 days or Last 6 months there is no error.

lstyles commented Jul 3, 2019

@marefr -1 isn't a valid option for the search.max_buckets setting. It returns an error that says it needs a value >= 0.

Setting it to 0 is possible, but then nothing seems to be returned.

cpmoore commented Jul 13, 2019

I'm facing the same issue with a count panel grouped by two terms and a date histogram.
It works fine up to the last 5 hours; when I attempt to view the last 6 hours it gives the error regarding the 10000 buckets.
I attempted to change the search.max_buckets setting on my cluster to 15000, but then the error said
Must be less than or equal to: [15000] but was [15001], still 1 more than my cluster setting.

Setting the max concurrent shard to 5 did not help.

It does appear that setting a higher min time interval allows the graph to work, but it also groups more points together and reduces the precision of the data. I have the default set to 30s; changing it to 60s lets the last 6 hours work.

RedStalker commented:
Hello everyone. We also faced this problem after starting the migration to version 7.1 of ELK.
Increasing the search.max_buckets value doesn't help much; it always results in an error that the limit was exceeded.

Akaoni commented Aug 13, 2019

+1.

torkelo (Member) commented Aug 13, 2019

> Kibana prevents this error by automatically widening the date histogram resolution when selecting a larger time interval. I found Kibana does this for the visual builder:

Grafana does the exact same thing if you set the date histogram interval to auto.

Cylox commented Aug 14, 2019

Having the same problem. Setting the date histogram interval to auto does not help. I cannot create a histogram that aggregates data from the last few days, while before the update it was possible to view basically arbitrary time ranges. Interestingly enough, a table panel with the exact same data source does work.

M0rdecay commented:
Having the same problem too.
Grafana - 6.2.5
ES - 7.3.0

WeilunZ commented Sep 9, 2019

Increasing the min time interval works, but when you increase your time range you must change the min time interval value again.

fjlour commented Sep 9, 2019

Also reporting the same problem as described here. Expanding the time range on a high time resolution (fine-grained) dataset will cause this error. Perhaps Grafana should adapt the group-by time to a wider window as the user expands the time range, in order to return the data more aggregated.

If my data has a minimum resolution of milliseconds, there's no need to bring millions of documents to be displayed in a 3-month chart. Data should be aggregated by ES at the query level.

Theoooooo commented Oct 2, 2019

I'm also seeing this issue in the Explore panel in Grafana 6.4.1.
When choosing a greater time range (more than 1 hour) I get an "Unknown elastic error response" because the query returns too many buckets to aggregate or display. But this occurs only in the "Logs" tab, not the "Metrics" tab.

There is also something to be done there to help display the information without having to modify options in Elasticsearch.

UkrZilla commented Oct 7, 2019

Having the same problem too.
Grafana - 6.4.1
ES - 7.4.0

I also think Grafana should use the scroll API.

cpmoore commented Oct 9, 2019

It may be possible to wrap the date histogram aggregation in a composite aggregation, then paginate through the results and combine them client-side.
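A client-side pagination loop over such a composite aggregation could be sketched like this (illustrative; `search` stands in for whatever function posts the body to Elasticsearch and returns the parsed JSON response, e.g. via the official client):

```python
# Sketch of client-side pagination over a composite aggregation: keep
# re-querying with the returned "after_key" until no further pages remain.
def collect_all_buckets(search, build_body):
    buckets, after_key = [], None
    while True:
        resp = search(build_body(after_key))
        agg = resp["aggregations"]["histo"]
        buckets.extend(agg["buckets"])
        after_key = agg.get("after_key")
        if not after_key:  # last page omits after_key
            break
    return buckets

# Demo against two fake response pages, so no live cluster is needed:
pages = iter([
    {"aggregations": {"histo": {"buckets": [{"key": 1}, {"key": 2}],
                                "after_key": {"time": 1}}}},
    {"aggregations": {"histo": {"buckets": [{"key": 3}]}}},
])
result = collect_all_buckets(lambda body: next(pages), lambda ak: {"after": ak})
print([b["key"] for b in result])  # [1, 2, 3]
```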

flunda commented Oct 10, 2019

Same problem here.
ES - 7.2.0
Grafana - 6.4.2

unglaublicherdude commented Oct 10, 2019

Same problem here:
ES - 7.3.2
Grafana - 6.4.2

CRad14 commented Oct 11, 2019

Same Problem here
ES 7.1
Grafana 6.4.2

ywsong219 commented:
Same issue.
ES 7.3.2
Grafana 6.4.0

marefr (Member) commented Dec 30, 2019

@redNixon that's definitely a bug. Thanks for reporting.

berglh commented Mar 25, 2020

> Surely Grafana can do something here.
>
> I've noticed that since Elasticsearch 7.x, the terms aggregation now counts towards the bucket limit, rather than just the date histogram. Kibana prevents this error by automatically widening the date histogram resolution when selecting a larger time interval. I found Kibana does this for the visual builder:
>
> Panel time range -> Date histogram resolution
> 15 minutes -> 10 seconds
> 30 minutes -> 15 seconds
> 1 hour -> 30 seconds
> 4 hours -> 1 minute
> 12 hours -> 1 minute
> 24 hours -> 5 minutes
> 48 hours -> 10 minutes
> 7 days -> 1 hour
>
> It appears that although Grafana can automatically widen the date histogram interval, it is still making Elasticsearch return too many buckets.
>
> Maybe there could be a way for us to specify time resolutions based on our date picker's time range?

Elasticsearch will return whatever you ask it to. The interval is a client-side parameter. The problem with the implementation in Grafana is that it uses the "Date Histogram" aggregation and then doesn't scale the "interval" parameter. The whole concept of Auto isn't supported by the Date Histogram aggregation method in the Search API at all, and it is misleading when this option is presented in Grafana. I believe this confusion occurs because when people choose Auto in Kibana, it dynamically changes the interval size, but perhaps not how you might think.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-aggregations-bucket-datehistogram-aggregation.html

Looking at the API documentation, the interval is supplied by the client at query time. There is no feature in this part of the search API to auto-scale the time interval.

The problem when looking at large time series is that even though you may have < 10000 buckets, those buckets span many large shards, or you are performing terms sub-aggregations along with the date histogram, which adds more total buckets (sub-queries) to the parent aggregation. For me that results in Java OOM errors in Elasticsearch. If your query generates more than 10000 buckets, you will hit the too-many-buckets exception as in the OP. As people have mentioned, if you manually set the min time interval, you basically increase the stability of the query by reducing the total aggregation buckets. While this might work in some limited situations, it is always a trade-off when zooming in to small time periods (very large time buckets reduce the resolution of the visualisation) or zooming out to larger time frames (OOM/too many buckets).

While a solution could be coded into Grafana to scale the time interval to something sensible per quick time range pick, the obvious solution, in my humble opinion, is to expose the Auto Date Histogram aggregation method from the Elasticsearch Search API in the Group By section in Grafana. This would allow the user to define the maximum number of time buckets a given visualisation should return, similar to the auto time interval in Kibana. You can check out the examples here.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-aggregations-bucket-autodatehistogram-aggregation.html

The user is then in control of selecting the maximum time buckets per query which allows the user to control how heavy/detailed each query is and then have Elasticsearch scale the buckets over larger time frames. I think this would be a killer feature for the Elasticsearch data source in Grafana and provide a similar experience to the default Date Aggregations in Kibana :)
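For reference, the auto_date_histogram request described above looks roughly like this (a hedged sketch; the field name and bucket cap are illustrative):

```python
# Illustrative auto_date_histogram body: the client caps the bucket count and
# Elasticsearch picks the interval, reporting the one it used in the response.
def auto_histogram(field="@timestamp", max_buckets=200):
    return {
        "size": 0,
        "aggs": {
            "histo": {
                "auto_date_histogram": {
                    "field": field,
                    "buckets": max_buckets,  # ES widens the interval to fit
                }
            }
        },
    }

body = auto_histogram(max_buckets=500)
```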

esseti commented Apr 17, 2020

Jumping into the discussion: is there an easy way to change the interval value in a dashboard? The auto method does not work for me, so I would be happy to have a single place to change it.

Augustin-FL commented Apr 18, 2020

Hi all,

For those who want to get rid of search.max_buckets:

  • Setting the value to -1 doesn't work (you get the error `Failed to parse value [-1] for setting [search.max_buckets] must be >= 0` when you try)
  • However, you can set it to the maximum accepted value (2^31-1):

PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": "2147483647"
  }
}

That effectively disables the setting.

For information, this setting is currently being deprecated (see elastic/elasticsearch#51731)

JonasDeGendt commented:
> For those who want to get rid of search.max_buckets:
>
> • Setting the value to -1 doesn't work (you get the error `Failed to parse value [-1] for setting [search.max_buckets] must be >= 0` when you try)
> • However, you can set it to the maximum accepted value (2^31-1):
>
> PUT _cluster/settings
> {
>   "persistent": {
>     "search.max_buckets": "2147483647"
>   }
> }
>
> That effectively disables the setting.

This works flawlessly, thanks a lot!

UkrZilla commented:
Hi guys,

I have good news for you:
elastic/elasticsearch#46751

According to elastic/elasticsearch#55266:

> We introduced a new search.check_buckets_step_size setting to better control how the coordinating node allocates memory when aggregating buckets. The allocation of buckets is now done in steps, each step allocating a number of buckets equal to this setting. To avoid an OutOfMemory error, a parent circuit breaker check is performed on allocation.

s1sfa commented Apr 21, 2020

I think it would be ideal if Grafana handled the time interval dynamically based on the time range, like Kibana. If you want per-second values over multiple days, it doesn't make computational sense to request every single second of multiple days from Elasticsearch.

frittentheke commented:
> I think it would be ideal if grafana handled time interval dynamically based on time range like kibana. If you want per second values over multiple days. It doesn't make computational sense to request every 1 second of multiple days from elasticsearch.

Exactly that (as I also suggested in my comment above, if I may say so :-) ).

berglh commented Apr 21, 2020

@frittentheke @s1sfa I think Grafana shouldn't be responsible for managing the scaling when this feature is already available in the Elasticsearch Search API. We just need to add the auto-date histogram aggregation alongside the regular date histogram aggregation in the Elasticsearch data source in Grafana; then Elasticsearch will scale the buckets according to the requested time range.

frittentheke commented Apr 22, 2020

@berglh while the new functionality might be helpful, very helpful even, it's not as simple as just "use the right query or function". The auto-interval date histogram aggregation will comfortably create buckets at an interval sensible for drawing the graph. But even when using it there could be cases (e.g. querying for counts of individual terms) in which Grafana still needs to deal with the selected time interval not being queryable without causing too many buckets to be created. But certainly it's best to use as much of the storage backend's functionality as possible to optimize the querying server-side. Sorry for not properly diving into the discussion with my last post.

s1sfa commented Apr 22, 2020

@berglh I like the idea, but I think it would need a bit of testing in Grafana's Elasticsearch query building. I tried to simply swap out date_histogram for auto_date_histogram and it appears to not work with other aggregations, like sum ({'reason': 'The first aggregation in buckets_path must be a multi-bucket aggregation'}). Secondly, Elastic's way or not, the ability to scale to a specified interval is pretty important. If I want a per-second rate or something, auto_date_histogram doesn't have any parameter for that, but it does return the interval it used, which would be pretty similar to Grafana just changing the interval on a regular query and then doing the division to get the intended values.

The max_buckets limit being at a low threshold is mostly an Elasticsearch problem, which it looks like they are improving in new versions. But if we think about trying to get one month's worth of per-second data on a graph, some sort of auto-scaling needs to exist, whether Grafana makes the decision based on some source parameters, or auto_date_histogram is figured out and Grafana gets the ability to do a calculation on the returned interval to produce values in the desired unit, like per second.
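The division mentioned above could be sketched like this: auto_date_histogram reports the interval it actually used, so counts can be normalized to a per-second rate client-side (an illustrative helper, not existing Grafana code):

```python
# Normalize a bucket count to a per-second rate, given the interval string
# the aggregation reports back (e.g. "30s", "5m", "1h"). Illustrative only.
INTERVAL_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def per_second_rate(count: float, used_interval: str) -> float:
    value, unit = int(used_interval[:-1]), used_interval[-1]
    return count / (value * INTERVAL_SECONDS[unit])

print(per_second_rate(1800, "5m"))  # 1800 events in 5 minutes -> 6.0 per second
```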

berglh commented Apr 22, 2020

@s1sfa Thanks for trying it out :) Just to clarify my position if I wasn't clear, I'm not suggesting we swap it out directly. I think both types of aggregations are useful depending on the case of the visualisation. I just think providing it as an option as a query type for Elasticsearch data source would be useful for people wanting a more Kibana like experience when creating a dashboard in Grafana.

The max_bucket being at a low threshold is mostly an elasticsearch problem which it looks like they are improving in new versions.

I read that the recent improvement in this area is about handling the circuit breaking of long-running queries more reliably to prevent out-of-memory errors. The performance of Elasticsearch has always been improving, increasing stability under larger queries over time, so you are probably right.

Still, I doubt there will never be a condition where a query hits a circuit breaker and returns a different error, like "unable to service the query due to exceeding circuit breaker", with the cluster effectively determining that too many buckets are the cause of the issue. These types of problems will probably occur less with solid-state storage; spinning-disk clusters with datasets many times larger than the combined JVM heap of the cluster, or histograms split by a big number of terms sub-aggregations, will always run into issues with buckets one way or another.

@frittentheke Giving the user the ability to set a specific integer for the "buckets" parameter of the auto-date histogram query method would let the user tune the graph to the performance characteristics of the dataset and hardware performance. There is nothing stopping a user requesting the last 10 years of data and the query still timing out or hitting some other Elasticsearch performance issue - I figure there is only so much hand holding Grafana can do. I still think there is benefit at least we can give the user an option for an auto-interval scaling solution.

It's up to the Grafana community and data source maintainers to decide if an auto-interval scaling solution should be handled by Grafana, and if there are any trade-offs with metrics-style aggregations as you pointed out @s1sfa. I don't have enough experience with this query type to say whether it's even worth implementing; I just read the manual and was voicing an opinion based on that limited information. It reads like an easy win to give the user control over auto-scaling the interval to a sensible bucket limit on a case-by-case basis 😳

@aocenas aocenas added needs investigation for unconfirmed bugs. use type/bug for confirmed bugs, even if they "need" more investigating prio/high Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Aug 26, 2020
narkisr pushed a commit to re-ops/re-dock that referenced this issue Sep 20, 2020
berglh commented Sep 25, 2020

I believe this issue is actually closed by this commit: #21937. You can now set the maximum data points per visualisation, which then automatically calculates the time interval of the aggregation buckets. Between setting your maximum sub-aggregation size limits and the max data points, you get a nicely scaling solution with the aggregation filter. 🎉 I am running Grafana latest from Docker Hub, v7.2.0 (efe4941).


@Elfo404 Elfo404 self-assigned this Sep 28, 2020
Elfo404 (Member) commented Oct 5, 2020

@berglh Thanks for bringing this up, and you are right, this should be fixed since the 6.6.2 release with #21937.
I'm closing this issue; if someone is still facing this problem we can reopen it 🙂

@Elfo404 Elfo404 closed this as completed Oct 5, 2020
Observability (deprecated, use Observability Squad) automation moved this from Backlog features to Done Oct 5, 2020
@zoltanbedi zoltanbedi removed the needs investigation for unconfirmed bugs. use type/bug for confirmed bugs, even if they "need" more investigating label Nov 10, 2020
eertul commented Nov 24, 2023

Hello, we have Elasticsearch 7.15 and Grafana 9.4.7, and we still face this problem.

Elfo404 (Member) commented Nov 24, 2023

9.4.x is way past EOL and not supported anymore. Does this still happen with a more recent (supported) version?
