ElasticSearch 7.x too_many_buckets_exception #17327
As the error message from Elasticsearch says, "This limit can be set by changing the [search.max_buckets] cluster level setting." I don't see how Grafana can do anything to resolve this. To minimize these errors, either change the min time interval at the datasource or panel level, or set min doc count on the date histogram to 1.
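For reference, a sketch of the second suggestion above. This is a plausible request body, not Grafana's actual generated query; the field name `@timestamp` and the interval are illustrative. `min_doc_count: 1` tells Elasticsearch to drop empty histogram buckets instead of materializing them:

```json
{
  "size": 0,
  "aggs": {
    "per_minute": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1m",
        "min_doc_count": 1
      }
    }
  }
}
```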
Surely Grafana can do something here. I've noticed that since Elasticsearch 7.x, Elasticsearch counts the terms aggregation towards bucket size, rather than just the date histogram. Kibana prevents this error by automatically widening the date histogram resolution when selecting a larger time interval. I found Kibana does this for the visual builder: Panel time range -> Date histogram resolution. Although Grafana can automatically widen the date histogram time range, it is still making Elasticsearch return too many buckets. Maybe there could be a way for us to specify time resolutions based on our date picker's time range?
I'm guessing I'm one of very few experiencing this issue, or not many are running ES 7 yet. Changing the min doc count to something much higher has little effect, and changing the minimum time interval works fine if you are only looking at an hour of data, but fails as you expand the time range. I also changed the ES setting to 100k, but Grafana is still requesting too fine a time grain. If there were an option to map the full time range to the histogram resolution, not only the minimum time value, it would probably work.
Grafana should be using Elasticsearch's scroll API (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html) for this. Increasing search.max_buckets above 10000 has no effect because Elasticsearch hard-caps it at 10000.
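For anyone who wants to try raising the limit named in the error anyway, this is the general shape of the call (Kibana Dev Tools style). Whether values above 10000 actually take effect on a given 7.x release is exactly what is disputed in this thread, so treat this as a sketch rather than a guaranteed fix:

```
PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}
```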
I'm surprised how underrated this issue is. I faced the same problem; changing the interval in the panel or data source helps. But usually we look at metrics daily and want to see them with small granularity, and we also want to look at metrics weekly/monthly, etc. To achieve this I have to change the min interval in the datasource/panel or keep different dashboards with different intervals set, which is not convenient.
More and more people seem to be hitting this problem, so reopening the issue.
Not exactly sure though that it's as simple as extending the automatic intervals to solve this problem. As far as I understand, this also depends on how many terms aggregations and buckets you get in total, so it's not easy to solve in Grafana. Some context to why they added the To me it sounds like you still should be able to configure Looking at Kibana, it seems like they still have similar problems in at least some parts: elastic/kibana#36892 One of the commenters suggests
Kibana has this related issue open regarding composite aggregations: elastic/kibana#36358 I have never used composite aggregations and currently know too little about them and why they would be a better alternative than the regular aggregations. It also seems composite aggregations are only supported from ES 6.1 and forward.
Just to verify, does changing the max concurrent shard request setting to 5 make this better?
It seems I have the same issue: "Unknown elastic error response".
@marefr Setting it to 0 is possible, but then nothing seems to be getting returned.
I'm facing the same issue with a count panel grouped by two terms. Setting the max concurrent shard requests to 5 did not help. It does appear that setting a higher
Hello everyone. Also faced this problem after we started the migration to the 7.1 version of ELK.
+1.
Grafana does the exact same thing if you set the date histogram interval to auto.
Having the same problem. Setting the date histogram interval to auto does not help. I cannot create a histogram that aggregates data from the last few days, while before the update it was possible to view basically arbitrary time ranges. Interestingly enough, a table panel with exactly the same data source does work.
Having the same problem too.
Increasing the
Also reporting the same problem as described here. Expanding the time range on a high time resolution (fine-grained) dataset will cause this error. Perhaps Grafana should adapt the group-by time to a wider window as the user expands the time range, in order to return the data more aggregated. If my data has a minimum resolution of milliseconds, there's no need to bring millions of documents to display a 3-month chart. Data should be aggregated by ES at the query level.
I'm also seeing this issue in the explore panel with version 6.4.1 of Grafana. There is also something to do there to help display the information without having to modify options in Elasticsearch.
Having the same problem too. I also think Grafana should use the scroll API.
It may be possible to wrap the date histogram aggregation in a composite aggregation, then paginate between the results and combine them client-side.
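A sketch of what that could look like. The composite aggregation returns pages of at most `size` buckets and an `after_key` in each response, which the client feeds back as `after` to fetch the next page; field name and interval here are illustrative, not anything Grafana currently emits:

```json
{
  "size": 0,
  "aggs": {
    "histo": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "date": {
              "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "1m"
              }
            }
          }
        ]
      }
    }
  }
}
```

Subsequent requests would repeat this body with an added `"after": { "date": <last after_key> }` inside the `composite` block.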
Same problem here.
Same problem here:
Same problem here.
Same issue.
@redNixon that's definitely a bug. Thanks for reporting.
Elasticsearch will return whatever you ask it to. The interval is a client-side parameter; the problem with the implementation in Grafana is that it uses the "Date Histogram" aggregation and then doesn't scale the "interval" parameter. The whole concept of Auto isn't supported by the Date Histogram aggregation method in the Search API at all, and presenting this option in Grafana is misleading. I believe this confusion occurs because when people choose Auto in Kibana, it dynamically changes the interval size, but perhaps not how you might think. Looking at the API documentation, the interval is supplied by the client at query time. There is no feature in this part of the search API to "Auto" the time interval.

The problem when looking at large time series is that even though you may have < 10000 buckets, those buckets span many large shards, or you are performing Term sub-aggregations along with the Date Histogram, which adds more total buckets (sub-queries) to the parent aggregation. For me, that results in Java OOM errors in Elasticsearch. If your query generates more than 10000 buckets, you will hit the too many buckets exception as in the OP.

As people have mentioned, if you manually set the Min Time Interval, you basically increase the stability of the query by reducing the total aggregation buckets. While this might work in some limited situations, it is always a trade-off between zooming in to small time periods (very large time buckets reduce the resolution of the visualisation) and zooming out to larger time frames (OOM/Too Many Buckets).

While a solution could be coded into Grafana to scale the time interval to something sensible per quick time interval pick, the obvious solution in my humble opinion is to expose the Auto Date Histogram aggregation method of the Elasticsearch Search API in the Group By section in Grafana.
This will allow the user to define the maximum number of time buckets a given visualisation should return, similar to the auto time interval in Kibana. You can check out the examples here. The user is then in control of selecting the maximum time buckets per query, which lets them control how heavy/detailed each query is, and Elasticsearch scales the buckets over larger time frames. I think this would be a killer feature for the Elasticsearch data source in Grafana and would provide a similar experience to the default date aggregations in Kibana :)
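The auto_date_histogram aggregation being proposed looks roughly like this; `buckets` is the target maximum number of buckets, and Elasticsearch picks the interval itself and reports the one it used in the response. The field name and bucket count are illustrative:

```json
{
  "size": 0,
  "aggs": {
    "over_time": {
      "auto_date_histogram": {
        "field": "@timestamp",
        "buckets": 500
      }
    }
  }
}
```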
jumping on the discussion, is there a way to change the
Hi all, For those who want to get rid of
That effectively disables the setting. For information, this setting is currently being deprecated (see elastic/elasticsearch#51731).
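The comment above is truncated, but the usual way to clear a persistent cluster setting is to set it to `null` via the cluster settings API, which reverts it to its default. Whether that "disables" the bucket limit depends on the Elasticsearch version (the default itself changed during the deprecation), so this is a sketch, not a guaranteed workaround:

```
PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": null
  }
}
```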
This works flawlessly, thanks a lot!
Hi guys, I have good news for you: according to elastic/elasticsearch#55266
I think it would be ideal if Grafana handled the time interval dynamically based on the time range, like Kibana does. If you want per-second values over multiple days, it doesn't make computational sense to request every single second of multiple days from Elasticsearch.
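The scaling being asked for here can be sketched in a few lines: pick the smallest "nice" interval that keeps the bucket count under a cap. This is an illustrative sketch of the general approach, not Grafana's or Kibana's actual implementation (their interval ladders and caps differ):

```python
# Illustrative auto-interval selection: all names and the interval
# ladder below are assumptions, not Grafana's real code.

NICE_INTERVALS_MS = [
    1_000,        # 1s
    10_000,       # 10s
    60_000,       # 1m
    300_000,      # 5m
    3_600_000,    # 1h
    21_600_000,   # 6h
    86_400_000,   # 1d
]

def auto_interval(range_ms: int, max_buckets: int = 1000) -> int:
    """Return the smallest predefined interval that yields <= max_buckets buckets."""
    for interval in NICE_INTERVALS_MS:
        if range_ms / interval <= max_buckets:
            return interval
    return NICE_INTERVALS_MS[-1]  # fall back to the coarsest interval

# Three days of data capped at 1000 buckets -> 5-minute buckets
print(auto_interval(3 * 86_400_000))  # 300000
```

With a cap like this in the query builder, widening the dashboard time picker would widen the histogram interval instead of blowing past `search.max_buckets`.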
Exactly that (as I also suggested in my comment above if I may say so :-) ). |
@frittentheke @s1sfa I think Grafana shouldn't be responsible for managing the scaling when this feature is already available in the Elasticsearch Search API. Just add the auto date histogram aggregation as an alternative to the regular date histogram aggregation in the Elasticsearch data source in Grafana, and Elasticsearch will scale the buckets according to the requested time range.
@berglh while the new functionality might be helpful, very helpful even, it's not as simple as just "use the right query or function". The
@berglh - I like the idea, but I think it would need a bit of testing of Grafana's Elasticsearch query building. I tried to simply swap out date_histogram for auto_date_histogram and it appears not to work with other aggregations, like sum: {'reason': 'The first aggregation in buckets_path must be a multi-bucket aggregation. Secondly, Elastic's way or not, the ability to scale to a specified interval is pretty important. If I want a per-second rate or something, auto_date_histogram has no parameter to express that, but it does return the interval it used, which would be pretty similar to Grafana changing the interval on a regular query and then doing the division to get the intended values. max_buckets being at a low threshold is mostly an Elasticsearch problem, which it looks like they are improving in new versions. But if we think about trying to get one month's worth of per-second data on a graph, some sort of auto scaling needs to exist, whether Grafana makes the decision based on some source parameters, or Elasticsearch's auto date histogram is used and Grafana does a calculation on the returned interval to present values in the desired unit, like per second.
@s1sfa Thanks for trying it out :) Just to clarify my position if I wasn't clear, I'm not suggesting we swap it out directly. I think both types of aggregations are useful depending on the case of the visualisation. I just think providing it as an option as a query type for Elasticsearch data source would be useful for people wanting a more Kibana like experience when creating a dashboard in Grafana.
I read about the recent improvements in this area: they are looking to handle the circuit breaking of long-running queries more reliably to prevent out-of-memory errors. The performance of Elasticsearch has always been improving, increasing stability under larger queries over time - so you are probably right. Still, I doubt there is never a condition where a query hits a circuit breaker and returns a different error like "unable to service the query due to exceeding circuit breaker", with the cluster effectively determining that too many buckets are the cause of the issue. These types of problems will probably occur less with solid-state storage; spinning-disk clusters with datasets many times larger than the combined JVM heap of the cluster, or histograms split by a big number of term sub-aggregations, will always run into issues with buckets one way or another. @frittentheke Giving the user the ability to set a specific integer for the "buckets" parameter of the auto date histogram query method would let the user tune the graph to the performance characteristics of the dataset and hardware. There is nothing stopping a user requesting the last 10 years of data and the query still timing out or hitting some other Elasticsearch performance issue - I figure there is only so much hand-holding Grafana can do. I still think there is benefit in at least giving the user an option for an auto-interval scaling solution.
It's up to the Grafana community and data source maintainers to decide if an auto-interval scaling solution should be handled by Grafana, and whether there are any trade-offs with metrics-style aggregations as you pointed out @s1sfa - I don't have enough experience with this query to say whether it's even worth implementing. I just read the manual and was voicing an opinion based on that limited information - it reads like an easy win to give the user control over auto-interval scaling to a sensible bucket limit on a case-by-case basis 😳
I believe this issue is actually closed by this commit: #21937. You can now set the maximum data points per visualisation, which then automatically calculates the time interval of the aggregation buckets. Between setting your maximum sub-aggregation size limits and the max data points, you get a nicely scaling solution with the aggregation filter. 🎉 I am running Grafana latest from Docker Hub, v7.2.0 (efe4941).
Hello, We have Elasticsearch 7.15 and Grafana 9.4.7. We still face this problem. |
9.4.x is way past EOL and not supported anymore. Does this still happen with a more recent (supported) version?
What happened:
Upgraded to ES 7.x and Grafana 6.2.x. Some panels relying on ES datasource was showing "Unknown elastic error response" in top left corner.
Query inspector displayed this error:
```
caused_by: Object
  type: "too_many_buckets_exception"
  reason: "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting."
  max_buckets: 10000
```
What you expected to happen:
Graph to display 3 hours of data from front end proxy logs stored in ElasticSearch 7.x.
How to reproduce it (as minimally and precisely as possible):
Query a lot of data
Environment: