Add elasticsearch alerting #11380
upgrade from grafana/grafana
Signed-off-by: wph95 <wph657856467@gmail.com>
Signed-off-by: wph95 <wph657856467@gmail.com>
- add some test
Codecov Report
@@            Coverage Diff             @@
##           master   #11380      +/-   ##
==========================================
- Coverage    51.9%   51.72%   -0.18%
==========================================
  Files         359      365       +6
  Lines       26066    26509     +443
  Branches     1509     1556      +47
==========================================
+ Hits        13530    13713     +183
- Misses      11796    12030     +234
- Partials      740      766      +26
I have downloaded this, but it seems that it doesn't work with template-based queries. Do you plan to add this? Kind regards
Hi, did you run some additional tests? Is it working fine? Thanks for your support. Cheers.
Do you have any news about the tests you ran? Cheers.
Hi,
Sorry for the delay. I have been trying it for a while, but I still have to
check one thing that seems strange to me: the values on the alert differ
from the graph even when both of them are configured the same way.
Kind regards
2018-04-16 19:13 GMT+02:00 Knaky41 <notifications@github.com>:
… Hi,
Did you make some additional tests ? Is it working fine ?
Thanks for your support,
Cheers.
…feature/add_es_alerting
@pbuentam I observed similar behavior in alerting on the Influx data source but didn't have time to dig deeper, so it may be a bug at a totally different level. I'm not saying that is the case for you, but it may be a hint.
@wph95 just a heads up. We've started reviewing this and have begun refactoring work. If you didn't change the default setting for "Allow edits from maintainers" when you created the pull request, we'll push our changes to your fork. If you did, we'll need to branch off and create a separate PR. Our hope is to be able to merge this to master soon.
@marefr yep, I didn't change the "Allow edits from maintainers" setting.
@pbuentam could you give more information about the problem you encountered?
@marefr I'll have a go with this branch if it helps, as we need this feature and its absence is blocking us at the moment. How close are we to getting this merged to master and into a release version? A rough guesstimate would do.
@marefr, I was able to check out the pull request, compile and run.
Elasticsearch version
Grafana Panel Alert Config Notes
Thanks for trying it out @szaroubi. Are you sure you're using the latest commit in this branch? I've pushed some changes over the last couple of days. Can you please include the full JSON of your panel so I can try it out?
@marefr, as for the JSON of the panel, I currently don't have access to the environment in which Grafana is installed and can't provide it.
Panel
{
"alert": {
"conditions": [
{
"evaluator": {
"params": [
30
],
"type": "gt"
},
"operator": {
"type": "and"
},
"query": {
"params": [
"B",
"1m",
"now"
]
},
"reducer": {
"params": [],
"type": "max"
},
"type": "query"
}
],
"executionErrorState": "alerting",
"frequency": "10s",
"handler": 1,
"name": "Log levels alert",
"noDataState": "no_data",
"notifications": [
{
"id": 1
}
]
},
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "ES Radio-IP Prod",
"fill": 1,
"gridPos": {
"h": 5,
"w": 24,
"x": 0,
"y": 15
},
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": true,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"bucketAggs": [
{
"fake": true,
"field": "syslog_level.keyword",
"id": "3",
"settings": {
"min_doc_count": 1,
"order": "desc",
"orderBy": "_term",
"size": "10"
},
"type": "terms"
},
{
"field": "@timestamp",
"id": "2",
"settings": {
"interval": "20s",
"min_doc_count": 0,
"trimEdges": 0
},
"type": "date_histogram"
}
],
"metrics": [
{
"field": "select field",
"id": "1",
"type": "count"
}
],
"query": "syslog_level.keyword:$LogLevel AND syslog_program.keyword:$Program",
"refId": "A",
"timeField": "@timestamp"
},
{
"bucketAggs": [
{
"field": "@timestamp",
"id": "2",
"settings": {
"interval": "20s",
"min_doc_count": 0,
"trimEdges": 0
},
"type": "date_histogram"
}
],
"metrics": [
{
"field": "select field",
"id": "1",
"type": "count"
}
],
"query": "syslog_level.keyword:$LogLevel AND syslog_program.keyword:$Program",
"refId": "B",
"timeField": "@timestamp"
}
],
"thresholds": [
{
"value": 30,
"op": "gt",
"fill": true,
"line": true,
"colorMode": "critical"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Log levels",
"tooltip": {
"shared": true,
"sort": 1,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
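For readers following the alert block in the panel JSON above, here is a minimal sketch of how a single condition of that shape could be evaluated: reduce the series returned for query B over the last minute with max, then compare the result against the threshold of 30 with gt. The function name `condition_fires`, the lookup tables, and the sample values are hypothetical and only illustrate the idea; this is not Grafana's actual implementation.

```python
from statistics import mean

# Copied from the "conditions" entry of the panel JSON above.
CONDITION = {
    "evaluator": {"params": [30], "type": "gt"},
    "operator": {"type": "and"},
    "query": {"params": ["B", "1m", "now"]},
    "reducer": {"params": [], "type": "max"},
    "type": "query",
}

# Illustrative subsets only (assumption): not a complete list of what Grafana supports.
REDUCERS = {"max": max, "min": min, "avg": mean, "sum": sum}
EVALUATORS = {
    "gt": lambda value, params: value > params[0],
    "lt": lambda value, params: value < params[0],
}


def condition_fires(condition, series_values):
    """Return True/False for the threshold check, or None when there is no data."""
    # Null points are dropped before reducing.
    values = [v for v in series_values if v is not None]
    if not values:
        return None
    reduced = REDUCERS[condition["reducer"]["type"]](values)
    evaluator = condition["evaluator"]
    return EVALUATORS[evaluator["type"]](reduced, evaluator["params"])


# Hypothetical per-bucket counts for query B over the last minute.
print(condition_fires(CONDITION, [12, 31, 7]))  # prints True, since max is 31 > 30
```

The None return value stands in for the "no data" case, which the panel above maps to the no_data state through noDataState.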
@marefr, |
@szaroubi yes, that's actually a real bug you found. There should be a clear error message there, which is currently not implemented, so thank you for finding this. Will fix that asap.
@marefr, |
If a datasource implements the targetContainsTemplate function, it can evaluate whether a given query contains template variables. This is used to show an error message that template variables are not supported in alert queries.
Handle all replacements of interval template variables in the client. Fix issue with client and different versions. Add better tests of the client.
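As a rough illustration of the targetContainsTemplate check described in the commit message above, the sketch below flags a query string that still contains dashboard template variables. It is a Python stand-in, not the code from this PR: the function name `target_contains_template` and the regular expression are assumptions, covering the `$name`, `${name}` and `[[name]]` variable forms.

```python
import re

# Assumed pattern for template variables: $name, ${name} or [[name]],
# optionally with a ":format" suffix inside the braces or brackets.
TEMPLATE_VARIABLE = re.compile(
    r"\$\w+|\$\{\w+(?::\w+)?\}|\[\[\w+(?::\w+)?\]\]"
)


def target_contains_template(query: str) -> bool:
    """Return True if the query string appears to reference a template variable."""
    return TEMPLATE_VARIABLE.search(query) is not None


# Query taken from the panel JSON earlier in this thread.
panel_query = "syslog_level.keyword:$LogLevel AND syslog_program.keyword:$Program"
# Query taken from the query inspector output later in this thread.
plain_query = "app:psportal AND env:produccion AND web:autoservicio"

print(target_contains_template(panel_query))
print(target_contains_template(plain_query))
```

A check like this lets the alert editor reject the first query with a clear message instead of evaluating a literal `$LogLevel`. Note that the simple pattern would also flag a literal dollar sign followed by word characters, so a real implementation would likely consult the dashboard's defined variables.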
@pbuentam please include some details about your metric query and the raw response data (use the query inspector for this). You've selected the unit seconds. Are you sure that the raw data coming back from the metric query is in seconds? If it comes back in milliseconds, for example, I think you'll need to use 10000 in the alert config to represent 10 seconds. You've set Evaluate every to 5m; could you please try changing this to 60s (default), just to make sure that's not what is causing your problems?
I have set the units to none and Evaluate to 60s.
Result
{
"xhrStatus": "complete",
"request": {
"method": "POST",
"url": "api/datasources/proxy/10/_msearch",
"data": "{\"search_type\":\"query_then_fetch\",\"ignore_unavailable\":true,\"index\":[\"logstash-wlaccess-2018.06.01\"]}\n{\"size\":0,\"query\":{\"bool\":{\"filter\":[{\"range\":{\"@timestamp\":{\"gte\":\"1527812908675\",\"lte\":\"1527816589767\",\"format\":\"epoch_millis\"}}},{\"query_string\":{\"analyze_wildcard\":true,\"query\":\"app:psportal AND env:produccion AND web:autoservicio\"}}]}},\"aggs\":{\"2\":{\"date_histogram\":{\"interval\":\"5m\",\"field\":\"@timestamp\",\"min_doc_count\":0,\"extended_bounds\":{\"min\":\"1527812908675\",\"max\":\"1527816589767\"},\"format\":\"epoch_millis\"},\"aggs\":{\"1\":{\"avg\":{\"field\":\"time_taken\"}}}}}}\n"
},
"response": {
"responses": [
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 459,
"max_score": 0,
"hits": []
},
"aggregations": {
"2": {
"buckets": [
{
"1": {
"value": 0.011500000488013029
},
"key_as_string": "1527812700000",
"key": 1527812700000,
"doc_count": 2
},
{
"1": {
"value": 0.010771929852157962
},
"key_as_string": "1527813000000",
"key": 1527813000000,
"doc_count": 57
},
{
"1": {
"value": 3.4190000845177564
},
"key_as_string": "1527813300000",
"key": 1527813300000,
"doc_count": 60
},
{
"1": {
"value": 21.798712007599093
},
"key_as_string": "1527813600000",
"key": 1527813600000,
"doc_count": 66
},
{
"1": {
"value": 0.00975000043399632
},
"key_as_string": "1527813900000",
"key": 1527813900000,
"doc_count": 4
},
{
"1": {
"value": 0.009500000393018126
},
"key_as_string": "1527814200000",
"key": 1527814200000,
"doc_count": 4
},
{
"1": {
"value": 0.010250000283122063
},
"key_as_string": "1527814500000",
"key": 1527814500000,
"doc_count": 4
},
{
"1": {
"value": 0.00975000043399632
},
"key_as_string": "1527814800000",
"key": 1527814800000,
"doc_count": 4
},
{
"1": {
"value": 0.01000000024214387
},
"key_as_string": "1527815100000",
"key": 1527815100000,
"doc_count": 4
},
{
"1": {
"value": 1.7438275462482125
},
"key_as_string": "1527815400000",
"key": 1527815400000,
"doc_count": 58
},
{
"1": {
"value": 0.010000000474974513
},
"key_as_string": "1527815700000",
"key": 1527815700000,
"doc_count": 4
},
{
"1": {
"value": 0.487445647080439
},
"key_as_string": "1527816000000",
"key": 1527816000000,
"doc_count": 92
},
{
"1": {
"value": 0.5569400003890042
},
"key_as_string": "1527816300000",
"key": 1527816300000,
"doc_count": 100
}
]
}
},
"status": 200
}
]
}
}
@pbuentam since you're using an interval of 5 minutes, you'll see that the last 5 minutes are missing in the graph. The general recommendation for Grafana alerting in this scenario is to configure the alert's time settings similar to Query(A, 5m, now-5m). You're basically telling the alerting engine not to look at the latest 5 minutes, since there won't be any data there. I think you're hitting this problem, and since you have "If no data or all values are null" set to "keep last state", you encounter the problem you described.
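One way to see why the edges of the time range are unreliable is to check which 5-minute buckets in the response above lie entirely inside the requested range. The sketch below uses the range bounds and bucket keys from that query inspector output; the helper name `complete_buckets` is hypothetical, and the check is only an illustration, not something Grafana performs.

```python
INTERVAL_MS = 5 * 60 * 1000  # the "5m" date_histogram interval

# Range bounds from the request body in the query inspector output.
RANGE_FROM_MS = 1527812908675
RANGE_TO_MS = 1527816589767

# The 13 bucket keys in the response: 1527812700000 through 1527816300000.
BUCKET_KEYS_MS = [1527812700000 + i * INTERVAL_MS for i in range(13)]


def complete_buckets(keys, interval_ms, range_from, range_to):
    """Keep only buckets whose whole [key, key + interval) span lies inside the range."""
    return [
        key
        for key in keys
        if key >= range_from and key + interval_ms <= range_to
    ]


full = complete_buckets(BUCKET_KEYS_MS, INTERVAL_MS, RANGE_FROM_MS, RANGE_TO_MS)
partial = [key for key in BUCKET_KEYS_MS if key not in full]
print(len(full), partial)
```

With these numbers the first and last buckets are only partly covered by the range, so their averages are computed over less than a full interval. Ending the alert query window before "now" keeps the reducer from looking at the newest, still-filling bucket.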
@wph95 Thank you for your first contribution to Grafana! |
And thanks to all of you who have been helping us test this. Testing will continue until we release v5.2 stable, and I encourage you to create an issue if you find any problems.
@yossiv looking at the graph, it seems correct: the green series is at the bottom. Please change to query(A, 5m, now) or similar to average over a longer time.
Hi @marefr thanks. |
Fixes #5893