Hello there,
I came across unexpected behavior with Grafana alerting.
What I expected:
The Grafana alert status should go to pending, then to alerting, and remain in the alerting state as long as the alert condition does not change.
What happened:
Roughly every three hours (the interval was always the same, down to a few seconds), the alert status changed to OK and then to pending before going back to alerting. As you can see in the picture below, the alert condition did not change.
This creates unintended re-alerts every three hours.
(The slightly darker green lines are non-alert-related annotations.)
The configured alert rule looks like this:
The TSDB backend is InfluxDB and I use this query:
SELECT max("number_of_messages_received_sum") FROM "cloudwatch_aws_sqs" WHERE ("queue_name" = 'dlq') AND $timeFilter GROUP BY time(1m) fill(previous)
The InfluxDB measurement has data points every 5m.
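For reference, this is roughly what I assume the alert engine evaluates every 2m, with $timeFilter expanded to the alert's 15m window (the explicit time condition below is only my guess at how the macro resolves, not taken from logs):

SELECT max("number_of_messages_received_sum")
FROM "cloudwatch_aws_sqs"
WHERE ("queue_name" = 'dlq') AND time >= now() - 15m AND time <= now()
GROUP BY time(1m) fill(previous)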
The Grafana version is 7.0.2.
Grafana is deployed as an AWS ECS service without a persistent database, but there were no redeploys of the Grafana service while this incident occurred.
I guess there is something odd in the configuration, but nothing too obvious to me; otherwise I would not be asking for help.
Any help/questions/suggestions appreciated!
A stripped-down version of the panel with the alert:
{ "alert": { "alertRuleTags": {}, "conditions": [ { "evaluator": { "params": [ 0 ], "type": "gt" }, "operator": { "type": "and" }, "query": { "params": [ "A", "15m", "now" ] }, "reducer": { "params": [], "type": "max" }, "type": "query" } ], "executionErrorState": "alerting", "for": "2m", "frequency": "2m", "handler": 1, "message": "Alert!", "name": "Example alert", "noDataState": "ok", "notifications": [ { "uid": "notifier2a" } ] }, "aliasColors": {}, "dashLength": 10, "fill": 1, "gridPos": { "h": 9, "w": 24, "x": 0, "y": 9 }, "id": 2, "legend": { "avg": false, "current": false, "max": true, "min": false, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "pointradius": 2, "renderer": "flot", "seriesOverrides": [ { "alias": "/.*/", "yaxis": 2, "$$hashKey": "object:103" } ], "spaceLength": 10, "targets": [ { "alias": "alias", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "previous" ], "type": "fill" } ], "measurement": "cloudwatch_aws_sqs", "orderByTime": "ASC", "policy": "default", "query": "SELECT max(\"number_of_messages_received_sum\") FROM \"cloudwatch_aws_sqs\" WHERE (\"queue_name\" = 'dlq') AND $timeFilter GROUP BY time(1m) fill(previous)", "rawQuery": false, "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "approximate_number_of_messages_visible_sum" ], "type": "field" }, { "params": [], "type": "sum" } ] ], "tags": [ { "key": "queue_name", "operator": "=", "value": "dlq" } ] } ], "thresholds": [ { "colorMode": "critical", "fill": true, "line": true, "op": "gt", "value": 0, "yaxis": "left" } ], "timeRegions": [], "title": "dead letter queue", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "decimals": 0, "format": "short", "label": null, "logBase": 1, "max": "-0.1", "min": "-1", "show": false }, { "decimals": 0, "format": "short", "label": null, "logBase": 1, "max": null, "min": "0", "show": true } ], "yaxis": { "align": false, "alignLevel": null }, "fieldConfig": { "defaults": { "custom": {} }, "overrides": [] }, "bars": false, "dashes": false, "decimals": 0, "fillGradient": 0, "hiddenSeries": false, "percentage": false, "points": false, "stack": false, "steppedLine": false, "timeFrom": null, "timeShift": null, "datasource": null }