Hello,
I am still getting the hang of Grafana Alerting and have run into an issue where my Kusto Query comes back with No Data. I have a Threshold expression that checks to make sure there is at least 1 value (Input A is above 0). Even with that, it still sees the Threshold expression as being evaluated to true and sends off an alert without any data attached which is rather confusing. Not sure if the threshold expression is needed, if returning an object of data isn’t right, or if somehow the alert is retaining a resolved state and sending that off with no data?
Makes me think that the No Data is being seen as truthy because I am doing a summarize in order to create an object that can be used in the alert as seen below:
I have seen this other issue, but it had no resolution, because setting the state to OK when No Data comes back doesn’t affect it. It still sends an alert out.
Any help would be appreciated as I have been trying to get this working for a couple weeks. Thank you ahead of time!
I believe I found the reason my alerts were sending with no data. In the Contact Point under Notification Settings there is this one and only checkbox:
I am only looking at the firing alerts and none of the resolved so this would explain why I get blank alerts. Testing this throughout this week to make sure this is the solution.
Can you share a screenshot of the message? I don’t think disabling the resolved message would do much in your case - it only disables the second message (the one with Alert Resolved); it won’t have any effect on the Firing message (at least I think so).
This ended up not working as I have gotten 3 alerts with no data again.
I am using a notification template that looks like this:
{{ define "EmailTemplate" }}
{{ index (index .Alerts 0).Labels "subscription"}} has experienced a problem on these devices.
{{ range .Alerts.Firing }}
{{ index .Labels "hostname" }}
DNS: {{index .Labels "dns_result"}} | TCP: {{index .Labels "tcp_result"}} | Server: {{index .Labels "server_result"}} | App: {{index .Labels "app_result"}}
{{ end }}
You received this because you are subscribed to Notifications
For more information, visit https://fakeurl.com
{{ end }}
With data coming in that looks like the above screenshot from my first post. The actual message is below:
And can you check the alert state history? Is there anything that would point to No Data?
This would be the state that is represented in the alert. But it is being considered Firing, because that would be the only way it would loop through that range loop.
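To illustrate why a No Data notification can pass through the range .Alerts.Firing loop and still come out blank, here is a minimal Python sketch. It is not Grafana code: the render function only imitates the body of the template above. The assumption is that a No Data alert arrives in the firing list with labels such as alertname=DatasourceNoData and rulename, but without any of the query labels (hostname, dns_result, and so on). Go's index on a map yields an empty value for a missing key, which dict.get(key, "") imitates here.

```python
def render(alerts):
    """Imitate the EmailTemplate body for a list of firing alerts (label dicts)."""
    lines = []
    for labels in alerts:
        # A missing label renders as an empty string, like Go's `index` on a map.
        get = lambda key: labels.get(key, "")
        lines.append(get("hostname"))
        lines.append(
            f"DNS: {get('dns_result')} | TCP: {get('tcp_result')} | "
            f"Server: {get('server_result')} | App: {get('app_result')}"
        )
    return "\n".join(lines)


# Hypothetical label sets; the values are made up for illustration.
normal_alert = {
    "hostname": "host-01",
    "dns_result": "fail",
    "tcp_result": "ok",
    "server_result": "ok",
    "app_result": "ok",
}
# Assumed shape of a No Data alert: it carries no query labels at all.
no_data_alert = {"alertname": "DatasourceNoData", "rulename": "Device health"}

print(render([normal_alert]))
print(render([no_data_alert]))
```

If that assumption holds, the second call prints an empty hostname line followed by the DNS/TCP/Server/App line with every value blank, which would match an email that has no data attached.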
Can you also share the screen if the No Data setting is set to something other than No Data? I know you said it was but maybe you didn’t save that? It looks like that tbh
Here is the configuration that I have now because it seems like setting it to OK didn’t do anything.
To me, this seems wrong. At the top of the page, we have something telling us No Data, then you go to the bottom and see the expression is being evaluated to true (assuming the green checkmark is saying that it is firing)
Then we go over to the actual alert rule when there isn’t data and we get a warning
Maybe this is just a UI thing that needs updating?
Here is another example of state history. I just don’t see how a state of No Data would end up sending an alert. This does not seem intended, but maybe I’ve just missed something in configuring these alerts.
(assuming the green checkmark is saying that it is firing)
The green checkmark near the threshold only indicates that this is considered an alert condition, not that the alert is firing (imagine having another threshold for templates in alerts, you wouldn’t like to have that as an alert condition).
Due to the No Data setting you will receive an alert the instant there’s no data (it’s like a panic mode: you expect your query to always return data and now it doesn’t, so full panic). Maybe in the past you had the setting set to Alerting? In that case you would also receive the alert without any data, but there No Data is considered a threshold breach. From what you say, the lack of data in your query is expected, so change the setting to OK or Normal (depending on your Grafana version) - try that setting for a moment and check whether your alerts still fire without any data. I found something like this to be the best combination so far (at least for alerts where I expect everything to be alright when no data is returned, like the number of 5xx responses or pod restarts).
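As a rough mental model of the setting described above, the following Python sketch maps each No Data option to the state the rule ends up in when the query returns nothing. It is a simplification, not Grafana's implementation: the function names are made up, the pending period is ignored, and newer options such as keeping the last state are left out.

```python
def state_when_query_is_empty(no_data_setting):
    """Return the alert state used when the query returns no data."""
    mapping = {
        "No Data": "NoData",     # a notification goes out although nothing breached
        "Alerting": "Alerting",  # treated like a threshold breach
        "OK": "Normal",          # name of the option in older versions
        "Normal": "Normal",      # name of the option in newer versions
    }
    return mapping[no_data_setting]


def sends_notification(state):
    """In this simplified model every state other than Normal notifies."""
    return state != "Normal"


for setting in ("No Data", "Alerting", "OK", "Normal"):
    state = state_when_query_is_empty(setting)
    print(f"{setting!r:10} -> {state:8} notify={sends_notification(state)}")
```

Under this model only OK/Normal stays quiet when the query is empty, which is why that option fits queries where an empty result is expected.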
Alrighty, I am going to give this a shot and see how it goes. Will report back with results
I would also recommend changing Error to something else. We had a hard time with our datasource, and it wasn’t fun having all the developers woken up in the middle of the night because of it, but it might not be as harsh in your case.
Uh oh spaghettio. No Data email!
Is it possible that the ADX data source is returning an object to the alert, the alert is evaluating that object as truthy, and then sending the alert along?
The main reason I ask is that ADX sends back an object with a columns property and a data property, and you then have to put them together (which is what we do in the APIs for our main project). It seems far-fetched, but I just don’t see what I am missing here.
EDIT: I let this run for the entire day and into the night without changing the alerts from the OK setting for No Data. I guess something left over caused the alert above, because I haven’t seen a No Data email since that last one.
Trying something out that may end up working but seems like a bit of a work around.
I have built a Data Query as well as a Count Query, with a Threshold Expression looking at the count. This is because I think the return from the data query (which is the same as the one I have been using) is coming back in some form that is being evaluated as true even though there is no series data in it. Ideally, using a count in this way will fix the issue. I probably won’t have results until tomorrow.
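The idea behind the count-based condition can be sketched in Python as follows. This is only an illustration with hypothetical function names: it assumes the count query always returns a single number (0 when nothing matches), whereas the data query returns no series at all in that case. Whether a given Kusto query behaves that way depends on how it is written.

```python
def evaluate_data_query(rows):
    """Condition on the data query itself: an empty result has no series."""
    if not rows:
        # Nothing to compare against the threshold, so the No Data setting decides.
        return "NoData"
    return "Firing"


def evaluate_count_query(rows):
    """Condition on a count: an empty result still yields the number 0."""
    count = len(rows)  # stand-in for the Count Query
    return "Firing" if count > 0 else "Normal"  # Threshold: count is above 0


empty = []
failing = [{"hostname": "host-01", "dns_result": "fail"}]  # made-up row

print(evaluate_data_query(empty), evaluate_count_query(empty))
print(evaluate_data_query(failing), evaluate_count_query(failing))
```

With the count as the alert condition, the empty case becomes an ordinary "threshold not met" result instead of depending on the No Data handling.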
One thing to keep in mind with this solution that I ran into is if you have recently switched them to an OK/Normal state when there is no data, you have to give it time to go out of a firing state and go back to a normal state. I was impatient with this and ended up having to circle back to this solution.
Thank you for the help. This has been weeks worth of a problem and I’m happy to be more comfortable with a tool we will be using a lot more of.