Issue with Alerts using sum() with PostGres datasource adding all data points

Alerts are not working as expected due to a possible bug with data being converted to milliseconds (epoch) by the graph panel.

SUM() function seems to return the sum of all data points instead of last 1 hour value. It seems to be a but with epoch converted to milliseconds by graph panel.

Version 5.2.1
POSTGRESQL
Ubuntu 18.04
Issue :
I have setup a graph panel with timeseries data, everything looks fine here and data is plotted accurately however when I setup the alert with sum(A, 1h, now) - it always returns the total sum of all data points instead of last 1 hour, when analysed the data is listed in millseconds on the alerts tab same as on graph panel but doesnt seem to work on alert.

What was the expected result?

I expect the SUM() function to show the total of values within the time range.

What happened instead?

the SUM() function returns value of all the data points
4623 instead of much lower value for last 1 hour data.

{“message”:“Condition[0]: Eval: true, Metric: failures, Value: 4623.000”,“data”:null}

{“firing”:true,“state”:“alerting”,“conditionEvals”:“true = true”,“timeMs”:“620.846ms”,“matches”:[{“metric”:“failures”,“value”:4623}],“logs”:[{“message”:“Condition[0]: Query Result”,“data”:[{“name”:“failures”,“points”:[[106,1532300400000],[118,1532302200000],[132,1532304000000],[111,1532305800000],[112,1532307600000],[109,1532309400000],[116,1532311200000],[122,1532313000000],[106,1532314800000],[111,1532316600000],[107,1532318400000],[122,1532320200000],[107,1532322000000],[117,1532323800000],[130,1532325600000],[130,1532327400000],[331,1532329200000],[268,1532331000000],[178,1532332800000],[173,1532334600000],[160,1532336400000],[176,1532338200000],[170,1532340000000],[132,1532341800000],[160,1532343600000],[150,1532345400000],[154,1532347200000],[164,1532349000000],[159,1532350800000],[158,1532352600000],[166,1532354400000],[68,1532356200000]]}]},{“message”:“Condition[0]: Eval: true, Metric: failures, Value: 4623.000”,“data”:null}]}

A similar issue was raised here however the last bit wasnt answered. “For instance Count/Sum() Query (A,5m,now) counts or sums the value of the entire table instead of the ones within the time frame”.

I’m facing the exact same issue,sum() function will add up the entire data point and trigger the alert while I expect to add up the data only within the time range

1 Like

Please include the query you’re using

Thank you, here it is - any help will be great.

WHEN sum () OF query (B, 1h, now) IS ABOVE 50

hi Chia,

Any update - did it work for you now, I thought the recent update to grafana would have fixed it as they mentioned in their release log about epoch conversion to milliseconds.

here I have explained what i think is the cause

Please include the SQL query you’re using thanks. Seems to me you’ve forgot to add a where $__timeFilter…

thanks @mefraimsson here it is.
SELECT COUNT(*) as Failures, $__timeGroup(alarmdate,‘30m’) FROM public.vw_pstnfailrestorecurrentday
where failuretime is not null
GROUP BY time
order by time

As I thought, you need to add $__timeFilter(alarmdate) in your where clause so that the dashboard time range are applied to the query and for the alert to only look at 1h of data (as you’ve configured).

An additional pointer that could possibly trouble you. Since you’re looking at 1 hour window and grouping by 30 minutes it’s possible that the last 30 minute window not will be part and/or complete when evaluating the alert rule.

In this case you probably want to define alert rule as sum(A, 1h, now-30m). Maybe that doesn’t make sense or is not what you expect since that will only look at 1 hour window minus 30 minutes?

1 Like

thanks @mefraimsson Let me try that now, I didnt realise the alert rule will re-execute the query on db, I thought it would use the data plotted on the graph, any ways give me 5 mins

wow! that did the trick, the alert tab now only shows last 4 data points and then does a sum() of those and then compare against my threshold, great job. Sorry i was late in replying.
I can now get to speed.
Amazing stuff - Thanks a lot!

Refer to documentation for further information.

1 Like