I’d like to get an email if my site is throwing lots of errors for an extended period of time (using InfluxDB, if that matters). The simple conditions I’ve tried each have a problem:
if(average(errorsPerMinute, 5mins) > 10)
doesn’t work, because a single large spike can push the 5-minute average over the threshold on its own.
if(max(errorsPerMinute, 5mins) > 10)
doesn’t work, because a single spike sets it off immediately.
if(min(errorsPerMinute, 5mins) > 10)
doesn’t work, because a single data point at or below the threshold will stop the alert from firing, even if all the other data points are in error.
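To make the three failure modes concrete, here is a small Python sketch on made-up five-sample windows. The sample values and the names `spike`, `mostly_bad` and `fires` are hypothetical, just for illustration, not real data from my site:

```python
from statistics import mean

THRESHOLD = 10  # errors per minute

# Hypothetical 5-minute windows, one sample per minute.
spike = [0, 0, 200, 0, 0]          # one spike; site is otherwise healthy
mostly_bad = [50, 40, 0, 60, 45]   # sustained errors with a single clean minute

def fires(aggregate, window):
    """Return True if the aggregate of the window exceeds the threshold."""
    return aggregate(window) > THRESHOLD

# average: the lone spike drags the mean up to 40, so the alert fires.
print("average on spike:", fires(mean, spike))        # True (unwanted)
# max: fires as soon as any single sample exceeds the threshold.
print("max on spike:", fires(max, spike))             # True (unwanted)
# min: the single clean minute keeps the alert from firing.
print("min on mostly_bad:", fires(min, mostly_bad))   # False (unwanted)
```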
Any ideas how else to do it?
I think I would like something like “send me an email if more than 50% of the samples in the past 5 minutes are above 10”. That way it doesn’t matter whether the site is slightly or hugely in error, or whether the errors contain spikes or dips: I only get an email if most of our recent samples are above the threshold.
I’ve tried to make this happen by creating a query of
isInError = errorsPerMinute > 10 ? 1 : 0
in order to get a time series of 1’s and 0’s, then alerting on
average(isInError, 5mins) > 0.5
to mean “I am in error more than half the time”, but I can’t get the syntax for that to work with InfluxDB.
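Leaving the InfluxDB syntax aside, the logic I’m after can be sketched in Python like this. The function name `should_alert`, its parameters and the sample windows are hypothetical, and treating an empty window as “no alert” is an assumption on my part:

```python
def should_alert(samples, threshold=10, min_fraction=0.5):
    """Return True if more than min_fraction of samples exceed threshold."""
    if not samples:
        return False  # assumption: no data means "do not alert"
    # isInError as a series of 1s and 0s, one per sample
    is_in_error = [1 if s > threshold else 0 for s in samples]
    # The average of the 1s and 0s is the fraction of samples in error.
    return sum(is_in_error) / len(is_in_error) > min_fraction

print(should_alert([0, 0, 200, 0, 0]))     # False: only 1 of 5 samples in error
print(should_alert([50, 40, 0, 60, 45]))   # True: 4 of 5 samples in error
```

With this rule a lone spike is ignored and a single clean minute does not suppress the alert, which is the behaviour none of average, max or min gives me.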