Alerts w/ Holt-Winters

I’m trying to set up an alert on roughly sinusoidal data using holtWintersAberration. I’m getting a lot of false positives, and I wonder if it’s because

  1. the Holt-Winters functions don’t work well unless you have a lot of data points, and
  2. Grafana alerts make it hard to select a big enough timespan.

Suppose I have these queries:

A: summarize( foo.bar.baz, '1h', 'sum', false )
B: holtWintersAberration(#A, 2)

I’ve tried a lot of things that don’t work, including:

  • Setting summarize’s alignToInterval to true, so the newest data-point isn’t temporally incomplete.
  • Setting “override relative time” on the graph to 4 weeks, to pull more data.
  • Querying for “max” of “query(B, 3h, now)”, to look at the 3 most recent data-points, and only alarming if they’re all aberrant.
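
To make the "only alarm if they're all aberrant" idea concrete, here is a minimal sketch. It assumes the aberration series is 0 whenever a point sits inside the confidence bands and nonzero otherwise; the function names and sample values are hypothetical. Under that assumption, a "max" reducer compared against 0 behaves more like the `any_aberrant` check (it fires when a single point is above the upper band), which may be one source of extra alerts.

```python
from typing import Optional, Sequence


def _recent(values: Sequence[Optional[float]], n: int) -> list:
    """Return the last n non-null points of the series."""
    return [v for v in values if v is not None][-n:]


def all_aberrant(values: Sequence[Optional[float]], n: int = 3) -> bool:
    """True only if each of the last n non-null points is nonzero.

    Assumption: 0 means "inside the bands", anything else is aberrant.
    """
    recent = _recent(values, n)
    # Require a full window so a short series never fires.
    return len(recent) == n and all(v != 0 for v in recent)


def any_aberrant(values: Sequence[Optional[float]], n: int = 3) -> bool:
    """True if at least one of the last n non-null points is nonzero."""
    return any(v != 0 for v in _recent(values, n))


# Hypothetical aberration values: only one of the last three is nonzero.
series = [0.0, 0.0, 1.5, 0.0]
print(all_aberrant(series), any_aberrant(series))
```

The stricter check only fires after three consecutive aberrant buckets, trading detection delay for fewer one-off alerts.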

I think what I want is to query for “last” of “query(B, 3w, now)”, but when I try setting it to 3 weeks, I get an error.

Am I doing this entirely wrong? Has anyone had success setting up a Holt-Winters-based alert?

What error?

Try query(B, 3w, now-1h)

That way you do not include the incomplete last hour.
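
As a rough illustration of why the newest bucket can mislead, the sketch below keeps only hourly buckets that have fully elapsed. It assumes each bucket is labeled by its start time and is exactly one hour wide; the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_complete_hour(now: datetime) -> datetime:
    """Return the end of the most recent fully elapsed clock hour."""
    return now.replace(minute=0, second=0, microsecond=0)


def complete_buckets(bucket_starts, now, width=timedelta(hours=1)):
    """Keep only buckets whose whole interval ended at or before `now`."""
    return [start for start in bucket_starts if start + width <= now]


# Hypothetical evaluation time: five minutes past noon, UTC.
now = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
starts = [datetime(2024, 1, 1, h, 0, tzinfo=timezone.utc) for h in (10, 11, 12)]

# The 12:00 bucket holds only five minutes of data, so it is dropped.
print(complete_buckets(starts, now))
```

A summed bucket that covers only part of an hour will usually look far too low next to a forecast built from full hours.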

Thanks for the suggestion, Torkel!

After I set the query to “query(B, 3h, now-1h)”, I can save it without error.

After I set the query to “query(B, 3w, now-1h)”, I get this error when I click save:

{"message":"Invalid alert data. Cannot save dashboard"}

If you look in the Grafana server log the error is:

Invalid alert data. Cannot save dashboard ... error="time: unknown unit w in duration -3w ..."

It seems that alert conditions only support hours, minutes, and seconds. So instead of 3w you’ll have to convert it to hours: 504h.
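
The conversion is just 3 weeks × 7 days × 24 hours = 504 hours. For other spans, a small helper like the one below can do the arithmetic. This is only a sketch: `to_hours` is a hypothetical name, and it handles only whole numbers of hours, days, or weeks.

```python
import re

# Hours contained in each supported unit.
_HOURS_PER_UNIT = {"h": 1, "d": 24, "w": 7 * 24}


def to_hours(span: str) -> str:
    """Convert a span such as '3w' or '2d' into an hours-only string."""
    match = re.fullmatch(r"(\d+)([hdw])", span)
    if match is None:
        raise ValueError(f"unsupported time span: {span!r}")
    count, unit = match.groups()
    return f"{int(count) * _HOURS_PER_UNIT[unit]}h"


print(to_hours("3w"))  # 3 * 7 * 24 = 504, so this prints 504h
```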

That’s amazing. I’ll try that.

I set my alarm to look at the past 504 hours, and it worked.

However, my alarm is constantly flapping with false alarms, at :05 past each hour.

It alarms intermittently, whether or not I configure the alarm to check before "now" or "now - 1h".

It alarms intermittently, no matter how I set summarize’s alignToInterval param.

I wonder if it’s even possible to set up a reliable Holt-Winters-based alarm, given the controls available to me.

Do you have any idea why it is flapping? Would it help to timeshift the query by an hour using the Graphite timeShift function?