Trouble understanding multi-dimensional alerting

  • What Grafana version and what operating system are you using?
    Grafana 10.4.1 on Ubuntu 18.04.6 LTS

  • What are you trying to achieve?
    Alerting on time-series data dropping below a given threshold.

I have a table in a postgresql datasource that is working fine for graphing purposes. The datasource has three time series tables, one of which I’d like to use for alerts. The table contains time-series data for a list of sources, tracking various sensor percentages for each source. There are roughly 30 sources, and the full list is available as a template variable, but my current understanding is that I can’t use this in alerting.

The table has a single time column, a single source column, and eight data columns named *_pct that hold an integer from 0-100 representing a coverage percentage. These can be referred to as just foo_pct, bar_pct, etc. for simplicity.

What I would like to achieve is a separate alert for each source, for each data column, if that column drops below some preset threshold. For example if foo_pct drops below 80 for source alpha, I’d like an alert for that explicit condition.

It would be simple but tedious to create an individual alert for every combination of source and column, but I’m hoping there’s a more admin-friendly way to achieve this, as creating roughly 240 different alerts, or 30 alerts each with 8 queries, that are all essentially identical is going to be very time consuming and a real pain to maintain.

Unfortunately my brain just doesn’t seem capable of reorienting itself to see the big picture “the grafana way” so I’m stalling out just trying to figure out what to do in the query editor for the alert.

For reference the corresponding query for the dashboard looks like this:

SELECT
  created AS time,
  'coverage' AS metric,
  foo_pct, bar_pct, baz_pct, ...
FROM
  timeseries_coverage
WHERE
  $__timeFilter(created)
  AND source = $source
ORDER BY created ASC;

TIA for any help educating me here!

Start with simple alert query:

SELECT
  created AS time,
  source,
  foo_pct
FROM
  timeseries_coverage
WHERE
  $__timeFilter(created)

So source is dimension and foo_pct is value, which can be used in threshold expression (so the same threshold will be applied for all sources). You should create dedicated Grafana alerts for each metric (*_pct columns), because Grafana alert works with single value only.

Of course you may have complicated requirements, e.g. source=alpha alerts when foo_pct>80%, but source=beta alerts when foo_pct>90%. Then I would say this business logic is better to code on the SQL level - example (not real SQL syntax):

SELECT
  created AS time,
  source,
  CASE 
    WHEN source='alpha' AND foo_pct>80 THEN 1,
    WHEN source='beta' AND foo_pct>90 AND .... THEN 1,
    ...
    ELSE 0
  END as alerting
FROM
  timeseries_coverage
WHERE
  $__timeFilter(created)

so you will be alerting on alerting value (1-alerting, 0-not alerting).

1 Like

Thanks, that looks like it’s going to work out just fine. I was mentally overcomplicating things it seems. Creating 8 different alerts is definitely better than 250.