[Grafana 8] [Unified Alerting] Single alert on multiple queries using different data sources

What Grafana version and what operating system are you using?

Grafana v8.3.4 on Linux (RPM package)

What are you trying to achieve?

To have a single multi-dimensional alert rule on multiple queries. These queries fetch the same metrics from several different environments.

The goal is to avoid duplicating the same alert rule for the dev, test, qa and other environments.

The real use case I am struggling with is this: “collect all Kafka Connect connectors that are not running from all dev, test and qa environments in a single alert rule, and send an alert for each degraded connector”.

Kafka metrics for each environment are scraped by a different Prometheus instance, and each instance is a separate data source in Grafana.

How are you trying to achieve it?

I configure a separate query for each environment. The metrics differ only in their label values (i.e. the job and env labels have different values); the set of label names is the same in all environments.

A metric looks something like this:

kafka_connect_connect_connector_metrics{connector="connector1", env="dev1", instance="broker1.example.com", job="dev1-kafka-connect", prometheus="dev1-monitoring/dev1-monitoring", prometheus_replica="prometheus-dev1-monitoring-0", status="running"} 1

kafka_connect_connect_connector_metrics{connector="connector2", env="dev1", instance="broker1.example.com", job="dev1-kafka-connect", prometheus="dev1-monitoring/dev1-monitoring", prometheus_replica="prometheus-dev1-monitoring-0", status="stopped"} 1

My queries look something like this:

kafka_connect_connect_connector_metrics{job="dev1-kafka-connect", status=~"^(stopped|failed)$", task="", connector_class="", connector_version="", connector_type=""}
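To make the selection concrete, here is a small Python sketch that applies the same label matchers to the two sample series above (with abbreviated label sets). It only mimics PromQL's equality and fully anchored regex matching on plain dicts; it is not how Prometheus actually evaluates a query, and the helper `select` is hypothetical.

```python
import re

# The two sample series from above, as label dicts (prometheus* labels omitted).
SERIES = [
    {"__name__": "kafka_connect_connect_connector_metrics", "connector": "connector1",
     "env": "dev1", "instance": "broker1.example.com", "job": "dev1-kafka-connect",
     "status": "running"},
    {"__name__": "kafka_connect_connect_connector_metrics", "connector": "connector2",
     "env": "dev1", "instance": "broker1.example.com", "job": "dev1-kafka-connect",
     "status": "stopped"},
]

def select(series, equal=None, regex=None):
    """Return the series whose labels satisfy all matchers.

    A missing label is treated as the empty string, so label="" matches series
    that do not have that label. Regex matchers must match the whole value.
    """
    equal = equal or {}
    regex = regex or {}
    result = []
    for labels in series:
        if any(labels.get(k, "") != v for k, v in equal.items()):
            continue
        if any(re.fullmatch(p, labels.get(k, "")) is None for k, p in regex.items()):
            continue
        result.append(labels)
    return result

matched = select(
    SERIES,
    equal={"__name__": "kafka_connect_connect_connector_metrics",
           "job": "dev1-kafka-connect", "task": "", "connector_class": "",
           "connector_version": "", "connector_type": ""},
    regex={"status": "^(stopped|failed)$"},
)
print([s["connector"] for s in matched])  # ['connector2']
```

Only the stopped connector survives the status matcher, and the empty-value matchers pass because these connector-level series have no task or connector_class labels.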

Let’s say I have three such queries, each using a different data source (the dev, test and qa Prometheus instances), which return time series from three different environments.

Then I apply a Reduce expression with the Min function to each query, so that each time series is reduced to a single value for the alert rule. This gives me three Reduce expressions.
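As I understand it, Reduce works per series: each time series collapses to one number and keeps its labels. A minimal Python sketch of that idea, assuming a query result is a mapping from a label set to a list of sample values (the representation, the helper names and the values are all hypothetical):

```python
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]  # sorted label pairs, so they are hashable

def lbl(**kv: str) -> Labels:
    return tuple(sorted(kv.items()))

def reduce_min(series: Dict[Labels, List[float]]) -> Dict[Labels, float]:
    """Collapse each time series to its minimum value, keeping its labels.

    Empty series are simply skipped in this sketch.
    """
    return {labels: min(values) for labels, values in series.items() if values}

# Hypothetical result of one query: two connectors that are not running.
query_a = {
    lbl(connector="connector2", env="dev1"): [1.0, 1.0, 1.0],
    lbl(connector="connector3", env="dev1"): [0.0, 1.0, 1.0],
}
print(reduce_min(query_a))
```

So each Reduce expression is still multi-dimensional: one value per connector, per environment.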

What I do not understand is how to combine these queries/expressions into a single “alert condition”.

What happened?

When I try to combine the expressions with another Math expression (e.g. $A + $B + $C > 0), that expression returns “No Data”. A similar issue has already been described here.

UPDATE: It looks like Math returns “No Data” when any of the queries/expressions used in it returns more than one time series (i.e. is multi-dimensional). This is complete nonsense, because Math is the only, and the officially recommended, way to work with multi-dimensional alerts.
Furthermore, even for single-dimensional queries, applying a Math expression to them means losing the ability to use the $labels variable, because all labels disappear.
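One possible explanation for the multi-dimensional case, which I have not verified against the Grafana source, is that Math pairs series by their labels: a series from $A is combined with a series from $B only when one label set is a subset of the other. The Python model below is a simplified sketch of that assumed rule (the helpers `lbl` and `add` are hypothetical). It does not try to model the single-series case.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Labels = FrozenSet[Tuple[str, str]]

def lbl(**kv: str) -> Labels:
    return frozenset(kv.items())

def add(a: Dict[Labels, float], b: Dict[Labels, float]) -> Dict[Labels, float]:
    """ASSUMED Math semantics: combine two items only if one label set contains the other."""
    out: Dict[Labels, float] = {}
    for (la, va), (lb, vb) in product(a.items(), b.items()):
        if la <= lb or lb <= la:
            out[la | lb] = va + vb  # the result carries the union of both label sets
    return out

# Reduced results from two environments (hypothetical values).
A = {lbl(env="dev1", connector="connector2"): 1.0}
B = {lbl(env="test1", connector="connector7"): 1.0}
print(add(A, B))  # {} -> nothing to evaluate, which would show up as "No Data"
```

If this model is right, series from different environments can never be paired, because their env and job labels always differ.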

Without such a combination, I cannot select more than one condition in the “Define alert conditions” section.

Classic conditions are not an option either, because alerts based on classic conditions do not support $labels…

What did you expect to happen?

If a rule allows several queries, there should be some way to use all of them in a single alert rule.

It should be possible either to “combine several queries in another query” (using a “special” data source, e.g. “This Alert”), so that the alert treats the time series returned by all queries as the result of a single query.

Or it should be possible to select several alert conditions in the “Define alert conditions” section.
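To illustrate the first option, here is a Python sketch of the behaviour I would expect: the reduced results of all queries are merged into one collection, and the condition is evaluated per series, so each degraded connector becomes its own alert instance with its labels intact. This is hypothetical; the helpers `union` and `firing` do not correspond to any existing Grafana feature.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]

def lbl(**kv: str) -> Labels:
    return frozenset(kv.items())

def union(*results: Dict[Labels, float]) -> Dict[Labels, float]:
    """Merge per-query results into one collection.

    Assumes label sets never collide across environments (env/job differ).
    """
    merged: Dict[Labels, float] = {}
    for result in results:
        merged.update(result)
    return merged

def firing(series: Dict[Labels, float], threshold: float = 0.0) -> List[Dict[str, str]]:
    """One alert instance per series above the threshold, keeping its labels."""
    return [dict(labels) for labels, value in series.items() if value > threshold]

dev = {lbl(env="dev1", connector="connector2"): 1.0}
test = {lbl(env="test1", connector="connector7"): 1.0}
qa: Dict[Labels, float] = {}  # no degraded connectors in qa

for alert in firing(union(dev, test, qa)):
    print(alert["env"], alert["connector"])
```

Because the labels survive, something like $labels.connector would still be available in the notification for each instance.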

Can you copy/paste the configuration(s) that you are having problems with?

This is more of a general question, so I don’t think it would be helpful.

Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.

No errors.

Did you follow any online instructions? If so, what is the URL?

I used only the official Grafana docs, which, to be honest, are not in an ideal state:

Similar issue on Stack Overflow: “Grafana 8 - reduced alert on multiple queries” (tagged prometheus)

Updated the original post with new findings.

I cannot edit the subject anymore, but if someone can, I think this one would better reflect the described issue: [Grafana 8] [Unified Alerting] Single multi-dimensional alerting rule on multiple queries returning several time series

So, no ideas at all? Between this issue and the other one (that it is not possible to duplicate an alert), using the new Unified Alerting in Grafana is becoming really painful…

Hi @whatsupbros, I very much agree with the idea. Being able to create one alert rule for multiple queries/data sources is very important; otherwise you end up with the same alert rule over and over again.

One downside I can see is that it is not practical to add more queries to an existing alert rule: as soon as you save the changes, the rule starts evaluating again and re-fires alerts that have already fired before (v8.4.2). Regardless, the goal remains valid: one alert rule for multiple queries/data sources.
