Alert Rules with flexible thresholds for server groups

frankbo · September 22, 2023, 10:55am

I have an alert rule that fires whenever free disk space drops below 10% or below 5%. This is our default.
But now colleagues want different thresholds for some of their servers. Colleague A wants to be alerted for his 10 servers at <30% and <20%. Colleague B wants to be alerted for his 30 servers at <8% and <3%.
Does anyone have an idea how I can solve this with Grafana Alerts?
If I write a new alert rule for colleague A's 10 servers, I also have to filter those servers out of the standard rule. That is not a workable solution.
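Stated as data, the requirement is a default (warning, critical) pair plus per-group overrides. The sketch below only illustrates that lookup; the group keys `colleague-a` and `colleague-b` are hypothetical names, and the numbers come from the question above.

```python
from typing import Dict, Tuple

Thresholds = Tuple[float, float]  # (warning, critical) in percent free space

DEFAULT: Thresholds = (10.0, 5.0)

# Hypothetical group keys; the numbers are the ones from the question.
OVERRIDES: Dict[str, Thresholds] = {
    "colleague-a": (30.0, 20.0),
    "colleague-b": (8.0, 3.0),
}


def thresholds_for(group: str) -> Thresholds:
    # Groups without an override fall back to the default pair, so the
    # default case needs no explicit exclusion list in this sketch.
    return OVERRIDES.get(group, DEFAULT)


print(thresholds_for("colleague-b"))  # (8.0, 3.0)
print(thresholds_for("unassigned"))   # (10.0, 5.0)
```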

Thanks a lot
Frank

yuriy.tseretyan · September 22, 2023, 3:01pm

It is possible using two data sources and a Math expression! One data source provides free disk space and the other provides thresholds. The only requirement: both data sources must provide exactly the same set of labels!

Here is an example based on two Test data sources.

The most crucial part is that each series in $D should carry a subset of the labels of the series it applies to. The Math expression merges $B and $D by traversing every metric in $B and looking for a metric in $D with a matching set (or subset) of labels. In this example, the threshold 60 is applied to all metrics from $B (or $A) because they all have the label cluster=us-east1.
IMPORTANT: If no match exists, the metric does not make it into the result and you will not get an alert for it.
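To make the matching and dropping behavior concrete, here is a small Python model of it. This is an approximation of the behavior described above, not Grafana's implementation: it assumes the alert condition is "value below threshold" (as for free disk space), and it simply takes the first matching threshold series, whereas Grafana's handling of several matches may differ. The label values are made up for illustration.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Series = List[Tuple[Labels, float]]


def labels_match(selector: Labels, labels: Labels) -> bool:
    # True when every label of the threshold series also appears, with the
    # same value, on the value series (selector is a subset of labels).
    return all(labels.get(k) == v for k, v in selector.items())


def below_threshold(values: Series, thresholds: Series) -> List[Tuple[Labels, bool]]:
    """Compare each value series with the first threshold series whose labels
    are a subset of its own. Value series without a match are dropped."""
    result = []
    for labels, value in values:
        threshold = next(
            (t for sel, t in thresholds if labels_match(sel, labels)), None
        )
        if threshold is None:
            continue  # no match: the series disappears and can never alert
        result.append((labels, value < threshold))
    return result


free_space = [  # stands in for $B
    ({"cluster": "us-east1", "instance": "web-1"}, 55.0),
    ({"cluster": "us-east1", "instance": "web-2"}, 70.0),
    ({"cluster": "eu-west1", "instance": "db-1"}, 40.0),
]
thresholds = [({"cluster": "us-east1"}, 60.0)]  # stands in for $D

for labels, firing in below_threshold(free_space, thresholds):
    print(labels["instance"], firing)
```

`db-1` never appears in the output: its `cluster` label has no counterpart in the threshold series, so it is silently dropped even though its free space is lower than that of the other two.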

In the rule in the following screenshot, matching is done on labels that are unique to each metric in $B, but $D does not list every metric in $A, so the unlisted metrics get dropped.

Unfortunately, I can’t think of a way to express a fallback that would match all unmatched metrics from $A.
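For comparison, this is roughly what the missing fallback would mean if the thresholds were resolved outside the Math expression, for example before the data reaches Grafana. It is a hypothetical sketch, not a Grafana feature: the `owner` label is an assumed way of tagging a colleague's servers, and "most specific match wins" is one possible design choice among several.

```python
from typing import Dict, List, Optional, Tuple

Labels = Dict[str, str]


def threshold_for(labels: Labels, overrides: List[Tuple[Labels, float]],
                  default: float) -> float:
    """Return the threshold of the most specific matching override
    (the one with the most labels), or `default` if nothing matches."""
    best: Optional[Tuple[Labels, float]] = None
    for selector, value in overrides:
        if all(labels.get(k) == v for k, v in selector.items()):
            if best is None or len(selector) > len(best[0]):
                best = (selector, value)
    return default if best is None else best[1]


# Hypothetical "owner" label marking colleague A's servers, plus one
# more specific override for a single server.
overrides = [
    ({"owner": "colleague-a"}, 30.0),
    ({"owner": "colleague-a", "instance": "web-9"}, 25.0),
]

print(threshold_for({"instance": "web-1", "owner": "colleague-a"}, overrides, 10.0))  # 30.0
print(threshold_for({"instance": "db-1"}, overrides, 10.0))  # 10.0
```

Unlike the label join, an unmatched series here still gets the default threshold instead of disappearing.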

So, if you opt in to this workflow, you must deal with the possibility of dropped metrics.

frankbo · September 29, 2023, 6:50am

Thanks for your suggestion.
I am currently using a similar solution, but in practice the alert rules become too complex. I had to extend the rule again and again, e.g. for servers that have no special threshold and should fall back to the default. The whole thing must also work for two threshold values, where only the more critical one should be reported.
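The two-level requirement itself is simple to state: check the critical threshold first, so a server below both thresholds is reported only once, as critical. A minimal sketch, assuming free space is given in percent and using the default pair from the original question; the function name is hypothetical.

```python
from typing import Optional, Tuple

DEFAULT = (10.0, 5.0)  # (warning, critical) percentages of free disk space


def severity(free_pct: float,
             thresholds: Optional[Tuple[float, float]] = None) -> Optional[str]:
    """Return "critical", "warning", or None. Critical is checked first,
    so only the more severe level is reported."""
    warning, critical = thresholds if thresholds is not None else DEFAULT
    if free_pct < critical:
        return "critical"
    if free_pct < warning:
        return "warning"
    return None


print(severity(4.0))                 # critical
print(severity(7.0))                 # warning
print(severity(25.0, (30.0, 20.0)))  # warning
print(severity(9.0, (8.0, 3.0)))     # None
```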
Therefore I had hoped that there is a simpler solution.
