I have an alert rule that fires an alert whenever free disk space drops below 10% or below 5%. These are our default thresholds.
But now colleagues want to have other thresholds set for some of their servers. Colleague A wants to be alerted for his 10 servers at <30% and <20%. Colleague B wants to be alerted for his 30 servers at <8% and <3%.
Does anyone have an idea how I can solve this with Grafana Alerts?
If I write a new alert rule for the 10 servers of colleague A, then I have to filter out these servers in the standard rule. That would not be a workable solution.
It is possible using two data sources and a Math expression! One data source provides the free disk space and the other provides the thresholds. The only requirement: both data sources must provide exactly the same set of labels!
Here is an example based on two TestData data sources.
The most crucial part is that the labels of each series in $D must be a subset of the labels of a series (dimension) in $B. The Math expression merges $B and $D by traversing every metric in $B and looking for a metric in $D with a matching set (or subset) of labels. In this example, the threshold 60 is applied to all metrics from $B (or $A) because they all have the label cluster=us-east1.
IMPORTANT: If no match exists, the metric does not appear in the result and you will not get an alert for it.
In the following screenshot, the rule is updated to match by labels that uniquely identify each metric in $B. However, $D does not contain an entry for every metric in $A, so those metrics get dropped.
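To make the matching rule concrete, here is a minimal Python sketch that models it, assuming each query has already been reduced to one number per series (as a Reduce expression would do). It is a simplified illustration of the behaviour described above, not Grafana's implementation: `below_threshold`, `is_subset` and the sample labels are hypothetical, and when several threshold series match, this sketch simply takes the first one.

```python
from typing import Dict, List, Optional, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]  # (labels, reduced value), e.g. the output of a Reduce step


def is_subset(small: Labels, big: Labels) -> bool:
    """True if every label in `small` appears with the same value in `big`."""
    return all(big.get(key) == value for key, value in small.items())


def below_threshold(values: List[Series], thresholds: List[Series]) -> List[Tuple[Labels, bool]]:
    """Simplified model of `$B < $D` with label matching (hypothetical helper)."""
    result = []
    for labels, value in values:
        # Take the first threshold series whose labels are a subset of this series' labels.
        limit: Optional[float] = next(
            (t_value for t_labels, t_value in thresholds if is_subset(t_labels, labels)),
            None,
        )
        if limit is None:
            continue  # no match: the series is dropped, so it can never fire an alert
        result.append((labels, value < limit))
    return result


# $B: free disk space per host; $D: a single threshold keyed only by cluster.
B = [
    ({"cluster": "us-east1", "host": "a"}, 55.0),
    ({"cluster": "us-east1", "host": "b"}, 70.0),
    ({"cluster": "eu-west1", "host": "c"}, 40.0),
]
D = [({"cluster": "us-east1"}, 60.0)]

print(below_threshold(B, D))
```

Host `c` is missing from the printed result because no threshold series carries `cluster=eu-west1`. Likewise, matching by `host` with thresholds for only some hosts drops every host that has no threshold entry, which is the silent drop described above.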
Thanks for your suggestion.
I am currently using a similar solution, but the alert rules have become too complex for me. In practice I had to extend the alert rule again and again, e.g. for servers that have no specific threshold and should fall back to the default. The whole thing also has to work with two thresholds, where only the more critical one is reported.
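For comparison, the decision logic described here (per-server thresholds, fallback to the default, and reporting only the more critical of two levels) is small when written out directly. The following Python sketch only illustrates that logic; it is not a Grafana feature, and the host names and the `OVERRIDES` table are hypothetical stand-ins for whatever the thresholds data source would return.

```python
from typing import Dict, Optional, Tuple

# Hypothetical thresholds as (warning %, critical %) of free disk space.
DEFAULT: Tuple[float, float] = (10.0, 5.0)
OVERRIDES: Dict[str, Tuple[float, float]] = {
    "server-a1": (30.0, 20.0),  # one of colleague A's servers
    "server-b1": (8.0, 3.0),    # one of colleague B's servers
}


def severity(host: str, free_pct: float) -> Optional[str]:
    """Return the single most critical severity for `host`, or None if healthy."""
    # Servers without an override fall back to the default thresholds.
    warning, critical = OVERRIDES.get(host, DEFAULT)
    if free_pct < critical:
        return "critical"  # checked first, so a critical breach suppresses the warning
    if free_pct < warning:
        return "warning"
    return None


print(severity("server-a1", 25.0), severity("server-b1", 5.0), severity("other-host", 4.0))
```

The fallback is a single dictionary lookup with a default, and checking the critical threshold before the warning threshold is what guarantees that only one severity is reported per server.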
That is why I was hoping for a simpler solution.