Multi dimensional alerts with dynamic thresholds and routing

shathriyan94434 · May 13, 2024, 11:52pm

I’m currently setting up monitoring for multiple services in Grafana and have run into a challenge with configuring dynamic alerting thresholds. Specifically, I want to set different alert thresholds for different services based on their requirements. For example, one service might require an alert when CPU usage exceeds 80%, while another might need an alert at 90%. Each service’s alerts must be sent to that service’s Slack channel.

I’ve been exploring the possibility of using Grafana’s unified alerting system to achieve this, but it seems that directly setting dynamic thresholds within the alert rule based on service-specific labels or external data sources might not be straightforward.

Example CPU usage PromQL query (Prometheus as the data source):

sum(label_replace(rate(container_cpu_usage_seconds_total{ image!="", container=~".+"}[2m]), "pod", "$1", "pod", "(.*)")) by (pod) / sum(kube_pod_container_resource_limits{resource="cpu",unit="core",container=~".+"}) by (pod) * 100

This returns 5k+ pods (service_name-1ggra… and so on) across different services. Now I want to fire an alert if CPU usage is above 80% for a service called account-entityA, and another alert if it is above 90% for account-entityB. Each alert should then be routed to a specific Slack channel: account-entityA and account-entityB each have their own channel. Is it possible to achieve this using a multi-dimensional alert? I have created contact points for the Slack channels (alert and warning) and a notification policy that matches on the labels team_name and severity.

I referred to this link, but it doesn’t seem to work for PromQL.

SEVERITY:
{{ if (gt $values.A.Value 80.0) -}}
error
{{ else if (gt $values.A.Value 50.0) -}}
warning
{{ else -}}
info
{{- end }}

Query result:
{pod="account-entityA-5df875-css5n"} 2.46911
{pod="account-entityA-5df875-l6hcf"} 2.45589
{pod="account-entityA-5df875-nljdm"} 2.32013
{pod="account-entityA-5df875-qvb9z"} 2.57735
{pod="account-entityB-675d49-hbfq2"} 1.43041
{pod="account-entityB-675d49-jx9wp"} 0.97912
{pod="adtech-8648c-5xnsd"} 0.52909
{pod="adtech-8648c-8p8hj"} 0.60585
{pod="adtech-8648c-jfp96"} 0.75641
{pod="atp-config-dataB-bcfd-t2gv6"} 0.52198
{pod="atp-config-dataB-bcfdf-z9zmc"}

At present I am using a single alert rule for each service and would like to move to unified alerting if possible. Any suggestions or guidance would be highly appreciated.

antonio · May 14, 2024, 11:18am

Hello @shathriyan94434

I’m not sure it’s possible to achieve this at the query level. You could try a different approach: use templating to dynamically change the value of the severity label, so that when a condition is met the severity changes and the notification is sent to the matching contact point.

Here is another post for reference

You would add a Reduce expression (Last, Strict) and a Math expression. In the Math expression, you could try something like:

 $B  >= 80 || $B >= 50

Good luck, and let us know if it helped.

pepecano · June 2, 2025, 2:43pm

Just a heads-up for anyone revisiting this thread: the Grafana Alerting docs now include an example showing how to configure dynamic thresholds.
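
For readers who want to keep everything in one multi-dimensional rule, one possible pattern is to carry the per-service threshold inside the query itself. This is a sketch under stated assumptions, not necessarily what the docs example does. The query below reuses the CPU usage expression from the first post, derives a service label from the pod name, and subtracts a threshold series joined on that label, so a positive result means the pod is above its service's threshold. The service label name, the pod-name regex (which assumes names of the form service-hash-suffix, as in the sample output), and the 80/90 values for the two services are assumptions for illustration. Pods whose service has no threshold entry drop out of the result.

```promql
# Per-pod CPU usage as a percentage of the CPU limit, with a "service" label
# derived from the pod name (assumes names look like <service>-<hash>-<suffix>).
label_replace(
    sum by (pod) (rate(container_cpu_usage_seconds_total{image!="", container=~".+"}[2m]))
  / sum by (pod) (kube_pod_container_resource_limits{resource="cpu", unit="core", container=~".+"})
  * 100,
  "service", "$1", "pod", "(.+)-[a-z0-9]+-[a-z0-9]+"
)
# Subtract the matching service threshold (many pods to one threshold per service).
- on (service) group_left ()
(
    # Hypothetical per-service thresholds, built as constant one-sample vectors.
    label_replace(vector(80), "service", "account-entityA", "", "")
  or
    label_replace(vector(90), "service", "account-entityB", "", "")
)
```

In a Grafana rule, a query like this would typically feed a Reduce expression and a condition that fires when the value is above 0. Because each returned series keeps its pod and service labels, a notification policy could match on service (or on a team_name label derived from it) to select the Slack contact point. Using subtraction rather than a filtering comparison keeps a series for every covered pod, which avoids the rule falling into a no-data state when nothing is over its threshold.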
