Multi-dimensional alerts with dynamic thresholds and routing

I'm currently working on setting up monitoring for multiple services in Grafana and have run into a challenge with configuring dynamic alerting thresholds. Specifically, I want to set different alert thresholds for different services based on their specific requirements. For example, one service might require an alert when CPU usage exceeds 80%, while another might need an alert at 90%. Each service's alerts must also be sent to that service's own Slack channel.

I’ve been exploring the possibility of using Grafana’s unified alerting system to achieve this, but it seems that directly setting dynamic thresholds within the alert rule based on service-specific labels or external data sources might not be straightforward.

Example CPU usage PromQL query (Prometheus as the data source):

sum(label_replace(rate(container_cpu_usage_seconds_total{image!="", container=~".+"}[2m]), "pod", "$1", "pod", "(.*)")) by (pod)
  / sum(kube_pod_container_resource_limits{resource="cpu", unit="core", container=~".+"}) by (pod) * 100

This returns 5k+ pod series (service_name-1ggra… and so on) across different services. Now I want to fire an alert when CPU usage is above 80% for a service called account-entityA, and another alert when it is above 90% for account-entityB, with the ability to route each alert to a specific Slack channel: account-entityA has one Slack channel and account-entityB has a different one. Is it possible to achieve this using a multi-dimensional alert? I have created contact points for the Slack channels (alert and warning) and a notification policy with the labels team_name and severity.
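To illustrate the routing I'm aiming for (channel names below are placeholders, not the real ones):

Alert rule labels (templated per series):
team_name = account-entityA or account-entityB (derived from the pod name)
severity  = error / warning / info

Notification policy routes (label matchers -> contact point):
team_name = account-entityA, severity = error   -> #account-entityA-alert
team_name = account-entityA, severity = warning -> #account-entityA-warning
team_name = account-entityB, severity = error   -> #account-entityB-alert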

I referred to this link, but it doesn't seem to work for PromQL.

SEVERITY:
{{ if (gt $values.A.Value 80.0) -}}
error
{{ else if (gt $values.A.Value 50.0) -}}
warning
{{ else -}}
info
{{- end }}

Query result:
{pod="account-entityA-5df875-css5n"}   2.46911
{pod="account-entityA-5df875-l6hcf"}   2.45589
{pod="account-entityA-5df875-nljdm"}   2.32013
{pod="account-entityA-5df875-qvb9z"}   2.57735
{pod="account-entityB-675d49-hbfq2"}   1.43041
{pod="account-entityB-675d49-jx9wp"}   0.97912
{pod="adtech-8648c-5xnsd"}             0.52909
{pod="adtech-8648c-8p8hj"}             0.60585
{pod="adtech-8648c-jfp96"}             0.75641
{pod="atp-config-dataB-bcfd-t2gv6"}    0.52198
{pod="atp-config-dataB-bcfdf-z9zmc"}

At present I am using a single alert per service and want to move to unified alerting if possible. Any suggestion or guidance is highly appreciated.

Hello @shathriyan94434

Not sure if it's possible to achieve this at the query level. You could try a different approach: use templating to dynamically change the value of the severity label, so that when a condition is met the severity label changes and the notification is sent to the matching contact point.
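For example (an untested sketch; it assumes B is the reduce expression described below, so $values.B.Value is the reduced CPU percentage, and that the match template function is available in your Grafana version), the severity label could apply a different threshold per service by inspecting the pod label:

SEVERITY:
{{- /* 80% threshold for account-entityA, 90% for account-entityB, info otherwise */ -}}
{{ if match "^account-entityA-" $labels.pod -}}
{{ if gt $values.B.Value 80.0 }}error{{ else if gt $values.B.Value 50.0 }}warning{{ else }}info{{ end }}
{{- else if match "^account-entityB-" $labels.pod -}}
{{ if gt $values.B.Value 90.0 }}error{{ else }}info{{ end }}
{{- else -}}
info
{{- end }}

TEAM_NAME:
{{- /* Derive the team from the pod name so the notification policy can route on it */ -}}
{{ if match "^account-entityA-" $labels.pod }}account-entityA{{ else if match "^account-entityB-" $labels.pod }}account-entityB{{ else }}other{{ end }}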

Here is another post for reference

You would add a reduce expression (Last, Strict) and a math expression. In the math expression, you could try something like:

$B >= 50
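Putting it together, the expression chain would look roughly like this (a sketch; A/B/C are whatever names your rule uses):

A: the PromQL query above
B: Reduce (Last, Strict) on A
C: Math: $B >= 50   (set C as the alert condition)

The rule fires once the lowest threshold is crossed, and the templated severity / team_name labels then decide which notification policy matches, and therefore which Slack channel receives the alert.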

Good luck, and let us know if it helped.