I'm currently setting up monitoring for multiple services in Grafana and have run into a challenge with configuring dynamic alerting thresholds. Specifically, I want to set different alert thresholds for different services based on their requirements. For example, one service might need an alert when CPU usage exceeds 80%, while another might need one at 90%. Each service's alerts must be sent to that service's own Slack channel.
I’ve been exploring the possibility of using Grafana’s unified alerting system to achieve this, but it seems that directly setting dynamic thresholds within the alert rule based on service-specific labels or external data sources might not be straightforward.
Example CPU usage PromQL query (Prometheus as the data source):
sum(label_replace(rate(container_cpu_usage_seconds_total{image!="", container=~".+"}[2m]), "pod", "$1", "pod", "(.*)")) by (pod) / sum(kube_pod_container_resource_limits{resource="cpu",unit="core",container=~".+"}) by (pod) * 100
This returns 5k+ pods (service_name-1ggra… and so on) across different services. I want to fire an alert when CPU usage is above 80% for a service called account-entityA, and another alert when it is above 90% for account-entityB, and then route each alert to a specific Slack channel: account-entityA and account-entityB each have their own channel. Is it possible to achieve this using a multi-dimensional alert? I have created contact points for the Slack channels (alert and warning) and a notification policy that matches on the labels team_name and severity.
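To make the goal concrete, here is a rough sketch of the kind of single query I have in mind. It assumes a hypothetical recording rule named pod:cpu_usage_percent that stores the result of the CPU usage query above, and it assumes the pod name prefix is enough to identify the service. The team_name values are simply the service names, chosen for illustration.

```promql
# Sketch only. pod:cpu_usage_percent is a hypothetical recording rule
# holding the result of the CPU usage query above (percent of limit per pod).
  label_replace(
    pod:cpu_usage_percent{pod=~"account-entityA-.+"} > 80,
    "team_name", "account-entityA", "pod", ".*"
  )
or
  label_replace(
    pod:cpu_usage_percent{pod=~"account-entityB-.+"} > 90,
    "team_name", "account-entityB", "pod", ".*"
  )
```

The idea is that only pods above their own service's threshold are returned, each carrying a team_name label that the notification policy could match on. Whether unified alerting handles a query shaped like this as intended is the part I am unsure about.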
I referred to this link, but it doesn't seem to work for PromQL:
SEVERITY:
{{ if (gt $values.A.Value 80.0) -}}
error
{{ else if (gt $values.A.Value 50.0) -}}
warning
{{ else -}}
info
{{- end }}
Query result:
{pod="account-entityA-5df875-css5n"} 2.46911
{pod="account-entityA-5df875-l6hcf"} 2.45589
{pod="account-entityA-5df875-nljdm"} 2.32013
{pod="account-entityA-5df875-qvb9z"} 2.57735
{pod="account-entityB-675d49-hbfq2"} 1.43041
{pod="account-entityB-675d49-jx9wp"} 0.97912
{pod="adtech-8648c-5xnsd"} 0.52909
{pod="adtech-8648c-8p8hj"} 0.60585
{pod="adtech-8648c-jfp96"} 0.75641
{pod="atp-config-dataB-bcfd-t2gv6"} 0.52198
{pod="atp-config-dataB-bcfdf-z9zmc"}
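Since the result only carries a pod label, one option I am considering is deriving a service label from the pod name so that alerts can be grouped or routed per service. This is a sketch under two assumptions: that pod names follow the <service>-<hash>-<suffix> pattern seen in the result above, and that pod:cpu_usage_percent is a hypothetical recording rule for the CPU usage query.

```promql
# Sketch only. Assumes pod names look like <service>-<hash>-<suffix> and that
# pod:cpu_usage_percent is a hypothetical recording rule for the query above.
label_replace(
  pod:cpu_usage_percent,
  "service", "$1", "pod", "(.+)-[^-]+-[^-]+"
)
```

With this pattern, a pod such as account-entityA-5df875-css5n would get service="account-entityA". Pods whose names do not match the pattern are returned unchanged, without a service label.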
At present I am using a single alert for each service and would like to move to unified alerting if possible. Any suggestions or guidance would be highly appreciated.