Grafana Alert with Terraform

App ELB Healthy Host Count Alert - Critical

rule {
name = "App ELB Healthy Host Count Critical"
condition = "B"

annotations = {
  summary     = "App ELB has critically low healthy host count"
  description = "**Load Balancer:** {{ $labels.LoadBalancer }}\n**Target Group:** {{ $labels.TargetGroup }}\n\nCritical: No healthy instances available. Immediate investigation required."
  runbook_url = local.runbook_url
}

labels = {
  severity    = "critical"
  service     = "app-elb"
  environment = var.environment
  team        = "platform"
}

data {
  ref_id         = "A"
  datasource_uid = local.cloudwatch_uid

  relative_time_range {
    from = 600 # 10 minutes
    to   = 0
  }

  model = jsonencode({
    expression = ""
    id         = ""
    matchExact = false
    metricName = "HealthyHostCount"
    namespace  = "AWS/ApplicationELB"
    period     = "300"
    refId      = "A"
    region     = var.aws_region
    statistics = ["Average"]
    dimensions = {
      LoadBalancer = "*"
      TargetGroup  = "*"
    }
    returnData    = true
    maxDataPoints = 100
  })
}

data {
  ref_id         = "B"
  datasource_uid = "__expr__"

  relative_time_range {
    from = 0
    to   = 0
  }

  model = jsonencode({
    conditions = [
      {
        evaluator = {
          params = [local.app_elb_healthy_host_critical_threshold]
          type   = "lt"
        }
        operator = {
          type = "and"
        }
        query = {
          params = ["A"]
        }
        reducer = {
          params = []
          type   = "last"
        }
        type = "query"
      }
    ]
    datasource = {
      name = "Expression"
      type = "__expr__"
      uid  = "__expr__"
    }
    expression    = ""
    hide          = false
    intervalMs    = 1000
    maxDataPoints = 43200
    reducer       = "last"
    refId         = "B"
    type          = "classic_conditions"
  })
}

no_data_state  = "NoData"
exec_err_state = "Alerting"

}

Both the LoadBalancer and TargetGroup dimensions use a wildcard. The problem is that their values are not displayed in the alert at all. I am using Grafana 10.4.1 on Amazon Managed Grafana and managing the alert rules through Terraform.

Is there a way to get the exact values for Load Balancer and Target Group? Can someone send me links to documentation on how to improve the alerting and labels? Thank you.

I am trying to change the classic condition to a reduce expression, as suggested by Grot. I will update if that resolves the issue.
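For anyone following along, here is a minimal sketch of what that change could look like. As far as I understand, a classic condition collapses all series into a single result, so `{{ $labels.LoadBalancer }}` and `{{ $labels.TargetGroup }}` have nothing to render, while a reduce expression followed by a threshold expression keeps one result per label set. This is only a fragment under those assumptions, not the exact export from the UI: it reuses the query `A` and the threshold local from the rule above, and the field set is trimmed to what I believe is the minimum.

```hcl
# Fragment: these two blocks sit inside the rule { ... } block, after query "A".
# The rule's condition would then be "C" instead of "B".

# B: reduce each series returned by A to a single number, one per label set.
data {
  ref_id         = "B"
  datasource_uid = "__expr__"

  relative_time_range {
    from = 0
    to   = 0
  }

  model = jsonencode({
    refId      = "B"
    type       = "reduce"
    expression = "A"
    reducer    = "last"
  })
}

# C: compare each reduced value against the threshold.
data {
  ref_id         = "C"
  datasource_uid = "__expr__"

  relative_time_range {
    from = 0
    to   = 0
  }

  model = jsonencode({
    refId      = "C"
    type       = "threshold"
    expression = "B"
    conditions = [
      {
        evaluator = {
          type   = "lt"
          params = [local.app_elb_healthy_host_critical_threshold]
        }
      }
    ]
  })
}
```

With this layout the reduced value should also be available in annotations as `{{ $values.B.Value }}`, assuming the label names returned by CloudWatch match the dimension names.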

Thank you, Jangaraj, I will try that.

I tried creating an alert rule manually in the Grafana UI and exported the HCL code for it, but when I apply that exported code with Terraform I get some errors. Is there a documentation link that can help with the Terraform side? Thank you.

Always be specific: what are those errors, what does the alert query you developed look like in the UI, what is expected…

Apologies for not being specific. Terraform needs a few more fields than the code exported from the Grafana UI provides:

App ELB Healthy Host Count Alert - Critical

rule {
name = "App ELB Healthy Host Count Critical"
condition = "C"
for = "5m"
annotations = {
  summary     = "App ELB has critically low healthy host count"
  description = "Load Balancer: {{ $labels.LoadBalancer }}\nTarget Group: {{ $labels.TargetGroup }}\nCurrent Healthy Host Count: {{ $values.B.Value }}\n\nCritical: No healthy instances available. Immediate investigation required."
  runbook_url = local.runbook_url
}
labels = {
  severity    = "critical"
  service     = "app-elb"
  environment = var.environment
  team        = "platform"
}
data {
  ref_id         = "A"
  datasource_uid = local.cloudwatch_uid
  relative_time_range {
    from = 300 # 5 minutes
    to   = 0
  }
  model = jsonencode({
    expression = ""
    id         = ""
    matchExact = false
    metricName = "HealthyHostCount"
    namespace  = "AWS/ApplicationELB"
    period     = "300"
    refId      = "A"
    region     = var.aws_region
    statistics = ["Minimum"]
    dimensions = {
      LoadBalancer = "*"
      TargetGroup  = "*"
    }
    intervalMs       = 1000
    maxDataPoints    = 43200
    metricEditorMode = 0
    metricQueryType  = 0
    queryMode        = "Metrics"
  })
}
data {
  ref_id         = "B"
  datasource_uid = "__expr__"
  relative_time_range {
    from = 0
    to   = 0
  }
  model = jsonencode({
    conditions = [] # value was lost in the original post; an empty list is assumed
    datasource = {
      name = "Expression"
      type = "__expr__"
      uid  = "__expr__"
    }
    expression    = "A"
    hide          = false
    intervalMs    = 1000
    maxDataPoints = 43200
    refId         = "B"
    type          = "reduce"
    reducer       = "last"
    settings = {
      # The original post set mode twice; only one can take effect, so pick one.
      # mode             = "replaceNN"
      # replaceWithValue = 0
      mode = "dropNN"
    }
  })
}

data {
  ref_id         = "C"
  datasource_uid = "__expr__"
  relative_time_range {
    from = 300 # 5 minutes
    to   = 0
  }
  model = jsonencode({
    conditions = [
      {
        evaluator = {
          params = [local.app_elb_healthy_host_critical_threshold]
          type   = "lt"
        }
        operator = {
          type = "and"
        }
        query = {
          params = ["B"]
        }
        type = "query"
      }
    ]
    datasource = {
      name = "Expression"
      type = "__expr__"
      uid  = "__expr__"
    }
    expression    = "B"
    hide          = false
    intervalMs    = 1000
    maxDataPoints = 43200
    refId         = "C"
    type          = "threshold"
  })
}
no_data_state  = "NoData"
exec_err_state = "Alerting"

}
}

When I run terraform apply, I still see errors in Grafana.

As I said, develop the alert in the UI first and then use Terraform.

Thanks for your assistance. The Terraform code in my previous post is what the Grafana UI generated, though I had to make a few changes. As I mentioned, the Grafana provider for Terraform could be the cause of this. I will figure out a way. Thanks again.
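For readers who land here with the same question: the `rule { ... }` blocks in this thread are only part of a resource, and the extra fields Terraform asks for usually live on the enclosing one. The errors were never posted, so this may not be the cause in this case. Below is a rough sketch, assuming the `grafana/grafana` provider's `grafana_rule_group` resource. The resource names, folder title, group name, and interval are placeholders I made up for illustration.

```hcl
# Hypothetical wrapper; all names and values here are placeholders.
resource "grafana_folder" "alerts" {
  title = "Platform Alerts"
}

resource "grafana_rule_group" "app_elb" {
  name             = "app-elb"                 # rule group name
  folder_uid       = grafana_folder.alerts.uid # folder that holds the group
  interval_seconds = 60                        # evaluation interval

  # One or more rule { ... } blocks go here. Each rule needs at least
  # name, condition, and one data block.
}
```

Running `terraform validate` before `terraform apply` should surface missing or misspelled arguments without touching the Grafana workspace.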