Creating alerts & Best Practices

Hi All,

A bit of background:
We have set up a new Grafana install and added AWS CloudWatch and Prometheus data sources. I have imported a few dashboards and customised them to suit our needs.

On one of these dashboards, the monitor I have set up is as follows:
probe_success{instance=~"$target", job="$App"}

Our environment consists of various production, staging and test servers. Their host names indicate which environment they belong to.

For example:
srv01-staging
srv01-production

I’m trying to create an alert to monitor the HTTP response for ONLY the production servers.

My alert code is as below:
probe_success{job="nameofjob"}

My issue is that this alerts on ALL failures, including those in our staging/test environments, which I do not want.

I don’t believe we can use variables in alerts, or if we can, I haven’t been able to get it working.

TLDR:
What is the best way to segment alerts so that I am not notified of issues with our staging/test environments?

Many Thanks!

Have I posted this in the wrong place?

Have I worded my question in a confusing way? I would think this is quite widely done by other users. If not, can anyone suggest an alternative way to achieve what I am attempting?

It’s not just you… it seems the Grafana community doesn’t know :) I have issues with my alerts too, and nobody seems to know the answer :(

Create separate modules in the blackbox exporter, then alert based on module/job.
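
A rough sketch of what that could look like with one scrape job per environment. The job names, the exporter address 127.0.0.1:9115 and the http_2xx module are assumptions for illustration; only the host names come from the question.

```yaml
scrape_configs:
  # Production probes (hypothetical job name)
  - job_name: "blackbox-production"
    metrics_path: /probe
    params:
      module: [http_2xx]              # assumed blackbox module
    static_configs:
      - targets: ["http://srv01-production"]
    relabel_configs:
      # Pass the target URL to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the blackbox exporter itself (assumed address)
      - target_label: __address__
        replacement: 127.0.0.1:9115

  # Staging probes (hypothetical job name), same layout
  - job_name: "blackbox-staging"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ["http://srv01-staging"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115

# Alert query that only sees production probes:
#   probe_success{job="blackbox-production"} == 0
```

With this split, the alert rule matches on the job alone, so staging and test failures never reach it.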

You should add labels to your server targets, that way, you can create an alert rule and set an explicit condition using that label.

I’m monitoring OPNsense firewalls located in different facilities.
Prometheus config:

# OPNsense firewalls
scrape_configs:
  - job_name: "firewall"
    static_configs:
      # Datacenter
      - targets: ["10.50.0.1:9100"]
        labels:
          facility: "datacenter"

      # Facility 1
      - targets: ["10.57.0.1:9100"]
        labels:
          facility: "facility1"

      # Facility 2
      - targets: ["10.62.0.1:9100"]
        labels:
          facility: "facility2"

You can add custom labels for each target, to group or separate them as needed.
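The label can then be used as a filter in the alert rule query. Dashboard variables such as $App are not expanded in alert queries, so use the literal job name there:

probe_success{job="nameofjob", facility="datacenter"}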
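
Applied to the production/staging hosts from the original question, the same idea might look like the sketch below. The label name `environment` is an assumption, and the rest of the job (metrics path, module, relabelling) is left out and would stay as it already is.

```yaml
scrape_configs:
  - job_name: "nameofjob"             # placeholder job name from the question
    static_configs:
      # Production hosts
      - targets: ["srv01-production"]
        labels:
          environment: "production"   # assumed label name

      # Staging hosts
      - targets: ["srv01-staging"]
        labels:
          environment: "staging"

# Alert query limited to production:
#   probe_success{job="nameofjob", environment="production"} == 0
#
# Possible alternative without extra labels, if the instance label
# carries the host name:
#   probe_success{job="nameofjob", instance=~".*-production.*"} == 0
```

The explicit label is usually easier to maintain than the regex, because the alert keeps working if a host is renamed.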
