Grafana is throwing false-positive "InstanceDown" alerts

Hi All,

We have this monitoring alert, which fires every 3 hours:
[FIRING:2] InstanceDown (: opsgenie)
Alert: Instance :9113 - web-xyz down
Description: :9113 - web-xyz of job nginx-exporter-stage has been down for more than 1 minutes.
Details:
alertname: InstanceDown
Environment: stage
instance: :9113
instance_id: i-0a2s3d4fg56yhj78k9
instance_name: web-xyz
job: nginx-exporter-stage
prometheus_host: .ec2.internal
severity: opsgenie

When I query it in Grafana, the result shows the target has been down for a long time now.

up{job="node-exporters"} == 1

  1. The thing is, the instance never went down.
    It’s been up & running all the time, but Grafana (Prometheus) shows differently.

  2. Port 9113 is not actually open on the instance… (9100 is open; Prometheus uses this one, right?)

  • What Grafana version and what operating system are you using?

Grafana v8.0.3 (cae5c5e46b), Ubuntu

  • What are you trying to achieve?

Figure out a fix for this.

  • How are you trying to achieve it?

I’m checking whether there’s something wrong with prometheus.yml.

  • What happened?

Grafana/Prometheus is throwing false alerts frequently.

  • What did you expect to happen?

We expect to receive this alert only when a server is actually down.

  • Can you copy/paste the configuration(s) that you are having problems with?

snippet:
###########

- ec2_sd_configs:
  - filters:
    - name: tag:Role
      values:
      - web-staging-ec2
    port: 9113
    region: us-east-1
  - filters:
    - name: tag:Role
      values:
      - web-prod
    port: 9113
    region: us-west-2
  - filters:
    - name: tag:Role
      values:
      - web-prod
    port: 9113
    region: ca-central-1
  - filters:
    - name: tag:Role
      values:
      - web-prod
    port: 9113
    region: ap-southeast-2
  - filters:
    - name: tag:Role
      values:
      - web-prod
    port: 9113
    region: eu-west-1
  job_name: nginx-exporter-stage
  relabel_configs:
  - action: keep
    regex: .*
    source_labels:
    - __meta_ec2_tag_Name
  - source_labels:
    - __meta_ec2_tag_Name
    target_label: instance_name
  - source_labels:
    - __meta_ec2_instance_id
    target_label: instance_id
  - source_labels:
    - __meta_ec2_tag_Environment
    target_label: Environment

###############

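The alert rule itself is not part of the snippet above. As a rough sketch only, assuming it follows the common `up == 0` pattern and going by the alert text, it would look something like the following. The group name `instance-alerts` is hypothetical, and the annotation templates are guesses reconstructed from the message, not copied from the real rule file:

```yaml
# Hypothetical reconstruction of the alert rule; NOT copied from the actual rules file.
groups:
  - name: instance-alerts          # assumed group name
    rules:
      - alert: InstanceDown
        # `up` is 0 whenever Prometheus fails to scrape a target
        expr: up == 0
        # matches "has been down for more than 1 minutes" in the alert text
        for: 1m
        labels:
          severity: opsgenie
        annotations:
          summary: "Instance {{ $labels.instance }} - {{ $labels.instance_name }} down"
          description: "{{ $labels.instance }} - {{ $labels.instance_name }} of job {{ $labels.job }} has been down for more than 1 minutes."
```

If the rule really has this shape, it fires whenever a scrape of the target on port 9113 fails for more than a minute, whether or not the EC2 instance itself is up.
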
  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.

No UI errors.

  • Did you follow any online instructions? If so, what is the URL?

No.

Any tips on troubleshooting or finding a resolution?