Grafana is throwing false-positive alerts "InstanceDown"

jajamaru · March 23, 2023, 5:34pm

Hi All,

We have this monitoring alert, which is being thrown every 3 hours:
[FIRING:2] InstanceDown (: opsgenie)
Alert: Instance :9113 - web-xyz down
Description: :9113 - web-xyz of job nginx-exporter-stage has been down for more than 1 minutes.
Details:
alertname: InstanceDown
Environment: stage
instance: :9113
instance_id: i-0a2s3d4fg56yhj78k9
instance_name: web-xyz
job: nginx-exporter-stage
prometheus_host: .ec2.internal
severity: opsgenie

When I query it in Grafana, the result shows the target has been down for a long time now.

up{job="node-exporters"} == 1
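Note that the query above selects the `node-exporters` job, while the alert is for the `nginx-exporter-stage` job. Below is a minimal Python sketch for listing the targets a given job reports as down. It assumes the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`); the base URL, the sample payload and the helper names `query_url` and `down_targets` are made up for illustration.

```python
import urllib.parse


def query_url(base_url, job):
    """Build the instant-query URL for `up` restricted to one job."""
    params = urllib.parse.urlencode({"query": f'up{{job="{job}"}}'})
    return f"{base_url}/api/v1/query?{params}"


def down_targets(payload):
    """Return (instance, instance_name) pairs whose `up` sample is 0.

    `payload` is the decoded JSON body of an instant query for `up{...}`.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # the sample value arrives as a string
        if float(value) == 0:
            labels = series["metric"]
            down.append((labels.get("instance", ""), labels.get("instance_name", "")))
    return down


# Hypothetical response body, shaped like an instant-vector result.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "10.0.0.5:9113", "instance_name": "web-xyz"},
             "value": [1679592840, "0"]},
            {"metric": {"instance": "10.0.0.7:9113", "instance_name": "web-abc"},
             "value": [1679592840, "1"]},
        ],
    },
}

print(query_url("http://localhost:9090", "nginx-exporter-stage"))  # placeholder host
print(down_targets(sample))
```

Fetching the URL (for example with `urllib.request.urlopen`) and decoding the JSON body gives the `payload` that `down_targets` expects.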

The thing is, the instance never went down.
It’s been up & running all the time, but Grafana (Prometheus) shows differently.
The port 9113 is never really open… (9100 is open, Prometheus uses this, right?)
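For context, 9100 is the default port of node_exporter and 9113 is the default port of the NGINX Prometheus exporter, and the job in question scrapes 9113. If nothing listens there, the scrape fails and `up` is 0 for that target even though the server itself is running. A minimal sketch for checking both ports from the Prometheus host, assuming plain TCP reachability is what matters; the host value is a placeholder and the helper names are hypothetical.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_exporters(host, ports=(9100, 9113)):
    """Map each exporter port to whether it accepts TCP connections."""
    return {port: port_is_open(host, port) for port in ports}


HOST = "127.0.0.1"  # placeholder; use the private IP of the instance being scraped
print(check_exporters(HOST))
```

A closed 9113 together with an open 9100 would suggest the node exporter is running but the nginx exporter is not, or that a security group blocks the port.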

What Grafana version and what operating system are you using?

Grafana v8.0.3 (cae5c5e46b), Ubuntu

What are you trying to achieve?

figure out the fix for this.

How are you trying to achieve it?

I’m checking whether there’s something wrong with the prometheus.yml.

What happened?

Grafana/Prometheus is throwing false alerts frequently.

What did you expect to happen?

We expect to receive this alert only when the server(s) are actually down.

Can you copy/paste the configuration(s) that you are having problems with?

snippet:
###########

ec2_sd_configs:
- filters:
  - name: tag:Role
    values:
    - web-staging-ec2
  port: 9113
  region: us-east-1
- filters:
  - name: tag:Role
    values:
    - web-prod
  port: 9113
  region: us-west-2
- filters:
  - name: tag:Role
    values:
    - web-prod
  port: 9113
  region: ca-central-1
- filters:
  - name: tag:Role
    values:
    - web-prod
  port: 9113
  region: ap-southeast-2
- filters:
  - name: tag:Role
    values:
    - web-prod
  port: 9113
  region: eu-west-1
job_name: nginx-exporter-stage
relabel_configs:
- action: keep
  regex: .*
  source_labels:
  - __meta_ec2_tag_Name
- source_labels:
  - __meta_ec2_tag_Name
  target_label: instance_name
- source_labels:
  - __meta_ec2_instance_id
  target_label: instance_id
- source_labels:
  - __meta_ec2_tag_Environment
  target_label: Environment

###########
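To make explicit what this job asks Prometheus to scrape, here is a minimal Python sketch that mimics the snippet above. It is a simulation under assumptions, not Prometheus code: the `discovered` list and its IP addresses are made up, and `build_targets` is a hypothetical helper. It relies on two documented behaviours: EC2 service discovery sets the target address to the instance's private IP plus the configured `port`, and relabel regexes are fully anchored.

```python
import re

PORT = 9113  # the port set in every ec2_sd_configs entry of the snippet


def build_targets(discovered, keep_regex=".*", port=PORT):
    """Mimic the `keep` rule and the three label-copying rules.

    `discovered` is a list of dicts holding __meta_ec2_* style labels
    (hypothetical sample data, not real discovery output).
    """
    pattern = re.compile(keep_regex)
    targets = []
    for meta in discovered:
        # A missing label behaves like an empty string in relabeling.
        name = meta.get("__meta_ec2_tag_Name", "")
        # Relabel regexes are anchored, so fullmatch is the closest analogue.
        if not pattern.fullmatch(name):
            continue
        targets.append({
            "__address__": f"{meta['__meta_ec2_private_ip']}:{port}",
            "instance_name": name,
            "instance_id": meta.get("__meta_ec2_instance_id", ""),
            "Environment": meta.get("__meta_ec2_tag_Environment", ""),
        })
    return targets


discovered = [
    {"__meta_ec2_private_ip": "10.0.0.5",
     "__meta_ec2_tag_Name": "web-xyz",
     "__meta_ec2_instance_id": "i-0a2s3d4fg56yhj78k9",
     "__meta_ec2_tag_Environment": "stage"},
    {"__meta_ec2_private_ip": "10.0.0.6"},  # instance without a Name tag
]

for target in build_targets(discovered):
    print(target["__address__"], target["instance_name"] or "<no name>")
```

Because `.*` also matches an empty string, the `keep` rule filters nothing here: every instance that matches one of the Role filters becomes a `:9113` target, whether or not an nginx exporter runs on it. The job is also named `nginx-exporter-stage` while four of its five discovery entries select `web-prod`, which may be worth double-checking.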

Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.

No UI errors.

Did you follow any online instructions? If so, what is the URL?

no.

Any tips in troubleshooting/finding a resolution?

anasgharsa2 · July 10, 2025, 1:27pm

796 views and 0 replies. You are real champs.

jangaraj · July 10, 2025, 1:31pm

This is a community forum, which means someone CAN respond. You still have the option to pay for support if you expect that someone MUST respond.

anasgharsa2 · July 10, 2025, 1:34pm

At least a reply, so that we can move on and not wait for nothing. Yeah right, with the super champs I pay for nothing.
