I'm in a tricky situation: I'm locked out of the box that runs Grafana, so I can't update it or read the Grafana logs, and I need to get creative until I'm given some DevOps time.
Grafana version is v6.2.5.
I only have one alerting channel (going to PagerDuty). I know it works because some alerts come through every day. Many other alerts are triggering according to Grafana's Test Rule and State History (screenshots attached), but PagerDuty never receives them.
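In case it helps, here's roughly how I'm cross-checking alert states and the channel config without shell access, via Grafana's HTTP API (a minimal sketch; `GRAFANA_URL` and the API key are placeholders, and I'm assuming the legacy-alerting `/api/alerts` and `/api/alert-notifications` endpoints that exist in 6.x):

```python
# Rough sketch: cross-check alert states and the notification channel via
# Grafana's HTTP API, since I have no shell access to the box.
import requests

GRAFANA_URL = "https://grafana.example.com"        # placeholder for our instance
HEADERS = {"Authorization": "Bearer eyJrIjoi..."}  # placeholder API key

# Legacy alerting: list every alert rule and its current state
alerts = requests.get(f"{GRAFANA_URL}/api/alerts", headers=HEADERS).json()
for a in alerts:
    print(a["id"], a["state"], a["name"])

# Confirm the PagerDuty channel is still configured (and whether it's the default)
channels = requests.get(f"{GRAFANA_URL}/api/alert-notifications", headers=HEADERS).json()
for c in channels:
    print(c["id"], c["type"], c["name"], c.get("isDefault"))
```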
We don't have any silencing or suppression rules on PagerDuty, so we should see all alerts; even if they were silenced, they would still show up on the PagerDuty dashboard.
I’ve refreshed the Integration Key between Grafana and PagerDuty. It hasn’t solved the issue.
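Refreshing the key doesn't by itself prove the key works, so the next thing on my list is firing a test event straight at PagerDuty's Events API v2, which I believe is what Grafana's PagerDuty notifier uses in this version. A rough sketch, with `ROUTING_KEY` as a placeholder for our integration key:

```python
# Rough sketch: send a test event directly to PagerDuty's Events API v2 to
# verify the integration key independently of Grafana.
import requests

ROUTING_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # placeholder integration key

event = {
    "routing_key": ROUTING_KEY,
    "event_action": "trigger",
    "payload": {
        "summary": "Manual test event (bypassing Grafana)",
        "source": "grafana-debugging",
        "severity": "critical",
    },
}

resp = requests.post("https://events.pagerduty.com/v2/enqueue", json=event)
print(resp.status_code, resp.text)  # a 202 with a dedup_key means PagerDuty accepted it
```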
I found one alert that wasn't receiving data (the metric didn't exist) and was defaulting to "Last state", so it was stuck, but that turned out to be a red herring; most of the alerts that aren't reaching PagerDuty are clearly shown as triggered.
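For reference, this is roughly how I scanned for other alerts with that "Keep Last State" fallback through the API, since I can't grep anything on the box (a sketch; the `noDataState` / `keep_state` field names are what the legacy panel alert JSON uses, as far as I can tell):

```python
# Rough sketch: scan dashboard JSON for alerts whose no-data fallback is
# "Keep Last State" (stored as noDataState = "keep_state" in legacy alerting,
# as far as I can tell).
import requests

GRAFANA_URL = "https://grafana.example.com"        # placeholder for our instance
HEADERS = {"Authorization": "Bearer eyJrIjoi..."}  # placeholder API key

dashboards = requests.get(f"{GRAFANA_URL}/api/search",
                          params={"type": "dash-db"}, headers=HEADERS).json()
for d in dashboards:
    dash = requests.get(f"{GRAFANA_URL}/api/dashboards/uid/{d['uid']}",
                        headers=HEADERS).json()["dashboard"]
    for panel in dash.get("panels", []):  # note: doesn't recurse into row panels
        alert = panel.get("alert")
        if alert and alert.get("noDataState") == "keep_state":
            print(dash["title"], "->", alert["name"])
```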
If none of the alerts were getting through I'd at least have a lead, but we do see some alerts coming from Grafana every day.
Does anyone know of a bug fixed after v6.2.5 that could explain this? I found this one, for example: PagerDuty alerting fails when summary is greater than 1024 characters · Issue #18727 · grafana/grafana · GitHub, but that isn't it.
The screenshots below show an example of an alert triggering in Grafana that never reaches PagerDuty.