Alerting silences removed unexpectedly

Hi, I had a question on Silences with external Alertmanagers.

We run Grafana via kube-prometheus-stack and evaluate a few dozen alert rules, with the goal of a unified monitoring and observability stack. We use the Prometheus Alertmanager as an external Alertmanager for the Grafana-managed rules.

Yesterday I was testing two silences configured for the external Alertmanager, and this morning they are gone. One of the silences matched an active alert instance and was suppressing its notifications. Karpenter recreated our Prometheus and Alertmanager pods last night, and a few minutes later I received a notification for that alert. This morning I logged into the Grafana UI and the Alerting Silences page is empty, with no history and no silences.

I had configured the silences to run through March 31, so they should still be active.

The Grafana logs show some errors connecting to the Prometheus pods, but otherwise I see nothing around the timestamp when the pods were recreated and the supposedly silenced alert instance notified me.

A couple of other things: (1) our Grafana and Prometheus pods have no persistent storage or external database; (2) the Grafana pod was not recreated, it has 9d of uptime, while the Prometheus pods have 8h.

Any idea why the Silence configurations disappeared?

Answering my own question: this was because our Prometheus Alertmanager pod has no persistent storage. When Grafana uses an external Alertmanager, the silence state and history live on that external Alertmanager, not in Grafana, so the silences were lost when Karpenter recreated the pod.
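For anyone hitting the same thing: Alertmanager persists silences (and notification state) to snapshot files under its storage path, so giving the Alertmanager pod a persistent volume lets silences survive pod recreation. A minimal sketch of the kube-prometheus-stack Helm values for this, where the `storageClassName` and size are assumptions you would adjust for your cluster:

```yaml
# values.yaml for the kube-prometheus-stack chart
alertmanager:
  alertmanagerSpec:
    # Back the Alertmanager's storage path with a PVC so silence
    # snapshots survive pod recreation (e.g. by Karpenter).
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3      # assumed; use your cluster's class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi           # assumed; silences need very little
```

With this in place, the operator creates a StatefulSet-backed PVC and the silence state is reloaded from the snapshot when the pod comes back.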
