How to Avoid Repeated Alerts for Noisy Containers in Grafana?

Hello Grafana Community,

I have set up alerts for monitoring containers with the following conditions:

[DIE, OOM, START]

  • Target: Quiet containers
  • Alert: Final event of the container
  • Frequency: Every 2 minutes
  • Condition: Based on the node’s final state within the last 2 minutes + 3 seconds
  • Trigger: Die, Start, OOM less than 2 times

[RE-STARTING]

  • Target: Noisy containers
  • Alert: Number of DIE ↔ START repetitions
  • Frequency: Every 1 hour
  • Condition: Based on the node’s final state within the last 1 hour + 10 seconds
  • Trigger: Die, Start, OOM more than 6 times

The problem I’m facing is that important alerts (such as container crashes or OOM errors) are not getting through, while I keep receiving alerts from containers that repeatedly restart or are noisy.

The first alert is expected, but I would like to ignore further alerts for the same “noisy” containers after a certain period. How can I suppress further alerts for noisy containers after they have triggered an alert within a specific interval (e.g., within 1 hour) while ensuring that important alerts continue to come through?

My current configurations are as follows:

  • [DIE, OOM, START]
    • Pending: 2 minutes
    • Evaluation: 10 seconds
    • Time Range: 2 minutes + 3 seconds
    • Repeat Interval: 4 hours
  • [RE-STARTING]
    • Pending: 1 hour
    • Evaluation: 10 seconds
    • Time Range: 1 hour + 10 seconds
    • Repeat Interval: 4 hours

I’ve already explored the Grafana Alert Notification Policy Docs, but I would love some guidance on how to adjust these settings to achieve the desired behavior.

My query looks like this:
topk(1, last_over_time({container = "A-*"}
| json
| status =~ "DIE|START|OOM"
| unwrap time [120s])
by (name, ip, text, status))
by (name, ip, text)

Any help or suggestions would be greatly appreciated!

I looked at the query, and it may be generating a large number of alerts because of the high cardinality of alert instances.

The alert query defines how many alerts (alert instances) get evaluated and fired.

... by (name, ip, text)

Each unique combination of label values in your by clause (here name, ip, and text) creates a separate alert instance.

If the same container emits multiple distinct log messages (i.e., different text values), you'll get multiple alert instances for the same container, one for each text. To avoid that, you could drop text from the by clause:

by (name, ip)

This defines one alert instance per container and IP, regardless of how many different messages it emits.
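
To make the cardinality difference concrete, here is a minimal Python sketch. It is not Grafana or Loki code; the records and the alert_instances helper are made up for illustration, and only the label names (name, ip, text, status) come from the query above. It simply counts the distinct label combinations that each by clause would produce.

```python
# Hypothetical log records; the values are invented for illustration only.
records = [
    {"name": "A-1", "ip": "10.0.0.1", "text": "container died", "status": "DIE"},
    {"name": "A-1", "ip": "10.0.0.1", "text": "container started", "status": "START"},
    {"name": "A-1", "ip": "10.0.0.1", "text": "out of memory", "status": "OOM"},
    {"name": "A-2", "ip": "10.0.0.2", "text": "container died", "status": "DIE"},
]


def alert_instances(records, labels):
    """Return the distinct label-value combinations, one per alert instance."""
    return {tuple(record[label] for label in labels) for record in records}


# Grouping that includes text: one instance per distinct message per container.
by_name_ip_text = alert_instances(records, ("name", "ip", "text"))

# Grouping without text: one instance per container and IP.
by_name_ip = alert_instances(records, ("name", "ip"))

print(len(by_name_ip_text))  # 4
print(len(by_name_ip))       # 2
```

In this toy data, container A-1 alone accounts for three instances when text is part of the grouping, and only one when it is not.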

Then, you can combine this with the notification policy's grouping and timing options (group by, group wait, group interval, and repeat interval) to control how often you get notified.