Error log alerting

Hi, let me start by saying I’m a bit of a Grafana noob.
I’m currently working on a system for analyzing errors for a product.
I want to create an alerting rule that sends out an alert when a “critical” error is detected.

Now the problem!

If the product has an error, it sends data to the error log in the cloud about every 20 seconds. If it doesn’t have any errors, the product doesn’t send anything to the log.

How do I make it so that the alert only triggers once per error? As long as the error persists, the product keeps sending the error data to the database every 20 seconds.

What does the error data look like?

Suppose you have an error which lasts for two minutes (and therefore sends 6
messages), then there’s a gap of 5 minutes, a different type of error which
lasts one minute (and sends three messages), then another gap, and then the
second error comes back again…

  • are the first 6 messages identical to each other?

  • are the next three messages the same as each other but identifiably different
    from the first six?

  • are the final messages identifiably different from the previous three?

Can there be two errors occurring at the same time, therefore sending
interleaved messages?

Antony.


IMHO the system is pretty bad.
The product sends this in its error message:

Time
product_id
error_code
error_type
error_description
error_severity

  • are the first 6 messages identical to each other?:

If the product has an error for 6 minutes, everything but the time should be the same.

  • are the next three messages the same as each other but identifiably different
    from the first six?:

If it's a different error, it will have a different error_code.

  • are the final messages identifiably different from the previous three?:

If it's the same error_code, the only difference in the message will be the time. So I would probably have to do some voodoo to see if there was a gap longer than 20 seconds?
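
To make the "gap" idea concrete, here is a rough sketch in plain Python (not a Grafana feature, just the logic). It keeps only the first message of each error episode, where a new episode starts when the same product_id and error_code have been silent for longer than some threshold. The function name and the 60-second threshold are my own assumptions; since messages arrive only "about" every 20 seconds, the threshold probably needs some slack above 20. Keying on product_id plus error_code also means two errors sending interleaved messages are tracked separately.

```python
from datetime import datetime, timedelta

# Assumption: allow some slack above the ~20 s send interval before
# treating silence as the end of an error episode.
MAX_GAP = timedelta(seconds=60)


def first_messages_of_episodes(messages, max_gap=MAX_GAP):
    """Return only the messages that start a new error episode.

    messages: iterable of dicts with at least "time" (datetime),
    "product_id" and "error_code", in any order.
    """
    last_seen = {}  # (product_id, error_code) -> time of previous message
    starts = []
    for msg in sorted(messages, key=lambda m: m["time"]):
        key = (msg["product_id"], msg["error_code"])
        previous = last_seen.get(key)
        # First message for this key, or silence longer than max_gap:
        # treat it as the start of a new episode (one alert).
        if previous is None or msg["time"] - previous > max_gap:
            starts.append(msg)
        last_seen[key] = msg["time"]
    return starts
```

With the scenario described above (six messages of one error, a gap, three of another, a gap, then the second error again) this would return three messages, one per episode. The same rule could in principle be written into the query that feeds the alert rule, but how depends on the database, which hasn't been specified here.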