Grafana Latency SLO Reporting >100%

  • What Grafana version and what operating system are you using?
    AWS managed Grafana v9.4.7

  • What are you trying to achieve?
    Implementing SLOs for various services within the company

  • How are you trying to achieve it?
    AWS managed Grafana, Mimir, Sloth, Alertmanager…

  • What happened?
    Grafana SLO dashboard reported a value greater than 100%

  • What did you expect to happen?
    We expect an SLI to max out at 100%, since a service cannot outperform 100%.

  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.

  • Did you follow any online instructions? If so, what is the URL?

In essence, my company has seen two separate SLIs report a value greater than 100%, which isn’t mathematically possible. We are unsure why, but we wonder whether it is caused by metrics arriving late: the samples would be missing from the window they belong to and would then get counted in the next one. Has anyone experienced this?
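To make the late-metrics hypothesis concrete, here is a minimal Python sketch of the arithmetic. It is not our actual query and not how Prometheus or Mimir compute `rate()` (there is no extrapolation or staleness handling here); the sample values, the 60-second scrape interval, and the helper names `window_increase` and `sli` are all made up for illustration. It only shows that a "good / total" ratio can exceed 1 when the newest sample of the "total" counter has not been ingested yet but the "good" counter's has.

```python
def window_increase(samples, start, end):
    """Increase of a monotonic counter over [start, end].

    Uses only the samples that fall inside the window; a deliberately
    simplified stand-in for a PromQL increase(), with no extrapolation.
    """
    inside = [value for ts, value in samples if start <= ts <= end]
    if len(inside) < 2:
        return 0.0
    return inside[-1] - inside[0]


def sli(good, total, start, end):
    """Ratio of good events to total events over the window, or None if no traffic."""
    denominator = window_increase(total, start, end)
    if denominator == 0:
        return None
    return window_increase(good, start, end) / denominator


# Hypothetical counter samples as (timestamp_seconds, cumulative_value).
# 100 requests per 60 s, and every request meets the latency target.
good = [(0, 0), (60, 100), (120, 200), (180, 300)]
total_on_time = [(0, 0), (60, 100), (120, 200), (180, 300)]

# Assumption: the t=180 sample of the "total" series has not arrived yet
# when the window is evaluated, while the "good" series is complete.
total_delayed = [(0, 0), (60, 100), (120, 200)]

on_time = sli(good, total_on_time, 0, 180)   # 300 / 300
delayed = sli(good, total_delayed, 0, 180)   # 300 / 200

print(f"on time: {on_time:.0%}, delayed total: {delayed:.0%}")
```

With these made-up numbers the complete data gives 100% and the delayed "total" series gives 150%. Whether this is what is really happening in our setup is still unconfirmed.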

image (3)

Interesting that your spike above 100% is the mirror of the drop just before it… What query are you using?

Edit: I believe it would be related to Prometheus, not Grafana.

We’ve noticed instances where the spike has no mirrored drop. We also think it’s a Prometheus issue, but we spoke with Grafana about it and they are investigating. They believe it’s either a PromQL or a dashboarding issue. We are still waiting for them to give a formal explanation.

We are also seeing this issue; we have SLOs that are reporting greater than 100% SLI performance, which should be impossible. I tried setting the display to show only values between 0 (min) and 1 (max), but that had no effect.
I would love to hear the outcome of this, as we are trying to push SLO adoption, but issues like this make users dismissive of the entire solution.