I am trying to implement an SLI according to the Google SRE definition:
SLI = good events / valid events
Let’s say I have a latency metric from which I can plot a graph showing the latency of each request.
Now I want to know the percentage of all calls over a defined period of time that were within a certain threshold.
For this I need the total number of calls (data points) as well as the number of calls (data points) that stayed within my threshold, something like:
SLI = count-good(range-vector, scalar) / count(range-vector)
where scalar is my threshold (e.g. 100ms),
range-vector is my latency metric over a defined time period (e.g. 1h),
and count-good counts only the calls that stay within my threshold, whereas count counts all calls within my time frame.
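To make the intended calculation concrete, here is a minimal Python sketch of what I mean. It assumes the latencies for the time window are already available as a plain list of millisecond values; the function name `sli` and the sample data are made up for illustration and have nothing to do with Prometheus itself.

```python
from typing import Sequence


def sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Return good events / valid events for one time window.

    Hypothetical helper: 'valid' is every data point in the window,
    'good' is every data point at or below the threshold.
    """
    valid = len(latencies_ms)  # count: all calls within the time frame
    if valid == 0:
        raise ValueError("no valid events in the window")
    # count-good: only the calls that stayed within the threshold
    good = sum(1 for value in latencies_ms if value <= threshold_ms)
    return good / valid


# Made-up latencies (ms) for one window; 3 of the 5 are within 100 ms.
samples = [42.0, 87.5, 120.0, 99.9, 250.0]
print(f"{sli(samples, 100.0):.0%}")  # prints 60%
```

So the result I am after is a single ratio (or percentage) per time window.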
How can I achieve this with a Prometheus data source?