Calculate the quantile with sparse metrics

  • What Grafana version and what operating system are you using?
    docker.io/grafana/grafana:10.4.0 on k3s

  • What are you trying to achieve?
    I am trying to calculate the 0.95 quantile of a bits-per-second gauge in Prometheus, per interface. Because so many ASes exist, we only save the top 1000 flows. As a result, flows beneath a specific threshold do not have datapoints at all times; there are only datapoints when there is actually traffic.

  • How are you trying to achieve it?
    I am attempting to calculate the 0.95 quantile of bits per second for each interface by using the quantile_over_time function in Prometheus and summing the results by the inputifindex label. The query is designed to aggregate the quantiles for each interface over the specified time range.

  • What happened?
    The 0.95 quantile is calculated only from the datapoints that exist. The time in which no datapoints are recorded (in which the bandwidth is zero) is ignored, resulting in a false value.

  • What did you expect to happen?
    I expected the missing datapoints to be filled with zero values, resulting in a correct calculation of the quantile.

  • Can you copy/paste the configuration(s) that you are having problems with?

sum by (inputifindex) (
  quantile_over_time(0.95, sflow_asn_bps{direction="ingress", src="AS20940 Akamai International B.V."}[$__range])
)

When I try to fill the gaps with zeros, a new time series of 0s is produced, but it is not combined with each of the existing time series individually. If I then sum them, each time series is not just merged with the 0 series; all of them are merged together, resulting in a completely wrong time series.
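
To make the expected behaviour concrete, here is a minimal Python sketch of what I would like the calculation to do. It runs outside Prometheus and is only an illustration: the timestamps, the flow names, and the sample values are made up, and `quantile`, `zero_fill`, and `sum_by_interface` are hypothetical helpers (a plain linear-interpolation quantile, not necessarily identical to what `quantile_over_time` does internally). Each series is zero-filled on its own against a shared time grid, the quantile is taken per series, and only then are the results summed per interface.

```python
import math


def quantile(q, values):
    """Linear-interpolation quantile of a list of numbers, q in [0, 1]."""
    if not values:
        return math.nan
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def zero_fill(samples, timestamps):
    """One value per timestamp, using 0 where this series has no sample."""
    return [samples.get(t, 0.0) for t in timestamps]


def sum_by_interface(per_series):
    """Sum per-series values by the first key element (the interface index)."""
    totals = {}
    for (ifindex, _flow), value in per_series.items():
        totals[ifindex] = totals.get(ifindex, 0.0) + value
    return totals


# Made-up data: 20 evaluation timestamps, two flows on interface "1".
timestamps = list(range(20))
series = {
    ("1", "flow-a"): {0: 100.0, 1: 200.0},           # traffic in 2 of 20 steps
    ("1", "flow-b"): {t: 50.0 for t in timestamps},  # always present
}

# Quantile over existing datapoints only (what I get today).
sparse_q = {key: quantile(0.95, list(s.values())) for key, s in series.items()}

# Quantile after zero-filling each series separately (what I expect).
filled_q = {key: quantile(0.95, zero_fill(s, timestamps)) for key, s in series.items()}

print(sum_by_interface(sparse_q))  # flow-a contributes about 195
print(sum_by_interface(filled_q))  # flow-a contributes about 105
```

With these made-up numbers the sparse flow contributes about 195 when the gaps are ignored and about 105 when they count as zero, so the per-interface sum is clearly inflated in the first case.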

Thank you for your assistance!