How to calculate the number of requests in a time period using PromQL

Hi,
I’d like to put in my two cents’ worth :smile:

1. Gauge vs counter

The main difference between gauges and counters is that a gauge can go down as well as up, while a counter can only increase or be reset to 0. Using counters for numbers of HTTP requests is highly recommended. Why not gauges, though? Imagine your data is being scraped every minute. The gauge might be set to several different values between two scrapes, and you would never see any of them. A counter only grows, so it doesn’t matter whether you take the measurement at the 30th second or the 45th, while a gauge could spike and drop many times in between. HOWEVER, here you’re using a gauge as a counter, and I would strongly recommend switching that metric to a real counter.
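To make the scraping argument concrete, here is a small Python sketch. It is not how Prometheus works internally, just a toy model: the per-second request numbers, the 60-second scrape interval and the `scrape` helper are all made up for illustration.

```python
from itertools import accumulate

# Hypothetical traffic: requests arriving in each second over 3 minutes.
requests_per_second = [0] * 180
requests_per_second[10] = 5    # short burst at t=10s
requests_per_second[70] = 3
requests_per_second[100] = 7

SCRAPE_INTERVAL = 60  # assumed scrape interval in seconds


def scrape(values, interval):
    """Return only the values a scraper would see, one per interval."""
    return [values[t] for t in range(interval - 1, len(values), interval)]


# Gauge-style metric: exposes only the current per-second value.
gauge_samples = scrape(requests_per_second, SCRAPE_INTERVAL)

# Counter-style metric: exposes the running total since start.
counter_values = list(accumulate(requests_per_second))
counter_samples = scrape(counter_values, SCRAPE_INTERVAL)

print(gauge_samples)    # the bursts fall between scrapes and are lost
print(counter_samples)  # the total still reflects every request
```

In this toy run the gauge samples never see any of the bursts, while the counter samples still end up at the full total of 15 requests.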

2. Rate function

As David pointed out, when you have cumulative values with counters, the raw series only shows you the ever-growing total. That’s where the rate function comes in handy. It calculates the per-second average increase of the counter over a given time frame (I’ve seen it called a lookbehind window somewhere and I took quite a liking to that name :smile:). So a query like this:

coleta_online_request_count{product_id=~'$product_id', external_bases_name=~'$base_name'}[5m]

Would say “take the coleta_online_request_count series that carry the specified labels and give me all the points of those series from the last 5m” (see the screen below):

Now we can see that the rate (which is (last value in the window - first value in the window) / time between those two points) would be one request per second, since (241 - 1) / 240 == 1 (the denominator is four minutes, i.e. 240 seconds, since that’s the time between the first and last point). If you want the number of requests, not the rate, you can use the increase function, which acts like rate but without dividing by the time.
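Here is the same arithmetic as a minimal Python sketch. It is a simplification: the real rate and increase functions also extrapolate to the edges of the window and handle counter resets, which this ignores. The function names are mine, and the three intermediate samples are assumed (only the first value 1, the last value 241 and the 240 seconds between them come from the example above).

```python
def simple_increase(samples):
    """Last value minus first value. No reset handling, no extrapolation."""
    return samples[-1][1] - samples[0][1]


def simple_rate(samples):
    """Per-second rate: the increase divided by the time between first and last sample."""
    elapsed_seconds = samples[-1][0] - samples[0][0]
    return simple_increase(samples) / elapsed_seconds


# (timestamp in seconds, counter value); intermediate points are assumed.
samples = [(0, 1), (60, 61), (120, 121), (180, 181), (240, 241)]

rate_value = simple_rate(samples)          # (241 - 1) / 240
increase_value = simple_increase(samples)  # 241 - 1

print(rate_value, increase_value)
```

So the rate is 1 request per second and the increase is 240 requests over the same window.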

Both rate and increase should only be used with counters (or in your case, gauges behaving like counters), since they both assume that if a value is lower than the previous point, the counter was reset, and they “shift” the data points after the reset up. With real gauges that leads to some… shenanigans (I’ve seen that once and it wasn’t pretty :smile:). It’s also the reason why we use sum(rate(... and not rate(sum(...: the sum can go up and down (e.g. on pod restarts), and rate would treat every drop as a reset.
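A rough Python sketch of why the order matters. The reset rule here is a simplified version of the behaviour described above (any drop is treated as a restart from 0), and the two pods with their values are invented for the example.

```python
def increase_with_resets(values):
    """Total increase over consecutive samples, treating any drop as a counter reset."""
    total = 0
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            # Assume the counter restarted from 0, so all of `cur` is new.
            total += cur
        else:
            total += cur - prev
    return total


# Hypothetical per-pod counters sampled at the same four scrapes.
pod_a = [100, 110, 120, 130]  # steady traffic, no restart
pod_b = [50, 60, 5, 15]       # pod restarted between the 2nd and 3rd scrape

# Like sum(increase(...)): handle resets per series, then add up.
per_series_then_sum = increase_with_resets(pod_a) + increase_with_resets(pod_b)

# Like increase(sum(...)): add up first, then look for resets.
summed = [a + b for a, b in zip(pod_a, pod_b)]
sum_then_increase = increase_with_resets(summed)

print(per_series_then_sum)  # 30 + 25
print(sum_then_increase)    # the drop in the sum is mistaken for a full reset
```

Per series, the total comes out as 55 requests. Summing first gives 165, because the dip from 170 to 125 in the summed series looks like a counter that restarted from 0.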

3. How to connect the query to Grafana’s time picker?

Sadly, I’m not sure. With a plain Prometheus data source, you can use the $__range built-in variable, which resolves to the duration of the time picker, so your query could look like this:

sum(rate(coleta_online_request_count{product_id=~'$product_id', external_bases_name=~'$base_name'}[$__range]))

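I’m not sure if it will resolve in your datasource though (I think it should, but…). And since you’re after the number of requests rather than the per-second rate, you would swap rate for increase in that query.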

4. Why do you have 0 values in the result?

I see one reason for that: you didn’t have any requests in that time :smile:. Now that you know how the rate function works, you can look at the second screenshot you provided (the one with just the sum) and notice that all the values are 12. So (12 - 12) / 240 == 0, that’s why. For testing, you could extend the time period until it covers data where the values actually increased.

I hope this helps! A little disclaimer: I don’t know how much you already know, so I opted for a full explanation from the ground up. If you already know some or all of this, you can ignore those parts :smile:
