This kind of query can get expensive because, under the hood, it is evaluated the same way Prometheus range queries are evaluated.
Basically, this entire query:
topk(5, sum by (q_domain) (count_over_time({q_hostname="dns-cache01",q_message="CLIENT_QUERY",q_type="A"} | logfmt | q_domain =~ "." | q_src_ip =~ "." [5m])))
is evaluated at the start time of the query, then again at start + step, then again at start + 2 × step, and so on until the end time. This blog post, I think, helps explain this as well.
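To make the stepping concrete, here is a minimal Python sketch of which time windows get evaluated. It is only an illustration, not Loki's actual code: the function name, the 10 minute query window, and the 1 minute step are all made-up values, and it assumes the end time is included when it falls exactly on a step.

```python
from datetime import datetime, timedelta

def evaluation_windows(start, end, step, range_):
    """List the (window_start, eval_time) pairs a range query would cover.

    Hypothetical helper: at each evaluation time t, a [range_] selector
    looks back over the window from t - range_ up to t.
    """
    windows = []
    t = start
    while t <= end:
        windows.append((t - range_, t))
        t += step
    return windows

# Assumed example values: a 10 minute query, 1m step, [5m] range.
start = datetime(2024, 1, 1, 12, 0)
end = start + timedelta(minutes=10)
windows = evaluation_windows(start, end, timedelta(minutes=1), timedelta(minutes=5))

for lo, hi in windows[:3]:
    print(lo.time(), "->", hi.time())
print(len(windows), "evaluations")
```

With a step smaller than the range, consecutive windows overlap heavily, so the same log lines get counted again at every step.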
The reason topk returns more results is the nature of how it works: topk is evaluated at each step, so the top 5 results can differ from one step to the next. If each step has a different top 5, the output will contain far more than 5 series in total. Here is another blog post that talks about this. However, the workaround in that post would be a little tedious for Loki: I'm not sure off the top of my head whether query_result works on a Loki datasource, so you would have to configure Loki as a Prometheus datasource in Grafana. That is possible, since Loki presents a Prometheus-compatible API; you just need to add a /loki suffix to the URL: http://loki-url/loki
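A small Python simulation shows why the series count grows. This is just a sketch with invented domain names and counts, using k=2 instead of 5 to keep it short; it mimics the per-step behaviour described above rather than reproducing how Loki implements topk.

```python
def topk_per_step(counts_per_step, k):
    """Apply top-k independently at each step and collect every series seen."""
    seen = set()
    for counts in counts_per_step:
        # Rank this step's series by count, highest first.
        ranked = sorted(counts, key=counts.get, reverse=True)
        seen.update(ranked[:k])
    return seen

# Made-up per-step counts for three domains.
steps = [
    {"a.example": 9, "b.example": 7, "c.example": 1},
    {"a.example": 2, "b.example": 8, "c.example": 6},
    {"a.example": 5, "b.example": 1, "c.example": 4},
]

series = topk_per_step(steps, k=2)
print(sorted(series))
print(len(series), "series returned for k=2")
```

Each step on its own returns only 2 series, but the combined output contains all 3 because the winners change between steps.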
--limit is ignored for metric queries; it only applies to log results, which is why it has no effect here.
So how can you make this faster? You could explore parallelization with the query-frontend and more queriers.
You could also change the step in your query to a bigger value and match the range [5m] to the step, so that Loki does fewer iterations of that query.
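As rough arithmetic for how much the step matters, the sketch below counts evaluations for a query window. The 1 hour window and the 15s step are assumed example values (not taken from your setup), and the formula assumes both the start and end timestamps are evaluated.

```python
def evaluations(window_seconds, step_seconds):
    """Number of evaluation points between start and end, inclusive."""
    return window_seconds // step_seconds + 1

window = 3600  # assumed 1 hour query window

small_step = evaluations(window, 15)    # assumed 15s step
matched_step = evaluations(window, 300) # step matched to the [5m] range

print(small_step, "evaluations with a 15s step")
print(matched_step, "evaluations with a 5m step")
```

With the step equal to the range, each window starts where the previous one ended, so every log line is counted once instead of once per overlapping window.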
Not sure if your desired visualization supports this, but you could consider doing this as an instant query:
time logcli instant-query 'topk(5, sum by (q_domain) (count_over_time({q_hostname="dns-cache01",q_message="CLIENT_QUERY",q_type="A"} | logfmt | q_domain =~ "." | q_src_ip =~ "." [50m])))' -q --stats
Note the other difference here: the range was updated from [5m] to [50m]. This will run one single query for the last 50m and give you a single result, which is useful for display in a table.