Hi,
We are looking at a showstopper right now.
Loki: 2.4.1
Grafana: v8.1.6
Example:
We have two queries:
- We want to measure all incoming events that occurred in a specific range, like:
sum(count_over_time(
{app="controller"}
|="here is my incoming event"
[$__range]))
- We want to measure all successful events that occurred in a specific range, like:
sum(count_over_time(
{app="controller"}
|="here is my successful event"
[$__range]))
But what we really want is the difference between the two. If the difference is 0, the software works fine.
What happens:
If we have incoming requests matching Query 1, we get a value.
If we have no successful executions matching Query 2, we get nil (“no data”).
If we calculate Query 2 minus Query 1, we get “no data” as the answer instead of the real difference.
We need to observe the difference between these two queries and possibly trigger alerts on it.
Is there any way to fix this, or at least work around it?
Kind regards,
Carsten
Hi there,
I found a workaround that is not beautiful, but it works:
If a query might return “no data”, you can append a fallback with the “or” vector operator: a second query that always returns data, multiplied by 0. The multiplication cancels out the second query's value entirely but still returns a valid vector that represents 0.
For example, the solution to my question above looks like this:
Query 1 (incoming event) will look like:
(
sum(count_over_time(
{app="controller"}
|= "here is my incoming event"
[$__range]))
or
sum(count_over_time({app="controller"}
[$__range])) * 0
)
# the fallback just counts all the logs the controller produces and multiplies the result by 0
Query 2 (successful events) will look like:
(
sum(count_over_time(
{app="controller", container="controller"}
|= "here is my successful event"
[$__range]))
or
sum(count_over_time({app="controller"}
[$__range])) * 0
# the fallback just counts all the logs the controller produces and multiplies the result by 0
)
This results in the final query to get the difference:
#Query 2 minus Query 1
(
sum(count_over_time(
{app="controller", container="controller"}
|= "here is my successful event"
[$__range]))
or
sum(count_over_time({app="controller"}
[$__range])) * 0
)
-
(
sum(count_over_time(
{app="controller"}
|= "here is my incoming event"
[$__range]))
or
sum(count_over_time({app="controller"}
[$__range])) * 0
)
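Why this works can be shown with a small toy model in Python. It is only a sketch of the vector matching rules as I understand them, not Loki's implementation: an instant vector is modelled as a dict keyed by label set, and the sample counts (7 and 120) are made-up values.

```python
# Toy model of instant vectors: a dict mapping a label set to a sample value.
# This only imitates the matching rules; it is not how Loki is implemented.
NO_LABELS = frozenset()  # sum(...) without "by" leaves one series with no labels


def vec_sub(left, right):
    """vector - vector: only label sets present on BOTH sides yield a result."""
    return {labels: left[labels] - right[labels] for labels in left if labels in right}


def vec_or(left, right):
    """left or right: all of left, plus right's series whose label sets left lacks."""
    merged = dict(right)
    merged.update(left)
    return merged


def times_scalar(vec, factor):
    """vector * scalar: applied to every series."""
    return {labels: value * factor for labels, value in vec.items()}


# Hypothetical sample values for one evaluation.
incoming = {NO_LABELS: 7}    # 7 "incoming event" lines were found
successful = {}              # no "successful event" lines: empty vector ("no data")
all_logs = {NO_LABELS: 120}  # fallback query: every line the controller logged

# Plain subtraction: the empty side has nothing to match, so the result is empty.
naive = vec_sub(successful, incoming)

# With the fallback, both sides always contain exactly one series.
zero = times_scalar(all_logs, 0)
guarded = vec_sub(vec_or(successful, zero), vec_or(incoming, zero))

print("naive:", naive)
print("guarded:", guarded)
```

In this model the plain subtraction yields an empty result, while the guarded version yields a single series with the value -7, which is the real difference.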
I hope this helps you, because with this approach “no data” no longer makes your alerts unusable.
This ugly solution works for me.
Thank you!
I’d like to suggest an optimization for the “dummy query”.
Instead of counting over [$__range], it might be more efficient to count over a small, fixed amount of time.
Here is the “dummy query” that I use:
sum(count_over_time(
{level=~"error"} [5m])) * 0.0
I also selected the “error” level because it’s where I have the smallest amount of logs.
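One caveat with a small, quiet dummy stream: the “or” fallback can only supply a 0 if the dummy query itself returns data. The following Python sketch illustrates this with a toy model (a dict keyed by label set stands in for an instant vector). It is an illustration under that assumption, not Loki's implementation, and the count of 3 is a made-up value.

```python
# Toy model: an instant vector is a dict mapping a label set to a value.
NO_LABELS = frozenset()  # sum(...) without "by" leaves one series with no labels


def vec_or(left, right):
    """left or right: all of left, plus right's series whose label sets left lacks."""
    merged = dict(right)
    merged.update(left)
    return merged


successful = {}  # the real query found nothing: "no data"

# Dummy stream had 3 lines in the last 5m (hypothetical), multiplied by 0.0.
busy_dummy = {NO_LABELS: 3 * 0.0}
# Dummy stream had no lines in the last 5m: the dummy is empty as well.
quiet_dummy = {}

with_busy_dummy = vec_or(successful, busy_dummy)    # one series with value 0.0
with_quiet_dummy = vec_or(successful, quiet_dummy)  # still empty, so still "no data"

print(with_busy_dummy, with_quiet_dummy)
```

So the dummy selector and its time window should be chosen such that at least one log line always matches.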
Kind regards
This will not be closed until there is a solution for https://github.com/grafana/loki/issues/5074