Xx_over_time leads to no-data -> no calculations/operations possible ~ Please help

Hi,
we are facing a showstopper right now.
Loki: 2.4.1
Grafana: Grafana v8.1.6

Example:
We have two queries:

  1. We want to measure all incoming events that occurred in a specific range, like:
sum(count_over_time(
{app="controller"}
|="here is my incoming event"
[$__range]))
  2. We want to measure all successful events that occurred in a specific range, like:
sum(count_over_time(
{app="controller"}
|="here is my successful event"
[$__range]))

What we really want is the difference between the two. If the difference is 0, the software works fine.

What happens:
If there are incoming requests matching Query 1, we get a value.
If there are no successful executions matching Query 2, we get nil (“no data”).

If we calculate Query 2 minus Query 1, we get “no data” as the answer instead of the real difference.

We need to monitor the difference between these two queries and trigger alerts on it when necessary.

Is there any solution to get this fixed or at least worked around?

Kind regards,
Carsten

Hi there,
I found a workaround that is not pretty, but it works:

If you have a query that might return “no data”, you can append a fallback with the “or” vector operator: a second query that does return data, multiplied by 0. The multiplication cancels out the second query's value entirely, but it still yields a valid vector that represents 0.
For example, the solution to my question above looks like this:

Query 1 (incoming event) will look like:

(
 sum(count_over_time(
 {app="controller"} 
 |= "here is my incoming event"
 [$__range]))
 or
 sum(count_over_time({app="controller"} 
 [$__range])) * 0
)
#just taking all the logs the controller produces and multiplying them with 0

Query 2 (successful events) will look like:

(
 sum(count_over_time(
 {app="controller", container="controller"} 
 |= "here is my successful event"
 [$__range]))
 or
 sum(count_over_time({app="controller"} 
 [$__range])) * 0
#just taking all the logs the controller produces and multiplying them with 0
)
This results in the final query to get the difference:
#Query 2 minus Query 1
(
 sum(count_over_time(
 {app="controller", container="controller"} 
 |= "here is my successful event"
 [$__range]))
 or
 sum(count_over_time({app="controller"} 
 [$__range])) * 0
)

-

(
 sum(count_over_time(
 {app="controller"} 
 |= "here is my incoming event"
 [$__range]))
 or
 sum(count_over_time({app="controller"} 
 [$__range])) * 0
)

I hope this helps you, because this way a “no data” result will not make your alerts unusable.
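
To make it clearer why the fallback works, here is a small Python sketch of the vector semantics as I understand them from the documentation. It is a toy model, not Loki code: the helper names (vector_or, vector_sub, scale) and the sample counts are made up for illustration. An instant vector is modelled as a dict from label set to value, and sum() without grouping yields a single entry with an empty label set.

```python
# Toy model of instant vectors: {label_set: value}.
# All names and numbers here are illustrative assumptions, not Loki APIs.
NO_LABELS = frozenset()  # label set produced by sum() without grouping


def vector_or(left, right):
    """Model of `left or right`: keep all of left, add right entries
    whose label set does not appear in left."""
    result = dict(left)
    for labels, value in right.items():
        if labels not in result:
            result[labels] = value
    return result


def vector_sub(left, right):
    """Model of `left - right`: only label sets present on BOTH sides
    produce a result, so an empty side gives an empty result."""
    return {k: left[k] - right[k] for k in left if k in right}


def scale(vector, factor):
    """Model of `vector * factor` with a scalar factor."""
    return {k: v * factor for k, v in vector.items()}


incoming = {NO_LABELS: 7.0}    # Query 1 matched 7 lines
successful = {}                # Query 2 matched nothing -> "no data"
all_logs = {NO_LABELS: 120.0}  # fallback selector matched some lines

# Without the fallback, the subtraction has nothing to match against.
naive = vector_sub(successful, incoming)

# With the fallback, the empty side is replaced by a vector holding 0.
fallback = scale(all_logs, 0)
patched = vector_sub(vector_or(successful, fallback),
                     vector_or(incoming, fallback))

print(naive)    # {}
print(patched)  # {frozenset(): -7.0}
```

One thing the model also shows: if the fallback selector itself matches no log lines in the range, the fallback is empty too and the result is “no data” again, so the fallback should select a stream that always has logs.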

This ugly solution works for me.
Thank you!

I’d like to suggest an optimization for the “dummy query”.

Instead of counting over [$__range], it might be more efficient to count over a fixed small amount of time.

Here is the “dummy query” that I use:

sum(count_over_time(
{level=~"error"} [5m])) * 0.0

I also selected the “error” level because that is where I have the fewest logs.

Kind regards

this will not close until there is a solution for https://github.com/grafana/loki/issues/5074