How to calculate average and standard deviation excluding values outside a percentile range?

Hi. My dataset has some aberrations. To filter them out, I want to drop all values above and below certain percentiles.

How do I do that? In a normal database, I would do a JOIN between temporary SELECTs that calculate the percentiles and then filter those rows out of the second table. However, in Loki that is not possible.

I could execute two queries, `percentile_over_time(0.1, ... | unwrap value [1d])` for the lower bound and `percentile_over_time(0.9, ... | unwrap value [1d])` for the upper bound, and get a range for each set of labels.

But then, how do I write `avg_over_time(.... | <how to filter multiple thresholds for distinct labels> | unwrap value [1d])`?
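To make the intent concrete, here is the computation I'm after, sketched client-side in Python (the label sets, sample values, and the nearest-rank percentile method are just illustrative assumptions, not what Loki does internally):

```python
from statistics import mean

def percentile(values, p):
    """Nearest-rank percentile for p in [0, 1] (illustrative, not Loki's exact method)."""
    s = sorted(values)
    return s[min(int(p * len(s)), len(s) - 1)]

def trimmed_average(series_by_labels, low_p=0.1, high_p=0.9):
    """Per label set: drop samples outside [low_p, high_p] percentiles, average the rest."""
    out = {}
    for labels, values in series_by_labels.items():
        low, high = percentile(values, low_p), percentile(values, high_p)
        kept = [v for v in values if low <= v <= high]
        out[labels] = mean(kept) if kept else None
    return out
```

That is: the thresholds are computed per label set, and the same label set then selects which thresholds apply when filtering.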

The only solution I found is to bake the thresholds from the `percentile_over_time` queries into a big JSON object and then index into it label by label. The object below is keyed by the labels, with the values extracted from the `percentile_over_time` queries above.

```
avg_over_time(....
   | label_format data=`{"job1":{"name1":{"category1":{"low":1.0,"high":2.0},"category2":..... etc. etc. }`
   | label_format high=`{{index (index (index (index (fromJson .data) .job) .name) .category) "high"}}`
   | label_format low=`{{index (index (index (index (fromJson .data) .job) .name) .category) "low"}}`
   | __error__==""
   | value < high
   | value > low
   | drop high, low, data
   | unwrap value [1d]
)
```

However, that is not simple to construct. Is there a simple way?

Ideally I would want to write a nested query that matches on the shared labels:

```
avg_over_time(....
   | value > percentile_over_time(0.1, .... | unwrap value [1d])
   | value < percentile_over_time(0.9, .... | unwrap value [1d])
   | unwrap value [1d]
)
```

Thank you.

I don’t actually think this is possible with Loki’s LogQL alone, primarily because you can’t nest queries, and you will need at least 3 queries to get what you want (two to calculate the upper and lower bounds for the desired percentiles, then another query to filter by those values). Hopefully someone more knowledgeable can prove me wrong.

That said, I could think of a couple of alternatives:

  1. Write a script that does exactly that.
  2. Use a recording rule to forward your query metrics to Prometheus or Mimir, and use PromQL for this. I am not an expert with Prometheus, and there is a possibility that this is not possible with PromQL either, but at least from my limited exposure PromQL has a lot more features and can at least nest queries.
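For option 1, a rough sketch of such a script (this assumes Loki's HTTP API for instant metric queries, `/loki/api/v1/query`; the address, selector, and the `value` label are placeholders, and none of this has been run against a live Loki):

```python
import json
import urllib.parse
import urllib.request

LOKI_URL = "http://localhost:3100"  # placeholder address

def instant_query(logql):
    """Run an instant metric query; index the resulting vector by label set."""
    qs = urllib.parse.urlencode({"query": logql})
    with urllib.request.urlopen(f"{LOKI_URL}/loki/api/v1/query?{qs}") as resp:
        result = json.load(resp)["data"]["result"]
    return {tuple(sorted(s["metric"].items())): float(s["value"][1]) for s in result}

def filter_and_average(samples_by_labels, lows, highs):
    """Drop samples outside [low, high] for their label set, then average the rest."""
    averages = {}
    for labels, values in samples_by_labels.items():
        lo, hi = lows.get(labels), highs.get(labels)
        kept = [v for v in values if lo is None or hi is None or lo <= v <= hi]
        averages[labels] = sum(kept) / len(kept) if kept else None
    return averages

# Usage (sketch):
# lows  = instant_query('percentile_over_time(0.1, {job="job1"} | unwrap value [1d])')
# highs = instant_query('percentile_over_time(0.9, {job="job1"} | unwrap value [1d])')
# samples_by_labels = ...  # fetch raw entries via /loki/api/v1/query_range and parse `value`
# print(filter_and_average(samples_by_labels, lows, highs))
```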