How do we get the standard deviation over time without labels?

We have logs flowing through Promtail to Loki. There are no labels. We would like to compute the standard deviation to use for alerting. The queries are as follows:

avg(stddev_over_time({job="job1"}[1m]) != 0)

The above did not work because stddev_over_time requires an unwrapped label, and we do not have labels.
We tried the following and got a flat line at zero:

stddev(rate({job="job1"} |= `` [$__interval]))

Note that avg(rate({job="job1"} |= `` [$__interval])) is returning non-zero values.

Example log lines would be helpful.

[TRACE] 2023-03-26 20:21:01.512 (DefaultId.java:84) -sftp- {“location”:”remote”,”which”:”response”,”id”:”6791723644”3412,“HTTP/1.1”,“status”:200,“headers”:{“Deny”:[“bytes”],“Cache”}
[DEBUG] 2023-03-26 20:21:01.512 (Server.java:1131) -sftp- - Success 200
[INFO ] 2023-03-26 20:21:01.511 (Middleware.java:454) 17981273 -sftp- Request Time: 22 ms
[ERROR ] 2023-03-26 20:21:01.511 (function java:56) 798164312 -sftp- Internal error occurred
[DEBUG] 2023-03-26 20:21:01.511 (Modification.java:149) 109283743 -sftp- Response Time: 50 ms

What are you trying to calculate stddev on? Request time? Response time?

We are getting all the lines with errors. Then we are trying to find every interval where the error count is more than 2 standard deviations above the mean, and send an alert. We will also try a z-score as the alert threshold, but we haven't gotten to that yet.

I am not sure if it’s possible to do this with stddev. Perhaps try the rate function?

Why does stddev return zero?

This one also is returning zero with stddev:
https://play.grafana.org/d/T512JVH7z/loki-nginx-service-mesh-json-version?orgId=1&editPanel=9

Query:
stddev(rate({$label_name=~"$label_value"} |= `` [$__interval]))

The link you provided is probably not the same query you were using anymore.

For your second query, you are filtering on an empty string (|= ``). Are you sure it actually returns results?

Yes, avg returns a non-zero graph. stddev returns a flat line at zero.
Could you show me stddev without labels in the demo link below where stddev is not zero:
https://play.grafana.org/d/T512JVH7z/loki-nginx-service-mesh-json-version?orgId=1&editPanel=9

I think you may be misreading what the functions stddev and stddev_over_time do, and how they differ.

  • stddev_over_time(unwrapped-range): the population standard deviation of the values in the specified interval.
  • stddev: Calculate the population standard deviation over labels

So when you do something like stddev(rate({$label_name=~"$label_value"} |= `` [$__interval])), you are calculating the stddev across the label sets (series) produced by {$label_name=~"$label_value"} |= ``, and if there is only one label set, the result is of course 0.
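To make the difference concrete, here is a minimal Python sketch of the two calculations. It is not Loki code: the function names and sample numbers are made up for illustration, and it only mirrors the "population standard deviation" wording in the definitions above.

```python
from statistics import pstdev

def stddev_across_series(values_at_one_timestamp):
    """Spread across series at a single timestamp (what the stddev aggregation does).

    One input value per label set; hypothetical helper, not a Loki API.
    """
    return pstdev(values_at_one_timestamp)

def stddev_over_window(samples_in_window):
    """Spread of one series' samples inside a time window (stddev_over_time)."""
    return pstdev(samples_in_window)

# A single label set yields a single value per timestamp, so the spread is 0.
one_series = [4.2]
print(stddev_across_series(one_series))

# Several samples of one series inside a window do have a spread.
window = [2, 4, 4, 4, 5, 5, 7, 9]
print(stddev_over_window(window))
```

With only one series in the result, the first call always returns 0, no matter how much that series changes over time.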

Judging from your original post, you are looking for a sudden spike of error logs (please correct me if I am wrong). Naturally you’d probably want to do stddev_over_time( count_over_time ( {Select for error logs} [range]) [range]), but I don’t think you can chain two unwrapped-range functions together in LogQL, hence I recommended trying out rate so you can calculate rate of change for the number of error logs, and see if that would be sufficient.
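For reference, the calculation itself (count per window, then the standard deviation over those counts) is easy to sketch outside LogQL. The Python below is only an illustration under assumptions: the counts are invented, flag_spikes is a hypothetical helper, and the 2.0 threshold is just the "2 standard deviations" figure mentioned earlier in the thread.

```python
from statistics import mean, pstdev

def flag_spikes(counts, threshold=2.0):
    """Return indexes of windows whose error count has a z-score above threshold.

    counts: error count per fixed-size window (e.g. per minute), oldest first.
    """
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # All windows are identical, so nothing can stand out.
        return []
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > threshold]

# Invented per-minute error counts with one obvious spike at index 7.
counts = [5, 6, 4, 5, 7, 5, 6, 30, 5, 4]
print(flag_spikes(counts))
```

Note that the spike is part of the baseline here, which inflates the standard deviation. A real setup would probably compute the mean and deviation over a trailing window that excludes the point being tested.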

Ok, thanks. The issue with using rate alone is that we get too many false positives. The errors are frequent, so we need a way to separate intervals with a low error count from those with a large one. We don't know the exact threshold, so we thought about using a z-score and standard deviation to pick out the intervals that fall outside those "bands".

Do you have any ideas how to resolve the above?

I don’t know if that’s possible without doing something more elaborate. Maybe someone with more experience can comment.
