How to calculate standard deviations when Loki is slow?

I want to calculate standard deviations of some CSV data stored in Loki:

logcli instant-query '
  stddev_over_time(
       {job=....}
       | regexp `^(?P<hhm>09:[345]|1[0-5])[^,]*,MinMaxAvg,(?P<category>[^,]{4})[^,]*,(?P<name>[^,]+),(?P<count>[^,]+)`
       | unwrap count [4d]
  ) by (job, category, name)
'

However, with a lot of data it just fails with EOF:

....+unwrap+avg+%5B4d%5D+%29+by+%28name%2C+hour%29%0A++++++++&time=1726569532005034620": EOF
2024/09/17 06:39:35 Query failed: run out of attempts while querying the server

How can I “parallelize” getting standard deviation over multiple days?

I am bad at statistics, but as I understand it, I have to get the variances and counts for each day:

count=() variation=()
for i in $(seq 0 3); do
    count[i]=$(logcli instant-query "count_over_time( $QUERY offset ${i}d )")
    variation[i]=$(logcli instant-query "stdvar_over_time( $QUERY offset ${i}d )")
done
standard_deviation=  # ?? how to calculate this ??
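
If I also fetched the per-day means (I think avg_over_time would give those), my guess is that the per-day results could be combined with weighted sums, roughly like below - but I am not sure this is correct, and parsing the logcli output into bare numbers is left out:

# hypothetical combination of per-day count, mean and variance into one
# standard deviation; assumes count[i], mean[i] and variation[i] hold plain
# numbers (no scientific notation, which bc does not understand)
N=0; sum=0; sumsq=0
for i in $(seq 0 3); do
    N=$(echo "$N + ${count[i]}" | bc -l)
    sum=$(echo "$sum + ${count[i]} * ${mean[i]}" | bc -l)
    sumsq=$(echo "$sumsq + ${count[i]} * (${variation[i]} + ${mean[i]}^2)" | bc -l)
done
total_mean=$(echo "$sum / $N" | bc -l)
standard_deviation=$(echo "sqrt($sumsq / $N - $total_mean^2)" | bc -l)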

Thank you

How heavy are your logs?

  1. In order to parallelize queries in Loki you’d need to deploy with either simple scalable mode or distributed mode; see Loki deployment modes | Grafana Loki documentation
  2. Then you’ll want to configure the query frontend, and the queriers to connect to it; see Query frontend example | Grafana Loki documentation

After that you can scale the queriers up and down depending on your needs.
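
For example, on the querier side that connection would look roughly like this (just a sketch; query-frontend:9095 is a placeholder address, point it at wherever your query frontend listens):

frontend_worker:
  # queriers pull split-up queries from the query frontend at this address
  frontend_address: query-frontend:9095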


Hi @tonyswumac, hope you are well!

How heavy are your logs?

If I query all the logs line by line, it’s about 30MB of logs.

There are a lot of logs; the regex and the additional filtering narrow it down, but it is filtering from gigabytes of logs.

In order to parallelize query

I do not care about speed. I need the query to execute at all. Right now Loki just closes the connection with Query failed: run out of attempts while querying the server. I tried increasing query_timeout: 5m, but it still closes the connection after around 1 minute. Is there anything else I can increase, any other limits_config in Loki?

scale the queriers up and down depending on your needs.

In my company, I have one machine for Loki, and I will have only one machine. Because of how Loki is constructed and how S3 is recommended, from time to time I contemplate running a local MinIO instance just so that Loki works better with S3.

Right now, Loki uses the local file system, and it can use as much CPU as it wants. I am running Memcached instances to use more memory - in normal operation Loki itself only wants to use about 2GB of the 120GB on the machine, while memcached-chunk uses 20GB (still around 90GB free).

But CPU is not the issue - I/O is the issue and the slowest part, I will not move away from this machine, and with S3 the I/O problem would be the same since it would use the same disk.

Performance, however, is not an issue. I want the query to execute at all; I can wait 10 minutes for it.

This is not a use case for us, so I am kinda guessing a bit as well. Here are some things you can try:

  1. Increase all timeout-related settings, such as:
http_config.idle_conn_timeout
limits_config.query_timeout
server.http_server_read_timeout
server.http_server_read_header_timeout
server.http_server_idle_timeout

You can hit the /config endpoint to print out all configuration options and search for timeout (see the curl example at the end of this post).

  2. If you have a lot of CPUs, try tweaking parallelism so you have more queries running at the same time. For example (note that max_concurrent lives under the querier block):
querier:
  max_concurrent: 20
limits_config:
  split_queries_by_interval: 2h

Try to give it more concurrency, while increasing the time the query is split by, and see if you can reach a happy number.
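
For the /config tip in point 1, something like this should print every timeout value Loki is actually running with (assuming Loki's default HTTP listen port 3100; adjust if yours differs):

# dump the effective configuration and filter for timeout-related options
curl -s http://localhost:3100/config | grep -i timeout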
