Loki works like a charm for log analysis; however, when it comes to building analytics dashboards for statistics, I run into performance problems.
I may be asking too much of it, considering I'm running the Ingester, Querier and Query Frontend on the same machine, but I'm wondering whether my queries could be optimized.
My use case is pretty simple: I have several nodes serving API requests behind nginx. The nginx logs are tweaked a bit to capture request details (including the body, which is the most storage-intensive part) and geolocation (from MaxMind).
But the data volume is not that huge; consider this over a 24 h time range:
log size = 19 GB
log lines = 21 million
But a query like the one below, which seems optimized to me, takes a very long time and causes many of my panels to time out:
sum by (geoip2_data_country_code) (count_over_time({job=~"nginx",node_name=~"node1|node2|node3|node4|node5|node6"} | json | geoip2_data_country_code != "" | __error__="" [$__interval]))
Over a 12-hour range, it takes 20.2 s to respond.
Details below:

Total request time: 20.2 s
Number of queries: 1
Total number rows: 80291

Data source stats:
Summary: bytes processed per second: 527 MB/s
Summary: lines processed per second: 552185
Summary: total bytes processed: 10.2 GB
Summary: total lines processed: 10654645
Summary: exec time: 19.3 s
Ingester: total reached: 64
Ingester: total chunks matched: 17
Ingester: total batches: 3057
Ingester: total lines sent: 1559731
Ingester: head chunk bytes: 0 B
Ingester: head chunk lines: 0
Ingester: decompressed bytes: 0 B
Ingester: decompressed lines: 0
Ingester: compressed bytes: 0 B
Ingester: total duplicates: 0
Is there any way to reduce this time by optimizing the query?
If not, what's the best infra setup to get it to respond faster?
Are there log lines that do not contain geoip2_data_country_code? If so, putting |= "geoip2_data_country_code" before the json parser should help: only logs that actually contain geoip2_data_country_code would then have to be parsed.
If all logs contain geoip2_data_country_code, then another possibility would be to filter out the empty geoip2_data_country_code values with something like != `"geoip2_data_country_code": ""` (if that is how they are logged when empty).
The main thing would be to pass as few lines to the json parser as possible.
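Applied to the query from the original post, that would look something like this (a sketch; it assumes the field name appears literally in the raw log line):

sum by (geoip2_data_country_code) (count_over_time({job=~"nginx",node_name=~"node1|node2|node3|node4|node5|node6"} |= "geoip2_data_country_code" | json | geoip2_data_country_code != "" | __error__="" [$__interval]))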
I don't personally run Loki on a single instance, but in general, if you are trying to optimize for a single Loki instance, you want to make sure that you write as few chunks to storage as you reasonably can (you obviously don't want to keep chunks in memory for too long either), and that your queries are split as little as possible. This usually gives you the biggest gain.
With Loki you parse logs at query time: log lines are just plain text strings until you add e.g. the | json parser in your query, so there is nothing stopping you from applying a line filter before parsing. Put the filter string inside backticks and you don't even have to worry about escaping any characters.
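For instance, a raw-text filter that drops lines with an empty country code can be written in backticks without escaping the inner double quotes (a sketch; the exact spacing of the key/value pair in the logged JSON is an assumption):

{job=~"nginx",node_name=~"node1|node2|node3|node4|node5|node6"} != `"geoip2_data_country_code": ""` | json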
Hi, we have a similar experience running Loki 3.0.0 in an on-prem k8s cluster. It's using on-prem S3 SSD storage. Below is the config we are using; it's the default template for Loki 3 from Grafana, we just made some modifications like ingress, configuring S3 storage, …
We already spent a lot of time tweaking Loki 2.9, but it takes a wizard to configure all the max_chunk_age, chunk_idle_period, chunk_target_size, query_ingesters_within and tens of other parameters. There are a lot of complaints about how bad the Loki documentation is, and I can only confirm that.
I guess Grafana is pushing their Cloud and Enterprise solutions and has stopped helping the open-source community.
What performance issue are you running into? Write or read?
I don't think this is fair. Compared to a lot of other popular "open sourced" software (have you heard what happened to Terraform, for example?), I think Grafana is still treating the community well. If you think the documentation is subpar, perhaps that's an opportunity to contribute?
Anyway, if you can submit a thread with more details on what your performance issue is, I am sure there are community members who would be willing and able to help.
Note that when the log line filter is put in backticks, it filters the raw JSON-encoded line before decoding it. This should speed up the query, since Loki doesn't need to JSON-decode log lines that do not match the given filter.
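A minimal sketch of that pattern, reusing the field from the original query (the backtick-quoted filter string is an assumption about how the field appears in the raw line):

{job=~"nginx"} |= `geoip2_data_country_code` | json | geoip2_data_country_code != "" | __error__=""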
P.S. If you are still unsatisfied with Loki querying performance, then try VictoriaLogs. It usually executes typical queries over logs much faster. It also supports ingesting and querying high-cardinality log fields such as WorkflowId, user_id, trace_id or ip.
Hi @valyala, thanks for the suggestion. Tried that, but the query is still timing out. In the meantime we upgraded to Loki 3.2.0 and enabled bloom filters, but nothing changed.
Not sure what the use case for Loki is. We are currently using Elasticsearch, and it will just process everything you throw at it, and queries are instant. Sure, it requires more processing power and more storage since it needs to index everything, but then it just works.
Loki tells you not to label much, because then you will end up with high cardinality… but if you then try to do any kind of search over unlabelled logs, it's just a pain in the butt.