Loki works like a charm for log analysis; however, when it comes to building analytics dashboards for statistics, I run into performance problems.
I may be asking too much of it, considering I'm running the Ingester, Querier and Query Frontend on the same machine, but I'm wondering whether my queries could be optimized.
My use case is pretty simple: I have several nodes serving API requests behind nginx. The nginx logs are tweaked a bit to capture request details (including the body, which is the most storage-intensive part) and geolocation (from MaxMind).
But the data volume is not that huge; consider this over a 24 h time range:
log size = 19 GB
log lines = 21 million
But a query like the one below, which seems optimized to me, takes a very long time and causes many of my panels to time out:
sum by (geoip2_data_country_code) (count_over_time({job=~"nginx",node_name=~"node1|node2|node3|node4|node5|node6"} | json | geoip2_data_country_code != "" | __error__="" [$__interval]))
Over a 12-hour range, it takes 20.2 s to respond.
Details below:

Total request time: 20.2 s
Number of queries: 1
Total number rows: 80291

Data source stats:
Summary: bytes processed per second: 527 MB/s
Summary: lines processed per second: 552185
Summary: total bytes processed: 10.2 GB
Summary: total lines processed: 10654645
Summary: exec time: 19.3 s
Ingester: total reached: 64
Ingester: total chunks matched: 17
Ingester: total batches: 3057
Ingester: total lines sent: 1559731
Ingester: head chunk bytes: 0 B
Ingester: head chunk lines: 0
Ingester: decompressed bytes: 0 B
Ingester: decompressed lines: 0
Ingester: compressed bytes: 0 B
Ingester: total duplicates: 0
Is there any way to reduce this time by optimizing the query?
If not, what's the best infra setup to get it to respond faster?
Are there log lines that do not contain geoip2_data_country_code? If so, putting |= "geoip2_data_country_code" before the json parser should help: only logs that actually contain geoip2_data_country_code would then have to be parsed.
If all logs contain geoip2_data_country_code, then another possibility would be to filter out the empty geoip2_data_country_code values with something like != `"geoip2_data_country_code": ""` (if that is how they are logged when empty).
The main thing would be to pass as few lines to the json parser as possible.
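Applied to the query from the original post, that would look something like this (a sketch; it assumes the field name appears literally in the raw log line):

sum by (geoip2_data_country_code) (count_over_time({job=~"nginx",node_name=~"node1|node2|node3|node4|node5|node6"} |= "geoip2_data_country_code" | json | geoip2_data_country_code != "" | __error__="" [$__interval]))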
I don't personally run Loki on a single instance, but in general, if you are trying to optimize for a single Loki instance, you want to make sure that you write as few chunks to storage as you reasonably can (you obviously don't want to keep chunks in memory for too long either), and that your queries are split as little as possible. This usually gives you the biggest gain.
With Loki you parse logs at query time: log lines are just plain text strings until you add e.g. the | json parser in your query, so there is nothing stopping you from applying a line filter before parsing. Put the filter string inside backticks and you don't even have to worry about escaping any characters.
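For instance, a raw-text filter that drops lines with an empty country code can be written in backticks without escaping the inner double quotes (a sketch; the exact spacing of the key/value pair in the logged JSON is an assumption):

{job=~"nginx",node_name=~"node1|node2|node3|node4|node5|node6"} != `"geoip2_data_country_code": ""` | json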
Hi, we have a similar experience running Loki 3.0.0 in an on-prem k8s cluster. It's using on-prem S3 SSD storage. Below is the config we are using; it's the default template for Loki 3 from Grafana, we just made some modifications like ingress, configuring S3 storage, …
We already spent a lot of time tweaking Loki 2.9, but it takes a wizard to configure all the max_chunk_age, chunk_idle_period, chunk_target_size, query_ingesters_within and tens of other parameters. There are a lot of complaints about how bad the Loki documentation is, and I can only confirm that.
I guess Grafana is pushing their Cloud and Enterprise solutions and has stopped helping the open-source community.
What performance issue are you running into? Write or read?
I don't think this is fair. Compared to a lot of other popular "open sourced" software (have you heard what happened to Terraform, for example?), I think Grafana is still treating the community well. If you think the documentation is subpar, perhaps that's an opportunity to contribute?
Anyway, if you can submit a thread with more details on what your performance issue is, I am sure there are community members who would be willing and able to help.
Note that when the log line filter is put in backticks, it filters the raw JSON-encoded line before decoding it. This should speed up the query, since Loki doesn't need to JSON-decode log lines that do not match the given filter.
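A minimal sketch of that pattern, reusing the field from the original query (the backtick-quoted filter string is an assumption about how the field appears in the raw line):

{job=~"nginx"} |= `geoip2_data_country_code` | json | geoip2_data_country_code != "" | __error__=""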
P.S. If you are still unsatisfied with Loki querying performance, then try VictoriaLogs. It usually executes typical queries over logs much faster. It also supports ingesting and querying high-cardinality log fields such as WorkflowId, user_id, trace_id or ip.
Hi @valyala, thanks for the suggestion. Tried that, but the query is still timing out. In the meantime we upgraded to Loki 3.2.0 and enabled bloom filters, but nothing changed.
Not sure what the use case for Loki is. We are currently using Elasticsearch, and it will just process everything you throw at it, and queries are instant. Sure, it requires more processing power and more storage since it needs to index everything, but then it just works.
Loki tells you not to label much, because then you will end up with high cardinality… but if you then try to do any kind of search over unlabelled logs, it's just a pain in the butt.