CPU/RAM overload due to query with regexp and metadata

Hello, everyone!
Please help me understand this strange behaviour of Loki.

Describe the bug
A small query with the <regexp + metadata + regexp> construction in Grafana Explore overloads RAM and CPU await:
{application="app"} !~ "(?i)info" | filename = "E:/app/app.log" |~ "(?i)err"

RAM: [screenshot]

CPU: [screenshot]

However, it turned out that the following queries:

  1. {application="app"} !~ "(?i)info"
  2. {application="app"} !~ "(?i)info" |~ "(?i)err"
  3. {application="app"} !~ "(?i)info" |~ "(?i)err" | filename = "E:/app/app.log"
  4. {application="app"} | filename = "E:/app/app.log" !~ "(?i)info" |~ "(?i)err"

work properly, without any RAM/CPU overload.

Only the <regexp + metadata + regexp> construction triggers this problem.

In addition, the following did not stop the overload: cancelling the query in Grafana (by timeout or manually), followed by an OOM kill of the Loki container. The container restarts and overloads again.

Only a manual restart of the Loki container helps.

To Reproduce
Steps to reproduce the behavior:

  1. Start Loki
  2. Start Grafana
  3. Start S3 (MinIO)
  4. Run the query: {application="app"} !~ "(?i)info" | filename = "E:/app/app.log" |~ "(?i)err"

Expected behavior
Overload by RAM/CPU await

Environment:

  1. Monolithic Loki deployment (in a cluster of 3 nodes: 8 CPU/12 GB RAM, 4 GB swap)
  2. Running in containers (each Loki container limited to 6 CPU/10 GB RAM, 4 GB swap)
  3. Logs stored in MinIO (S3)

You'll usually want to filter the logs first (with both labels and structured metadata) before you do any processing, so that you don't process data you aren't interested in.

Of your examples, {application="app"} | filename = "E:/app/app.log" !~ "(?i)info" |~ "(?i)err" is probably the best in my opinion, assuming filename is structured metadata.
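The ordering advice can be illustrated with a toy Python model. This is an assumption made for illustration, not how Loki evaluates LogQL internally, and it does not explain the overload reported in this thread; it only shows why putting the structured-metadata filter first leaves fewer lines for the regexp stages. The sample lines and helper names are hypothetical.

```python
import re

# Toy model, NOT Loki's implementation: each entry has the log line text
# plus one structured-metadata field, and a pipeline is a list of
# predicates applied in order. We record how many entries each stage sees.
LINES = [
    {"text": "INFO started", "filename": "E:/app/app.log"},
    {"text": "ERROR disk full", "filename": "E:/app/app.log"},
    {"text": "ERROR timeout", "filename": "E:/other/other.log"},
    {"text": "WARN slow", "filename": "E:/other/other.log"},
]


def line_not_matching(pattern):
    """Stand-in for a !~ line filter."""
    compiled = re.compile(pattern)
    return lambda entry: compiled.search(entry["text"]) is None


def line_matching(pattern):
    """Stand-in for a |~ line filter."""
    compiled = re.compile(pattern)
    return lambda entry: compiled.search(entry["text"]) is not None


def metadata_equals(key, value):
    """Stand-in for a | key = "value" filter on structured metadata."""
    return lambda entry: entry.get(key) == value


def run(stages, entries):
    """Apply stages in order; return surviving entries and per-stage input counts."""
    examined = []
    for stage in stages:
        examined.append(len(entries))
        entries = [e for e in entries if stage(e)]
    return entries, examined


METADATA_FIRST = [metadata_equals("filename", "E:/app/app.log"),
                  line_not_matching("(?i)info"), line_matching("(?i)err")]
REGEX_FIRST = [line_not_matching("(?i)info"),
               metadata_equals("filename", "E:/app/app.log"),
               line_matching("(?i)err")]

for name, stages in [("metadata first", METADATA_FIRST), ("regex first", REGEX_FIRST)]:
    kept, examined = run(stages, LINES)
    # metadata first -> ['ERROR disk full'], examined per stage: [4, 2, 1]
    # regex first    -> ['ERROR disk full'], examined per stage: [4, 3, 1]
    print(name, [e["text"] for e in kept], "examined per stage:", examined)
```

Both orders return the same line; the difference is how many lines the regexp stages have to scan, which grows with the volume of logs in the queried range.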

Yeah, I agree about the query order, but this query was generated automatically in Grafana Explore (builder mode) by an inexperienced engineer who just wanted to see part of the logs. As a result, prod was seriously frozen.

Maybe you can help me understand:

  1. Is this really a bug? As I said before, only one specific order breaks Loki.
  2. Is it possible to block queries like this (with this order, I mean) on the Loki (or maybe Grafana) side? I just want to avoid this.

P.S. I am still a little disappointed that a query like {application="app"} !~ "(?i)info" |~ "(?i)err" | filename = "E:/app/app.log" does not consume much, while {application="app"} !~ "(?i)info" | filename = "E:/app/app.log" |~ "(?i)err" breaks Loki.
In my opinion, the first one should match much more data with its regexps and, as a result, consume more CPU/RAM.

I don't know why, but the screenshots in the first post are not loading now.

RAM: [screenshot]

CPU: [screenshot]

  1. I don't normally use the query builder, so I can't comment on that.
  2. You can block queries; see "Blocking Queries" in the Grafana Loki documentation.
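A blocking rule could target only the problematic ordering instead of all regexp queries. Per the Loki blocking-queries documentation, per-tenant `blocked_queries` entries accept a `pattern` together with `regex: true`. The pattern below is a hypothetical candidate, and this Python sketch only checks it against the queries from this thread. It assumes Loki matches the pattern anywhere in the query string (like `re.search`) and uses only regex syntax shared by Python's `re` and Go's RE2; Loki may normalize the query text before matching, so verify the behaviour on your Loki version.

```python
import re

# Hypothetical candidate for a Loki `blocked_queries` entry with `regex: true`.
# It targets the ordering reported in this thread:
#   regexp line filter -> label filter (e.g. structured metadata) -> regexp line filter
# Only syntax shared by Python's `re` and Go's RE2 is used.
BLOCK_PATTERN = re.compile(
    r'[!|]~\s*"[^"]*"'                            # e.g. !~ "(?i)info"
    r'\s*\|\s*\w+\s*(?:=~|!~|!=|==|=)\s*"[^"]*"'  # e.g. | filename = "E:/app/app.log"
    r'\s*[!|]~'                                   # followed by e.g. |~ "(?i)err"
)

# Queries taken from this thread.
QUERIES = {
    "reported": '{application="app"} !~ "(?i)info" | filename = "E:/app/app.log" |~ "(?i)err"',
    "q1": '{application="app"} !~ "(?i)info"',
    "q2": '{application="app"} !~ "(?i)info" |~ "(?i)err"',
    "q3": '{application="app"} !~ "(?i)info" |~ "(?i)err" | filename = "E:/app/app.log"',
    "q4": '{application="app"} | filename = "E:/app/app.log" !~ "(?i)info" |~ "(?i)err"',
}


def is_blocked(query: str) -> bool:
    """Assumes an unanchored match anywhere in the query string, like re.search."""
    return BLOCK_PATTERN.search(query) is not None


if __name__ == "__main__":
    # Print the joined pattern so it can be copied into the Loki config.
    print("pattern:", BLOCK_PATTERN.pattern)
    for name, query in QUERIES.items():
        print(f"{name}: blocked={is_blocked(query)}")
```

Only the reported query matches; the four working variants do not. A narrow pattern like this catches just this shape, so a builder-generated variant with extra stages or different operators could slip through. When pasting the printed pattern into YAML, single-quote it so the backslashes and double quotes survive.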

A container using more memory isn't necessarily a bug; it's only a bug if you can show that it isn't supposed to happen. If you are using monolithic mode, you should expect it to run out of memory from time to time. One thing you can try is to increase the timeout and split the query (so that each part doesn't use as much memory). It might take a bit longer to run on a monolithic Loki container, but there might be less chance of running into memory issues.
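Splitting by hand can mean running the same LogQL over several shorter time windows. A minimal Python sketch, assuming Loki's HTTP endpoint `/loki/api/v1/query_range` with `query`, `start`/`end` (as Unix epoch nanoseconds) and `limit` parameters, and a hypothetical base URL. It only builds the URLs and sends nothing. Loki can also split queries by time itself through `split_queries_by_interval` in `limits_config`, which may be worth checking too.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

LOKI_URL = "http://localhost:3100"  # hypothetical address of the Loki instance
QUERY = '{application="app"} | filename = "E:/app/app.log" !~ "(?i)info" |~ "(?i)err"'


def to_unix_nanos(ts: datetime) -> int:
    """Convert an aware datetime to Unix epoch nanoseconds (exact integer math)."""
    return (ts - EPOCH) // timedelta(microseconds=1) * 1000


def split_range(start: datetime, end: datetime, step: timedelta):
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


def query_range_urls(query: str, start: datetime, end: datetime,
                     step: timedelta, limit: int = 1000) -> list[str]:
    """Build one /loki/api/v1/query_range URL per window; nothing is sent."""
    urls = []
    for window_start, window_end in split_range(start, end, step):
        params = urlencode({
            "query": query,
            "start": to_unix_nanos(window_start),
            "end": to_unix_nanos(window_end),
            "limit": limit,
        })
        urls.append(f"{LOKI_URL}/loki/api/v1/query_range?{params}")
    return urls


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    end = start + timedelta(hours=6)
    # Six one-hour requests instead of one six-hour request.
    for url in query_range_urls(QUERY, start, end, timedelta(hours=1)):
        print(url)
```

Running the windows one after another keeps each request smaller at the cost of more round trips. Depending on whether Loki treats `end` as inclusive, a line exactly on a window boundary could show up in two windows.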
