Query goes from instant success to timeout failure at an arbitrary line limit under max_entries_limit_per_query

With a json parser attached to my query, it succeeds in about 100 ms with maxLines set to 4677, but fails at 4678. I have tried reading from several different log files to see if it is related to the data itself; the magic number stays the same, regardless of the number of bytes read, too.
However, if I remove the json parser from the query, I can get up to a different magic number: 8188.

At 8189, the same query fails with a timeout.

I have increased max_entries_limit_per_query to one million, which is why I can get over 5000 at all.
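To be clear about which knob is which: maxLines is the per-query line cap on the Grafana Loki data source, while max_entries_limit_per_query lives in Loki's limits_config. Roughly where each one sits (a sketch, not my exact files — the data source name, provisioning path, and values are just placeholders):

# Grafana data source provisioning, e.g. provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      # the "Maximum lines" field in the data source settings UI
      maxLines: 5000

# Loki config, e.g. loki-config.yaml
limits_config:
  max_entries_limit_per_query: 100000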

My current setup is a Docker Compose network used for local development, which is why you see all my logs appear at once. I can provide config files as needed.
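Roughly, the compose network looks like this (image tags, service names, and paths here are illustrative placeholders rather than my exact files):

services:
  loki:
    image: grafana/loki:2.9.4
    command: -config.file=/etc/loki/local-config.yaml
    ports:
      - "3100:3100"
    volumes:
      # the config posted further down in this thread
      - ./loki-config.yaml:/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.4
    command: -config.file=/etc/promtail/config.yml
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yml
      - ./logs:/var/log/app:ro

  grafana:
    image: grafana/grafana:10.4.2
    ports:
      - "3000:3000"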

We’re getting a little over 20k log lines an hour per file right now, and in general we want to write queries over much longer periods of time, so this is a problem.

Please share your Loki configuration.

Here it is. I know it is getting read. It’s 99% a copy of whatever the default was in the container, but I added the limits_config section:

# https://grafana.com/docs/loki/latest/configure/#limits_config
auth_enabled: false

server:
  http_listen_port: 3100

limits_config:
  max_entries_limit_per_query: 100000
  # max_query_series: 5000
  # ingestion_rate_mb: 10000
  # ingestion_burst_size_mb: 1000
  max_query_length: 0
  max_query_parallelism: 32

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/usagestats/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
#analytics:
#  reporting_enabled: false

1. Try adjusting the message size:

server:
  # 100 MB (104857600 bytes)
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600

2. Check your container metrics and see if you observe CPU or memory pressure.

3. Check the logs and see if your Loki container is producing any error messages.

#1 worked. Do you know why? Before, I didn’t see any memory pressure or errors in the log other than the same one from above.

It’s essentially the configuration for the maximum message size used when Loki components communicate with each other over gRPC.
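For reference, in a single-file setup like yours those settings just sit in the existing server block (values here are the same illustrative 100 MB from above; the defaults are around 4 MB):

server:
  http_listen_port: 3100
  # both default to roughly 4 MB; raise them together
  grpc_server_max_recv_msg_size: 104857600  # 100 MB
  grpc_server_max_send_msg_size: 104857600  # 100 MB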

I now run into a different issue when pulling large amounts of data. I upped max_entries_limit_per_query to 9,999,999, but it seems the message size eventually taps out. Can I just keep increasing the max message size and the timeout to arbitrarily large numbers, or is what you gave me the magic maximum? Is there a better way around this problem, like splitting into multiple messages or something?

How large is large?

If you are querying a lot of data, you should consider scaling Loki to multiple containers / instances.
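If you do keep raising limits on a single instance instead of scaling out, these are roughly the knobs involved (a sketch assuming a recent Loki version; check the option names against the docs for your release, and treat the values as illustrative only):

server:
  grpc_server_max_recv_msg_size: 104857600   # gRPC message size between components
  grpc_server_max_send_msg_size: 104857600
  http_server_read_timeout: 300s             # HTTP timeouts for long-running queries
  http_server_write_timeout: 300s

limits_config:
  max_entries_limit_per_query: 100000        # lines returned per query
  query_timeout: 5m                          # per-query timeout (lives in limits_config in newer versions)

There is no single magic maximum, though; each bump just moves the point where a very large result set runs into memory, timeout, or message-size pressure, which is why spreading the work across more instances (or narrowing the queries) scales better.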

We are more concerned with trends and past occurrences, meaning many months.

We ultimately decided to move to having NLog insert directly into Postgres, and have Grafana read that instead. With Promtail/Loki we couldn’t get enough granularity in the database, since entries are mostly just a log line and a timestamp, so queries going back that far had to return a ton of results so that further filtering could be done in Grafana.

I know we are moving a bit away from the live-look observability these tools were intended for. It’s just easier to write semi-elegant queries and aggregations in SQL than in LogQL. But we still like Grafana’s alerting and visualizations.

(this is mostly a note for future devs running into these problems)