Not Getting Logs in Real Time

Hi Community!

Using Explore, we tried to simulate getting logs from the server to Loki and see how ‘real time’ it is.

We get different results. Sometimes we see a 10-second delay, sometimes 1 hour. But I believe the idea of real time is that logs should show up with at most a 1-second delay, right?

We are using Promtail, and we verified there is no network bottleneck.

Any configuration we missed or could look into?

Please share your configuration.

Depending on your settings, log streams (especially small ones) can stay on the ingester for a while before being written to storage. The Loki reader can query the Loki writer within a certain time frame (this is the query_ingesters_within configuration). If you are seeing a delay, the likely cause is that either query_ingesters_within is not configured correctly or your reader can’t connect to the writers.
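For reference, here is a rough sketch of where these settings live in a Loki config file. The values are illustrative assumptions only, not recommendations for your setup, so check them against the documentation for your Loki version. The general idea is that query_ingesters_within should be at least as long as the time a chunk can sit on the ingester before it is flushed.

```yaml
# Sketch only: the values below are assumptions for illustration.
querier:
  # Queriers ask the ingesters for any data newer than this window.
  # Data still held in ingester memory is only visible through this path.
  query_ingesters_within: 3h

ingester:
  # Flush a chunk that has received no new lines for this long.
  chunk_idle_period: 30m
  # Flush a chunk once it reaches this age, even if it is still receiving lines.
  max_chunk_age: 2h
```

If the window is too short, or the reader cannot reach the writers at all, recent lines will only show up in queries after their chunks have been flushed to storage.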

Hey Tony,

Here is our configuration

auth_enabled: false

http_listen_port: 3100
grpc_server_max_recv_msg_size: 16777216
grpc_server_max_send_msg_size: 16777216
log_level: debug
path_prefix: /loki
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
store: etcd
- from: 2020-10-24
store: tsdb
object_store: filesystem
schema: v12
prefix: index_
period: 24h

max_outstanding_requests_per_tenant: 8192
max_outstanding_per_tenant: 8192
log_queries_longer_than: 10s
compress_responses: true
parallelise_shardable_queries: true
align_queries_with_step: true
cache_results: true
split_queries_by_interval: 15m
max_query_length: 0h
max_query_parallelism: 32
ingestion_rate_strategy: local
ingestion_rate_mb: 32
ingestion_burst_size_mb: 64
max_streams_per_user: 0
max_entries_limit_per_query: 5000000
max_global_streams_per_user: 0
cardinality_limit: 200000
alertmanager_url: http://localhost:9093

retention_deletes_enabled: true
retention_period: 2160h

reporting_enabled: false

Just to provide more context, our log volume reaches up to 10,000 lines per second at peak times, and the issue now is that Grafana does not display the volume of logs being sent. We suspect Loki is not keeping up with the volume of logs being sent, but are unsure.

We tested this in two ways:

  • Test 1: Server 1 (Loki) ← Server 2 (Promtail)
  • Test 2: Server 1 (Loki + Promtail)

Both tests simulated 10,000 log lines. In both cases the data did not arrive fast enough and did not arrive in full (a large share of the logs was missing).

We tried the following:

  • Changed the KV store from in-memory to etcd
  • Switched the index from BoltDB to TSDB

Please help.