Not Getting Logs in Real Time

Hi Community!

Using Explore, we tried to simulate shipping logs from a server to Loki to see how ‘real time’ it actually is.

We get inconsistent results: sometimes a 10-second delay, sometimes an hour. But I believe the idea of real time is that logs should show up with at most about a one-second delay, right?

We are using Promtail, and we have verified there is no network bottleneck.

Any configuration we missed or could look into?

Please share your configuration.

Depending on your settings, log streams (especially small ones) can sit on the ingester for a while before being written to storage. The Loki readers can only query the Loki writers within a certain time frame (this is the query_ingesters_within configuration), so if you are seeing delays, a likely cause is that either query_ingesters_within is not configured correctly or your readers can’t connect to your writers.
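For reference, this is the block I’m referring to; the value below is just an illustrative example (check the default for your Loki version), not a recommendation:

querier:
  # How far back in time the read path will also query the ingesters (write path)
  # for data that may not have been flushed to long-term storage yet.
  query_ingesters_within: 3h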

Hey Tony,

Here is our configuration


auth_enabled: false

server:
  http_listen_port: 3100
  grpc_server_max_recv_msg_size: 16777216
  grpc_server_max_send_msg_size: 16777216
  log_level: debug

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: etcd
      etcd:
        endpoints:
          -

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v12
      index:
        prefix: index_
        period: 24h

query_scheduler:
  max_outstanding_requests_per_tenant: 8192

frontend:
  max_outstanding_per_tenant: 8192
  log_queries_longer_than: 10s
  compress_responses: true

query_range:
  parallelise_shardable_queries: true
  align_queries_with_step: true
  cache_results: true

limits_config:
  split_queries_by_interval: 15m
  max_query_length: 0h
  max_query_parallelism: 32
  ingestion_rate_strategy: local
  ingestion_rate_mb: 32
  ingestion_burst_size_mb: 64
  max_streams_per_user: 0
  max_entries_limit_per_query: 5000000
  max_global_streams_per_user: 0
  cardinality_limit: 200000

ruler:
  alertmanager_url: http://localhost:9093

# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
#
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/usagestats/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:

table_manager:
  retention_deletes_enabled: true
  retention_period: 2160h

analytics:
  reporting_enabled: false


Just to provide more context: our log volume reaches up to 10,000 lines per second at peak times, and the issue now is that Grafana does not display the full volume of logs being sent. We suspect Loki is not keeping up with the ingest volume, but we are not sure.

We tested this in two ways:

  • Test 1: Server 1 (Loki) ← Server 2 (Promtail)
  • Test 2: Server 1 (Loki + Promtail)

Both tests simulate 10,000 log lines per second. In both cases the data neither arrives fast enough nor in full (a huge chunk of logs is missing).
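For reference, the Promtail side in both tests is just a plain file scrape, roughly along the lines below; the Loki address, job label, and paths are placeholders rather than our real values:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://<loki-host>:3100/loki/api/v1/push

scrape_configs:
  - job_name: loadtest
    static_configs:
      - targets:
          - localhost
        labels:
          job: loadtest
          __path__: /var/log/loadtest/*.log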

We tried the following:

  • Changed the KV store from in-memory to etcd
  • Switched the index from BoltDB to TSDB

Please help.

I think your configuration overall looks good. Perhaps try increasing the chunk size before chunks are written out, so your ingester writes to the file system less often, by tweaking chunk_target_size.
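A sketch of the ingester settings that control this; the values below are just examples to illustrate the knobs, not tuned recommendations:

ingester:
  # Flush a chunk once it reaches roughly this compressed size, in bytes.
  chunk_target_size: 1572864
  # Flush chunks that have received no new entries for this long.
  chunk_idle_period: 30m
  # Upper bound on how long a chunk may stay in memory before being flushed.
  max_chunk_age: 1h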

But honestly it may be time for you to consider simple scalable mode with an object storage backend.
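For illustration, a minimal sketch of what the storage section could look like with an S3-compatible object store (the endpoint, bucket name, and credentials are placeholders); the read and write components would then run from the same config with -target=read and -target=write:

common:
  storage:
    s3:
      endpoint: s3.example.com
      bucketnames: loki-chunks
      access_key_id: <access key>
      secret_access_key: <secret>
      s3forcepathstyle: true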