Hello everyone,
We recently installed Promtail on some of our worker servers and are shipping all the data to a single Loki server instance. This seems to work fine.
When querying the data, however, we often run into an error like this in Grafana:
rpc error: code = ResourceExhausted desc = trying to send message larger than max (486002239 vs. 209715200)
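If I'm decoding that correctly, the query response is about 463 MB (486002239 bytes) while the limit it hits is exactly 200 MB (209715200 bytes = 200 × 1024 × 1024). Oddly, that's still the 200 MB figure even though the config below sets 409715200, so I suspect the limit is being enforced somewhere I haven't tuned, or the new config isn't actually applied.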
I already increased server.grpc_server_max_recv_msg_size and server.grpc_server_max_send_msg_size to something very high (see the server block in the config below), even though I know that's not a good idea.
Some queries work fine, and then, on rerunning them, they fail again.
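For illustration only (the selector is a placeholder, not one of our real queries), the pattern is a broad range query in Grafana Explore, something like

{host=~"worker.*"} |= "error"

run over a range of several hours.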
I don't understand what I can do to avoid needing such a high limit in the first place.
Is there a guide on how to tune this properly?
Here's our current Loki configuration file (it's an Ansible/Jinja2 template):
#jinja2:lstrip_blocks: True
---
{{ ansible_managed | comment }}
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_server_max_concurrent_streams: 1024
  grpc_server_max_recv_msg_size: 409715200  # ~400 MB, might be too much, be careful
  grpc_server_max_send_msg_size: 409715200  # ~400 MB, might be too much, be careful
  http_server_write_timeout: 310s
  http_server_read_timeout: 310s
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 409715200  # ~400 MB
    max_send_msg_size: 409715200  # ~400 MB
querier:
  query_timeout: 300s
  engine:
    timeout: 300s
  max_concurrent: 24
ingester:
  chunk_encoding: snappy
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  chunk_idle_period: 2h
  chunk_target_size: 1536000
  chunk_retain_period: 30s
  max_chunk_age: 2h
  wal:
    dir: "/tmp/wal"
compactor:
  working_directory: /var/lib/loki/compactor
  shared_store: filesystem
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    cache_location: /var/lib/loki/cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /var/lib/loki/chunks
limits_config:
  retention_period: 72h
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_cache_freshness_per_query: 10m
  split_queries_by_interval: 15m
  # raised for high-volume logs
  per_stream_rate_limit: 512M
  per_stream_rate_limit_burst: 1024M
  cardinality_limit: 200000
  ingestion_burst_size_mb: 1000
  ingestion_rate_mb: 10000
  max_entries_limit_per_query: 1000000
  max_label_value_length: 20480
  max_label_name_length: 10240
  max_label_names_per_series: 300
  max_query_parallelism: 24
frontend_worker:
  match_max_concurrent: true
  grpc_client_config:
    max_send_msg_size: 409715200
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: true
  retention_period: 336h
Would running more Loki instances help? Would storing the logs in S3 help? My underlying problem is understanding how to set this up so that our developers can use Loki as a reliable tool for querying application logs.
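Regarding S3: I assume the storage part of the config would change roughly like this (bucket name, region, and credentials are placeholders, not our real values):

storage_config:
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    cache_location: /var/lib/loki/cache
    shared_store: s3              # index files shipped to S3 instead of local disk
  aws:
    bucketnames: my-loki-chunks   # placeholder bucket name
    region: eu-central-1          # placeholder region
    access_key_id: PLACEHOLDER    # placeholder credentials
    secret_access_key: PLACEHOLDER

plus object_store: s3 under schema_config. Is that the right direction?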
Thank you all in advance for your replies!