Hello Grafana Community,
I am currently experiencing an issue with Loki where the pods do not seem to release memory after a query completes. Here are the details:
Environment
- **Loki Version:** docker.io/grafana/loki:3.0.0
- **Deployment Method:** Helm
- **Average Log Ingestion Rate:** ~8 GB/day
Issue Description
When running queries in Loki, the memory usage of the pods increases significantly. For instance, when I run a query spanning 30 days (approximately 11GB of data), the pod’s memory usage spikes to around 2.7GB, which is expected and acceptable. However, after the query completes and the results are displayed, the memory usage remains high and does not decrease.
Please see the screenshots below for the results; the second one was taken about 15 minutes after the query completed, and memory usage is still at the same level.
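In case it helps with the diagnosis, I can also compare the pods' resident memory against the live Go heap from Loki's /metrics endpoint, to check whether the memory is still referenced by the heap or has been freed but not yet returned to the OS by the Go runtime. Below is a sketch of a Prometheus recording rule I could use for that (assuming the Loki pods are scraped by Prometheus; the namespace and job matchers are guesses based on my setup):

```yaml
groups:
  - name: loki-memory-debug
    rules:
      # RSS minus live heap: memory the process holds that is not currently
      # in use by the Go heap (freed-but-retained runtime memory, stacks, etc.).
      - record: loki:rss_minus_heap_inuse:bytes
        expr: |
          process_resident_memory_bytes{namespace="logging", job=~".*loki.*"}
            - go_memstats_heap_inuse_bytes{namespace="logging", job=~".*loki.*"}
```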
Below are the configuration details (from my Helm values):
server:
  http_listen_port: 3100
  grpc_listen_port: 9095
  grpc_server_max_concurrent_streams: 1000
  http_server_write_timeout: 60s
  http_server_idle_timeout: 40m
  http_server_read_timeout: 20m
  grpc_server_max_recv_msg_size: 104857600 # 100 MB, might be too much, be careful
  grpc_server_max_send_msg_size: 104857600
# -- Limits config
limits_config:
  query_timeout: 5m
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_cache_freshness_per_query: 10m
  max_entries_limit_per_query: 2000
  split_queries_by_interval: 15m
  max_query_parallelism: 15
  tsdb_max_query_parallelism: 25
  max_query_series: 2000
  ingestion_rate_mb: 20
  max_query_length: 6000h
  ingestion_burst_size_mb: 20
  ingestion_rate_strategy: global
  volume_enabled: true
# -- Provides a reloadable runtime configuration file for some specific configuration
runtimeConfig: {}
# -- Check https://grafana.com/docs/loki/latest/configuration/#common_config for more info on how to provide a common configuration
commonConfig:
  path_prefix: /var/loki
  replication_factor: 3
  compactor_address: '{{ include "loki.compactorAddress" . }}'
# -- Storage config. Providing this will automatically populate all necessary storage configs in the templated config.
storage:
  bucketNames:
    chunks: chunks
    ruler: ruler
    admin: admin
  type: azure
  azure:
    accountName: grafanalokistg
    accountKey: null
    connectionString: null
    useManagedIdentity: false
    useFederatedToken: false
    userAssignedId: 4a5d2c4e-8ec4-4299-95ec-3e96d3669634
    requestTimeout: null
    endpointSuffix: null
# -- Configure memcached as an external cache for chunk and results cache. Disabled by default
# must enable and specify a host for each cache you would like to use.
memcached:
  chunk_cache:
    enabled: true
    host: loki-chunks-cache.logging.svc
    service: memcached-client
    batch_size: 128
    parallelism: 5
  results_cache:
    enabled: true
    host: loki-results-cache.logging.svc
    service: memcached-client
    timeout: "300ms"
    default_validity: "1h"
# -- Check https://grafana.com/docs/loki/latest/configuration/#schema_config for more info on how to configure schemas
schemaConfig:
  configs:
    - from: "2024-05-17"
      object_store: azure
      store: tsdb
      schema: v13
      index:
        prefix: index_
        period: 24h
# -- a real Loki install requires a proper schemaConfig defined above this, however for testing or playing around
# you can enable useTestSchema
useTestSchema: false
testSchemaConfig:
  configs:
    - from: 2024-04-01
      store: tsdb
      object_store: '{{ include "loki.testSchemaObjectStore" . }}'
      schema: v13
      index:
        prefix: index_
        period: 24h
# -- Check https://grafana.com/docs/loki/latest/configuration/#ruler for more info on configuring ruler
rulerConfig: {}
# -- Structured loki configuration, takes precedence over `loki.config`, `loki.schemaConfig`, `loki.storageConfig`
structuredConfig: {}
# -- Additional query scheduler config
query_scheduler:
  use_scheduler_ring: false
  max_outstanding_requests_per_tenant: 320000
# -- Additional storage config
storage_config:
  boltdb_shipper:
    index_gateway_client:
      server_address: '{{ include "loki.indexGatewayAddress" . }}'
  tsdb_shipper:
    index_gateway_client:
      server_address: '{{ include "loki.indexGatewayAddress" . }}'
  hedging:
    at: "250ms"
    max_per_second: 20
    up_to: 3
# -- Optional compactor configuration
compactor: {}
# -- Optional pattern ingester configuration
pattern_ingester:
  enabled: false
# -- Optional analytics configuration
analytics: {}
# -- Optional query range configuration
query_range:
  parallelise_shardable_queries: true
  align_queries_with_step: true
  max_retries: 5
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        consistent_hash: true
        host: loki-results-cache.logging.svc
        service: memcached-client
        max_idle_conns: 32
        timeout: 1s
        update_interval: 1m
# -- Optional querier configuration
querier:
  engine:
    max_look_back_period: 24h
  max_concurrent: 500
  query_ingesters_within: 6h
# -- Optional ingester configuration
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_idle_period: 15m
  chunk_retain_period: 30s
  chunk_target_size: 26214434
# -- Optional index gateway configuration
index_gateway:
  mode:
  ring:
    kvstore:
      store: memberlist
frontend:
  log_queries_longer_than: 2s
  max_outstanding_per_tenant: 8192
  scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
  tail_proxy_url: '{{ include "loki.querierAddress" . }}'
frontend_worker:
  grpc_client_config:
    grpc_compression: snappy
    max_recv_msg_size: 1048576000
    max_send_msg_size: 1048576000
  scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
# -- Optional distributor configuration
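While reviewing the config, I also did the math on the query fan-out: with split_queries_by_interval: 15m, a 30-day query is split into roughly 30 × 24 × 4 = 2880 subqueries, and querier.max_concurrent: 500 lets each querier pick up a very large number of them at once, which would explain the size of the spike (though not why it never comes back down). As one possible direction for the question below, this is a sketch of the changes I am considering to reduce the fan-out; the exact numbers are untested guesses:

```yaml
# Sketch only: reduce per-query fan-out and per-querier concurrency.
limits_config:
  split_queries_by_interval: 1h   # fewer, larger subqueries for long time ranges
  max_query_parallelism: 8        # fewer subqueries in flight per query
  tsdb_max_query_parallelism: 8
querier:
  max_concurrent: 8               # down from 500
```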
Request for Help
**Suggestions on configuration:** Are there any additional parameters or configurations I should adjust to ensure memory is released after a query completes?
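For example, since Loki is a Go application and the Go runtime can hold on to freed heap for a while before returning it to the OS, would it make sense to set GOMEMLIMIT on the query-path pods? Below is a sketch of what I have in mind; `read.extraEnv` is an assumption about the chart values (the component or field name may differ depending on deployment mode), and the numbers are only illustrative:

```yaml
# Sketch only: ask the Go runtime to collect more aggressively on the read path.
# `read.extraEnv` is an assumption about the chart; adjust for the actual component.
read:
  extraEnv:
    - name: GOMEMLIMIT
      value: "3GiB"   # soft memory limit; GC works harder as usage approaches it
    - name: GOGC
      value: "50"     # run GC more often than the default of 100
```

Or is reducing the query fan-out (as in the earlier sketch) the better first step?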
Thanks,
Bhanu.