Loki Pods Not Releasing Memory After Query Completion

Hello Grafana Community,

I am currently experiencing an issue with Loki where the pods do not seem to release memory after a query completes. Here are the details:

Environment

Issue Description
When running queries in Loki, the memory usage of the pods increases significantly. For instance, when I run a query spanning 30 days (approximately 11GB of data), the pod’s memory usage spikes to around 2.7GB, which is expected and acceptable. However, after the query completes and the results are displayed, the memory usage remains high and does not decrease.
Please see the screenshot below for the result.


15 minutes after the query completed:

Below are the configuration details:

  server:
    http_listen_port: 3100
    grpc_listen_port: 9095
    grpc_server_max_concurrent_streams: 1000
    http_server_write_timeout: 60s
    http_server_idle_timeout: 40m
    http_server_read_timeout: 20m
    grpc_server_max_recv_msg_size: 104857600 # 100 MB, might be too much, be careful
    grpc_server_max_send_msg_size: 104857600 
  # -- Limits config  
  limits_config:
    query_timeout: 5m
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    max_cache_freshness_per_query: 10m
    max_entries_limit_per_query: 2000
    split_queries_by_interval: 15m
    max_query_parallelism: 15
    tsdb_max_query_parallelism: 25
    max_query_series: 2000
    ingestion_rate_mb: 20
    max_query_length: 6000h
    ingestion_burst_size_mb: 20
    ingestion_rate_strategy: global
    volume_enabled: true
  # -- Provides a reloadable runtime configuration file for some specific configuration
  runtimeConfig: {}
  # -- Check https://grafana.com/docs/loki/latest/configuration/#common_config for more info on how to provide a common configuration
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
    compactor_address: '{{ include "loki.compactorAddress" . }}'
  # -- Storage config. Providing this will automatically populate all necessary storage configs in the templated config.
  storage:
    bucketNames:
      chunks: chunks
      ruler: ruler
      admin: admin
    type: azure
    azure:
      accountName: grafanalokistg
      accountKey: null
      connectionString: null
      useManagedIdentity: false
      useFederatedToken: false
      userAssignedId: 4a5d2c4e-8ec4-4299-95ec-3e96d3669634
      requestTimeout: null
      endpointSuffix: null
  # -- Configure memcached as an external cache for chunk and results cache. Disabled by default
  # must enable and specify a host for each cache you would like to use.
  memcached:
    chunk_cache:
      enabled: true
      host: loki-chunks-cache.logging.svc
      service: memcached-client
      batch_size: 128
      parallelism: 5
    results_cache:
      enabled: true
      host: loki-results-cache.logging.svc
      service: memcached-client
      timeout: "300ms"
      default_validity: "1h"
  # -- Check https://grafana.com/docs/loki/latest/configuration/#schema_config for more info on how to configure schemas
  schemaConfig: 
    configs:
    - from: "2024-05-17"
      object_store: azure
      store: tsdb
      schema: v13
      index:
        prefix: index_
        period: 24h
  # -- a real Loki install requires a proper schemaConfig defined above this, however for testing or playing around
  # you can enable useTestSchema
  useTestSchema: false
  testSchemaConfig:
    configs:
      - from: 2024-04-01
        store: tsdb
        object_store: '{{ include "loki.testSchemaObjectStore" . }}'
        schema: v13
        index:
          prefix: index_
          period: 24h
  # -- Check https://grafana.com/docs/loki/latest/configuration/#ruler for more info on configuring ruler
  rulerConfig: {}
  # -- Structured loki configuration, takes precedence over `loki.config`, `loki.schemaConfig`, `loki.storageConfig`
  structuredConfig: {}
  # -- Additional query scheduler config
  query_scheduler:
    use_scheduler_ring: false
    max_outstanding_requests_per_tenant: 320000
  # -- Additional storage config
  storage_config:
    boltdb_shipper:
      index_gateway_client:
        server_address: '{{ include "loki.indexGatewayAddress" . }}'
    tsdb_shipper:
      index_gateway_client:
        server_address: '{{ include "loki.indexGatewayAddress" . }}'
    hedging:
      at: "250ms"
      max_per_second: 20
      up_to: 3
  # --  Optional compactor configuration
  compactor: {}
  # --  Optional pattern ingester configuration
  pattern_ingester:
    enabled: false
  # --  Optional analytics configuration
  analytics: {}
  # --  Optional querier configuration
  query_range: 
    parallelise_shardable_queries: true
    align_queries_with_step: true
    max_retries: 5
    cache_results: true
    results_cache:  
      cache:
        memcached_client:
          consistent_hash: true
          host: loki-results-cache.logging.svc
          service: memcached-client
          max_idle_conns: 32
          timeout: 1s
          update_interval: 1m
  # --  Optional querier configuration
  querier: 
    engine:
      max_look_back_period: 24h
    max_concurrent: 500
    query_ingesters_within: 6h
  # --  Optional ingester configuration
  ingester: 
    lifecycler:
      ring:
        kvstore:
          store: memberlist
        replication_factor: 1
    chunk_block_size: 262144
    chunk_encoding: snappy
    chunk_idle_period: 15m
    chunk_retain_period: 30s
    chunk_target_size: 26214434
  # --  Optional index gateway configuration
  index_gateway:
    mode: 
    ring:
      kvstore:
        store: memberlist
  frontend:
    log_queries_longer_than: 2s
    max_outstanding_per_tenant: 8192
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
    tail_proxy_url: '{{ include "loki.querierAddress" . }}'
  frontend_worker:
    grpc_client_config:
      grpc_compression: snappy
      max_recv_msg_size: 1048576000
      max_send_msg_size: 1048576000
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
  # -- Optional distributor configuration

Request for Help

Suggestions on Configuration: Are there any additional parameters or configurations I should adjust to ensure memory is released after a query completes?
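For example, one thing I have been considering (based on general Go runtime behaviour rather than anything Loki-specific) is giving the read/querier pods a soft memory limit via the Go runtime, so the garbage collector returns memory to the OS more aggressively after a big query. This is only a sketch; the read.extraEnv field name is an assumption about my chart version and deployment mode, so please correct me if there is a better place for it:

read:
  # Hypothetical sketch: assumes the simple scalable chart exposes extraEnv for the read pods.
  # For the distributed chart this would go on the querier component instead.
  extraEnv:
    # GOMEMLIMIT is the Go runtime soft memory limit; the GC runs more aggressively
    # as the heap approaches this value, so memory should be returned sooner.
    - name: GOMEMLIMIT
      value: "2GiB"
    # Optionally make the GC more eager overall (the Go default is 100).
    - name: GOGC
      value: "50"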

Thanks,
Bhanu.

Hi team,

Could someone please guide us?

Thanks,
Bhanu.

Hi Team,

Could someone please guide us?

Thanks,
Bhanu.

@tonyswumac any suggestions here?

Not quite sure, I don’t see anything obvious. If you are able to, I’d recommend profiling the querier and seeing what’s there. I personally haven’t done a deep dive into the memory usage of Loki.

Hi @tonyswumac,

Is this expected behavior, or is it going to be fixed in a future version? Is there any reference documentation for reducing the memory usage?

Thanks,
Bhanu.

I can’t really answer this question.

I don’t personally see this behavior. In our Loki cluster the queriers hover around 400MB of memory when idle, and a memory spike usually drops off 30-ish minutes after the querier goes idle again.

You are assuming there is a problem and something to fix, but it’s not clear that there is one. Containers using memory isn’t necessarily a problem, and from your description it sounds like the usage does drop off, just not as much or as quickly as you’d like. If it’s truly a problem for you then, as I mentioned above, you should probably do some sort of memory profiling; otherwise there isn’t much information to go on.
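If the goal is simply to bound how much memory the pods can hold on to, you could also cap them at the Kubernetes level rather than trying to force Loki to release it sooner. A rough sketch, assuming the chart exposes a standard Kubernetes resources block for the read/querier pods (adjust the component name and the numbers for your deployment mode and actual load):

read:
  # Rough sketch with made-up numbers: requests sized for normal load,
  # limit sized so a heavy query can still complete without an OOM kill.
  resources:
    requests:
      cpu: "1"
      memory: 3Gi
    limits:
      memory: 4Gi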

Hi @tonyswumac

Thanks for the info.

Thanks,
Bhanu.