Loki scalable Helm chart read pod stopped working

After installing Loki with the scalable Helm chart and running it for one day, I started getting errors like:

level=error ts=2022-01-14T05:04:20.550817878Z caller=table.go:149 msg="failed to open existing boltdb file /var/loki/boltdb-shipper-cache/loki_index_19005/loki-write-0-1641904626097129380-1642107600.gz, removing the file and continuing without it to let the sync operation catch up" err="disk quota exceeded"

level=info ts=2022-01-14T05:04:20.57901267Z caller=table.go:432 msg="downloading object from storage with key loki-write-0-1641904626097129380-1642107600.gz"

level=info ts=2022-01-14T05:04:20.596570128Z caller=util.go:109 msg="downloaded file loki-write-0-1641904626097129380-1642107600.gz from table loki_index_19005"

level=error ts=2022-01-14T05:04:21.766858047Z caller=log.go:106 msg="error running loki" err="disk quota exceeded\nerror creating index client\ngithub.com/grafana/loki/pkg/storage/chunk/storage.NewStore\n\t/src/loki/pkg/storage/chunk/storage/factory.go:201\ngithub.com/grafana/loki/pkg/loki.(*Loki).initStore\n\t/src/loki/pkg/loki/modules.go:370\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:106\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:78\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:322\nmain.main\n\t/src/loki/cmd/loki/main.go:96\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\nerror initialising module: store\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:78\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:322\nmain.main\n\t/src/loki/cmd/loki/main.go:96\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"

Config snippet:

loki:
  config:
    limits_config:
      # Per-user ingestion rate limit in sample size per second. Units in MB.
      ingestion_rate_mb: 15
      # Per-user allowed ingestion burst size (in sample size). Units in MB.
      # The burst size refers to the per-distributor local rate limiter even in the
      # case of the "global" strategy, and should be set at least to the maximum logs
      # size expected in a single push request.
      ingestion_burst_size_mb: 20
    schema_config:
      configs:
        - from: 2021-12-24
          store: boltdb-shipper
          object_store: s3
          schema: v11
          index:
            prefix: loki_index_
            period: 24h
    storage_config:
      aws:
        s3: creds_here
        bucketnames: loki_bucket
        s3forcepathstyle: true
        insecure: true
    chunk_store_config:
      max_look_back_period: 672h
    table_manager:
      retention_deletes_enabled: true
      retention_period: 672h

serviceMonitor:
  enabled: true

read:
  persistence:
    size: 20Gi
    storageClass: efs-sc

write:
  persistence:
    size: 20Gi
    storageClass: efs-sc

Hi @haironggao, it sounds like your chunks might be too big, so your disk is filling up before Loki has time to ship them to S3. Could you try reducing your max_chunk_age (defaults to 2h) and/or your chunk_target_size (defaults to 1572864 bytes)?
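
A minimal sketch of where those knobs would sit in the Helm values, assuming the chart passes loki.config through as the Loki configuration file (the numbers below are just illustrative, not recommendations):

loki:
  config:
    ingester:
      # Flush chunks that have been open longer than this, instead of the 2h default,
      # so less data accumulates on the local volume before being shipped.
      max_chunk_age: 1h
      # Target compressed chunk size in bytes; example value below the 1572864 default.
      chunk_target_size: 1048576

Younger/smaller chunks mean more frequent uploads to S3 and less data buffered on the persistent volume, at the cost of more (smaller) objects in the bucket.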

@trevorwhitney1 thanks for your reply. This error was thrown by EFS. We're using AWS EFS for the PV, so I contacted AWS support, and they said our cluster is running out of the disk quota. What I'm not sure about is why Loki is consuming such a massive quota.

 Amazon EFS doesn't currently support user disk quotas. This error can occur if any of the following limits have been exceeded:

    Up to 128 active user accounts can have files open at once for an instance.

    Up to 32,768 files can be open at once for an instance.

    Each unique mount on the instance can acquire up to a total of 8,192 locks across 256 unique file-process pairs. For example, a single process can acquire one or more locks on 256 separate files, or eight processes can each acquire one or more locks on 32 files.

ref: Troubleshooting File Operation Errors - Amazon Elastic File System
