Rate limit config doesn't work

I got this error, and a bunch of logs are missing from my traces:

ERROR level=error caller=manager.go:49 component=distributor path=write msg="write operation failed" details="ingestion rate limit exceeded for user anonymous (limit: 5242880 bytes/sec) while attempting to ingest '598' lines totaling '1047701' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased" org_id=anonymous
As we can see, the error reports that I have a rate limit of 5MB/s, but I already raised it to 15MB/s in the config and still get this error.
Loki Helm chart version: 6.35.0

This is the config:

loki:
  podLabels:
    azure.workload.identity/use: "true"
 
  auth_enabled: true
 
  commonConfig:
    replication_factor: 1
 
  storage_config:
    boltdb_shipper: null
    # https://github.com/grafana/loki/issues/16599
    use_thanos_objstore: true
 
  podSecurityContext:
    seccompProfile:
      type: RuntimeDefault
 
  storage:
    use_thanos_objstore: true
    object_store:
      type: azure
      azure:
        endpoint_suffix: "blob.core.windows.net"
    bucketNames:
      chunks: loki-chunks
      ruler: loki-ruler
    #ruler_storage:
    #  azure:
    #    endpoint_suffix: "blob.core.windows.net"
    #    container_name: loki-ruler
  schemaConfig:
    configs:
      - from: "2025-01-01"
        store: tsdb
        index:
          prefix: loki_index_
          period: 24h
        object_store: azure
        schema: v13
 
  ingester:
    chunk_encoding: snappy
    chunk_idle_period: 30m
    chunk_target_size: 1572864
    flush_check_period: 15s
    wal:
      replay_memory_ceiling: 1024MB
 
  pattern_ingester:
    enabled: true
 
  tracing:
    enabled: true
 
  querier:
    max_concurrent: 4
 
  compactor:
    working_directory: /var/loki/compactor
    compaction_interval: 10m
    retention_enabled: true
    delete_request_store: azure
    retention_delete_delay: 2h
    retention_delete_worker_count: 150
 
  structuredConfig:
    server:
      grpc_server_max_recv_msg_size: 8388608
      grpc_server_max_send_msg_size: 8388608
 
  limits_config:
    allow_structured_metadata: true
    volume_enabled: true
    split_queries_by_interval: 1h
    max_query_series: 500
    reject_old_samples: true # this is the main control to get the discarded chunks down
    reject_old_samples_max_age: 8h
    unordered_writes: true # this seems necessary currently. There still might be some clients that have their clocks skewed
    per_stream_rate_limit: 9MB # default 3MB
    per_stream_rate_limit_burst: 20MB # default 15MB
    ingestion_burst_size_mb: 30 # default = 6
    ingestion_rate_mb: 15 # default = 4
    retention_period: 90d
    max_line_size: 256KB
    max_line_size_truncate: true # truncate too-long messages instead of rejecting them
 
deploymentMode: Distributed

Have you tried all the suggestions in the Troubleshooting docs for this error?

Hi @juliestickler, thanks for the advice. I already read that and followed the instructions, but I still get the issue.

  limits_config:
    allow_structured_metadata: true
    volume_enabled: true
    split_queries_by_interval: 1h
    max_query_series: 500
    reject_old_samples: true # this is the main control to get the discarded chunks down
    reject_old_samples_max_age: 8h
    unordered_writes: true # this seems necessary currently. There still might be some clients that have their clocks skewed
    per_stream_rate_limit: 9MB # default 3MB
    per_stream_rate_limit_burst: 20MB # default 15MB
    ingestion_burst_size_mb: 30 # default = 6
    ingestion_rate_mb: 15 # default = 4
    retention_period: 90d
    max_line_size: 256KB
    max_line_size_truncate: true # truncate too-long messages instead of rejecting them

babecolengo,

I ran your error message and config through my AI, and this is the analysis it generated:

Most likely cause: your 15MB is being applied globally, then split per distributor, and the error prints the local distributor limit.

If you have 3 healthy distributors, 15 / 3 = 5MB/s (exactly 5242880 bytes/sec), which matches your error.

Also important in the pasted values:

  • org_id=anonymous means writes are for tenant anonymous (either header explicitly set, or a proxy/gateway maps requests to that tenant).

  • A per-tenant runtime override for anonymous can still force 5MB even if default is 15.

  • In Helm chart 6.35.0, only loki.limits_config is used in generated Loki config; top-level limits_config is ignored.
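
To illustrate the last two points, here is a minimal sketch of where these settings would sit in the Helm values. The `limits_config` nesting matches the values pasted earlier in this thread. The `runtimeConfig` block with an override for tenant `anonymous` is an assumption for illustration only: check your chart's values.yaml for the exact key, and add it only if you actually want a tenant-specific limit.

```yaml
loki:
  # Nested under "loki": rendered into the generated Loki config.
  limits_config:
    ingestion_rate_mb: 15
    ingestion_burst_size_mb: 30

  # Assumption: per-tenant overrides supplied through the chart's runtime config.
  # A value set here for a tenant takes precedence over limits_config for that tenant.
  runtimeConfig:
    overrides:
      anonymous:
        ingestion_rate_mb: 15
        ingestion_burst_size_mb: 30

# A limits_config block at the top level (outside "loki") would not reach
# the generated config:
# limits_config:
#   ingestion_rate_mb: 15
```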

What to verify next

  • Confirm distributor replica count (if 3, 5MB local is expected with global strategy).

  • Check active runtime/default limits for tenant anonymous via /config/tenant/v1/limits with X-Scope-OrgID: anonymous.

  • Check rendered config in cluster (helm get values, generated config map/secret) to ensure loki.limits_config.ingestion_rate_mb: 15 is really applied.

  • Ensure clients aren’t pinned to one distributor (LB/keepalive behavior can make one distributor hit its local share first).

  • If you want each distributor to enforce full 15MB independently, set ingestion_rate_strategy: local (with care: total cluster allowance becomes 15 * N).
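
As a rough sketch of that last option, assuming 3 distributor replicas (a hypothetical count, not confirmed in this thread), the two strategies compare as follows in the Helm values:

```yaml
loki:
  limits_config:
    ingestion_rate_mb: 15
    ingestion_burst_size_mb: 30
    # global (the default): the limit is shared across distributors, so with
    #   3 healthy distributors each one enforces 15 / 3 = 5 MB/s
    #   (5 * 1024 * 1024 = 5242880 bytes/sec, the number in the error).
    # local: every distributor enforces the full 15 MB/s on its own, so the
    #   cluster as a whole can accept up to 15 * 3 = 45 MB/s.
    ingestion_rate_strategy: local
```

If you would rather keep the global strategy, raising `ingestion_rate_mb` so that the per-distributor share covers your peak volume has a similar effect without multiplying the cluster-wide allowance.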

@juliestickler,
Thanks again. The rate limit issue is gone, but my traces are still missing the logs. At least we fixed one issue.

@babecolengo if I’m understanding your query correctly, you’re looking for a single span on a specific trace? I’m not surprised that the result was no matching logs for such a specific query.

@juliestickler,
I use the "Logs for this span" button from the traces. I'm not sure if that changes anything?

UPDATE:


This happens when I click the blue "Logs for this span" button.

Based on this error, I have now fixed the issue: it was caused by mismatched labels.