Error: Per stream rate limit exceeded

Hi,

I need some help with the error below; it is showing up multiple times in the ingester component.

I have tried increasing several limits, but the error still reports a limit of 4KB/sec for many streams.

entry with timestamp 2024-08-07 05:08:56.217050506 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 4KB/sec) while attempting to ingest for stream ..., consider splitting a stream via additional labels or contact your Loki administrator to see if the limit can be increased'

Here is my configuration:

loki:
  auth_enabled: false
  server:
    grpc_server_max_recv_msg_size: 1048576000000
    grpc_server_max_send_msg_size: 1048576000000


  compactor:
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_delay: 1m
    retention_delete_worker_count: 30
    delete_request_store: s3
  common:
    ring:
      kvstore:
        store: memberlist

  memberlist:
    join_members:
      - loki-memberlist:7946

  schemaConfig:
    configs:
      - from: 2024-07-11
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  ingester:
    chunk_block_size: 1048576
    chunk_encoding: snappy
    chunk_idle_period: 30m
    chunk_retain_period: 1m
  ingester_client:
    remote_timeout: 120s
  querier:
    max_concurrent: 12
  query_range:
    parallelise_shardable_queries: true 
    align_queries_with_step: true
    max_retries: 10
    cache_results: true
  query_scheduler:
    max_outstanding_requests_per_tenant: 4096 
  limits_config:
    bloom_gateway_enable_filtering: true
    bloom_compactor_enable_compaction: true
    reject_old_samples: true 
    reject_old_samples_max_age: 1w 
    retention_period: 14d
    query_timeout: 360s
    ingestion_rate_mb: 35112
    ingestion_burst_size_mb: 90048
    per_stream_rate_limit: 35GB
    per_stream_rate_limit_burst: 90GB
    split_queries_by_interval: 6h
    max_query_series: 10000
    max_query_length: 48h
    max_entries_limit_per_query: 25000
    max_query_parallelism: 1024
    tsdb_max_query_parallelism: 1024
    max_concurrent_tail_requests: 1000
    volume_enabled: true
    max_global_streams_per_user: 1000000
    max_streams_per_user: 1000000
    max_line_size: 0
    retention_stream:
    - selector: '{job=~"default/.+"}'
      priority: 1
      period: 24h
    - selector: '{job=~"kube-system/.+"}'
      priority: 2
      period: 24h
  frontend:
    log_queries_longer_than: 60s
    max_outstanding_per_tenant: 500000
    compress_responses: true

Those per-stream limits seem a bit excessive. The recommendation is 5MB for per_stream_rate_limit and 20MB for per_stream_rate_limit_burst; I'd try that and see if it helps. Also hit the /config endpoint on your ingesters and double-check that your configuration is actually being applied.
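
For reference, a minimal sketch of what that would look like under limits_config (a starting point, not values tuned for your workload):

  limits_config:
    per_stream_rate_limit: 5MB         # Loki's default is 3MB
    per_stream_rate_limit_burst: 20MB  # Loki's default is 15MB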

I have checked the configuration, and it is applied in the pod's config file. I have even reduced it to the recommended 5MB/20MB settings, and unfortunately it is still showing the same error.

These are the stream- and ingester-related configurations we use; try these and see if they work for you:

  ingestion_burst_size_mb: 200
  ingestion_rate_mb: 100
  ingestion_rate_strategy: local
  per_stream_rate_limit: 100M
  per_stream_rate_limit_burst: 200M

Unfortunately, I am still getting the same 4KB/sec per-stream rate limit error after those changes.

A couple more things to try:

  1. How are you deploying your Loki cluster?
  2. What version?
  3. Make sure you don’t have any limit overrides configured (see the runtime overrides sketch after this list).
  4. Try disabling the bloom filter.
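
For point 3, keep in mind that per-tenant overrides can come from a runtime configuration file rather than limits_config (in the Helm chart I believe this is the loki.runtimeConfig value, rendered into runtime-config.yaml). A purely hypothetical example of a stray override that would produce exactly a 4KB/sec limit (with auth_enabled: false the tenant ID is "fake"):

  runtimeConfig:
    overrides:
      "fake":                            # tenant ID used when auth_enabled is false
        per_stream_rate_limit: 4KB       # hypothetical stray override; would explain the 4KB/sec in the error
        per_stream_rate_limit_burst: 4KB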

This issue is just weird; here are my answers to your points.

  1. Loki is deployed in a local Kubernetes cluster across 9 nodes using the official helm chart in distributed mode.
  2. Loki version 3.1.0
  3. I looked over the limits that are configured and they match the ones shared; nothing seems to be overriding any value.
  4. I have disabled blooms in general and I don’t see any improvement or change in the error message.

Can you double-check your ingester configuration by hitting its HTTP /config endpoint? Just as a sanity check.

You could also try deploying with a different topology (such as simple scalable mode), and maybe even a different version (3.0.0 or even 2.9.x), just as tests. You never know; the more you poke at it, the more likely you are to find something you weren’t seeing before. I still think your configuration is somehow not being applied, though.
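
If you do test the simple scalable topology, and assuming you are on the grafana/loki Helm chart 6.x, I believe switching is mostly a matter of the top-level deploymentMode value, roughly like this (sketch only, replica counts are arbitrary):

  deploymentMode: SimpleScalable   # chart-level value, not nested under loki:
  write:
    replicas: 3
  read:
    replicas: 3
  backend:
    replicas: 3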

I have double-checked the config from inside the container and looked at the Loki config that is actually in use; it shows exactly the same values as the Helm config.

I even ran loki -verify-config in all the ingester pods, and it reports the config as valid.

Unfortunately, I cannot downgrade, as this is a live environment; even with the errors I am still getting most of the logs.

Try automatic stream sharding and see if it helps (if you don’t have it in your configuration, it is probably already enabled: Automatic stream sharding | Grafana Loki documentation).
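
If you want to be explicit about it, my understanding is that automatic stream sharding is controlled by the shard_streams block under limits_config; a minimal sketch (the desired_rate shown is just an illustrative value):

  limits_config:
    shard_streams:
      enabled: true
      desired_rate: 1536KB    # target per-shard rate; hotter streams get split into more shards
      logging_enabled: true   # log when a stream gets sharded, handy while debugging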

I’d consider putting up a test cluster if you can: use the same helm chart, deploy it to your dev environment, and see what happens, so that you can poke at it freely.

Yes, sharding is enabled. As for the test cluster, I have one in a test environment now, but the log volume is definitely not the same because the environments are segregated, and I’m not getting the error there.

Honestly, I am not quite sure why you are having this problem; I think your best bet is to keep poking at it and to try to reproduce it if possible. You can use Grafana’s k6 to generate enough burst traffic to mimic your production cluster and see if that lets you reproduce the error.

Thank you for the help. I will try to drill down deeper into the issue and hopefully get to a resolution.

Hi,
Just an update: I am not getting the error messages anymore!

Here is my updated config, where bloom has been completely disabled and ingestion_rate_strategy has been set to local, with sane per_stream_rate_limit values.

  limits_config:
    bloom_gateway_enable_filtering: false
    bloom_compactor_enable_compaction: false
    reject_old_samples: true
    reject_old_samples_max_age: 1w
    retention_period: 14d
    query_timeout: 360s
    ingestion_rate_mb: 100
    ingestion_burst_size_mb: 200
    per_stream_rate_limit: 100M
    per_stream_rate_limit_burst: 200M
    ingestion_rate_strategy: local
    split_queries_by_interval: 6h
    max_query_series: 10000
    max_query_length: 48h
    max_entries_limit_per_query: 25000
    max_query_parallelism: 1024
    tsdb_max_query_parallelism: 1024
    max_concurrent_tail_requests: 1000
    volume_enabled: true
    max_global_streams_per_user: 10000000
    max_streams_per_user: 10000000
    max_line_size: 0

Thank you all for the help and support!