S3 SlowDown Error: Failed to flush chunks (503) – Request Rate Optimization Issue

Description:
We are experiencing occasional errors when writing logs to S3 using Loki. The error message we see is:

failed to flush chunks: store put chunk: SlowDown: Please reduce your request rate. status code: 503

Environment:

  • Single tenant deployment using a single S3 bucket (s3.us-east-1.amazonaws.com)
  • Ingestion parameters: ingestion_rate_mb is set to 16 and ingestion_burst_size_mb is set to 32
  • Chunk parameters: chunk_target_size is set to 8MB, and chunk_idle_period is set to 1h
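
As a rough sanity check on these numbers, here is a back-of-the-envelope estimate of how many chunk PUTs per second they imply. It is only a sketch: `estimated_puts_per_second` is a hypothetical helper, it treats the ingestion limit as the sustained rate, and it ignores compression and index writes. The 64 KiB average flushed size in the second call is an invented illustration, not a measurement from this deployment.

```python
def estimated_puts_per_second(ingest_mb_per_s: float, avg_flushed_chunk_mb: float) -> float:
    """Approximate chunk PUTs per second: MB ingested per second / MB per flushed chunk."""
    if avg_flushed_chunk_mb <= 0:
        raise ValueError("avg_flushed_chunk_mb must be positive")
    return ingest_mb_per_s / avg_flushed_chunk_mb


# Best case: every chunk fills to the 8 MB target before it is flushed.
full_chunks = estimated_puts_per_second(16, 8)

# Hypothetical worse case: many streams flush small chunks (64 KiB on average).
small_chunks = estimated_puts_per_second(16, 64 / 1024)

print(f"full chunks:  {full_chunks:.0f} PUT/s")
print(f"small chunks: {small_chunks:.0f} PUT/s")
```

If chunks really reached the target size, the PUT rate would be very low, so a SlowDown under load may point to many small flushes (a large number of streams, or chunks flushed before they fill) rather than to raw log volume.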

Steps to Reproduce:

  • Under high log ingestion load, some chunks fail to flush to S3, triggering the intermittent “SlowDown” error during write operations.

What Has Been Tried:

  • Lowering the ingestion rate to smooth out the write flow
  • Adjusting chunk_target_size and chunk_idle_period to encourage batch writes to S3
  • Tuning HTTP connection settings such as timeout and idle_conn_timeout
  • Verifying and adjusting retry strategies for write failures
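
For reference, the kind of retry strategy usually meant here is exponential backoff with jitter. The sketch below is a generic illustration, not Loki's or the AWS SDK's implementation: `SlowDownError`, `backoff_delays`, and `put_with_backoff` are hypothetical names, and the base, cap, and retry count are arbitrary defaults.

```python
import random
import time


class SlowDownError(Exception):
    """Stand-in for an S3 503 SlowDown response (hypothetical type)."""


def backoff_delays(base=0.1, cap=10.0, retries=5, rng=random.random):
    """Yield one 'full jitter' delay per retry: rng() * min(cap, base * 2**attempt)."""
    for attempt in range(retries):
        yield rng() * min(cap, base * 2 ** attempt)


def put_with_backoff(put, sleep=time.sleep, **backoff):
    """Call put(); on SlowDownError, wait a jittered delay and try again."""
    for delay in backoff_delays(**backoff):
        try:
            return put()
        except SlowDownError:
            sleep(delay)
    return put()  # final attempt; the error propagates if it still fails


# Usage: a fake PUT that is throttled twice, then succeeds.
calls = []


def flaky_put():
    calls.append(1)
    if len(calls) < 3:
        raise SlowDownError("Please reduce your request rate.")
    return "ok"


result = put_with_backoff(flaky_put, sleep=lambda seconds: None)  # skip real sleeping
print(result, "after", len(calls), "attempts")
```

Backoff spreads retries out in time so they do not arrive in a synchronized burst, but it does not lower the total number of write requests.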

That’s a rate-limit error from S3, and the only way to deal with it is to reduce the number of write requests. It sounds like you’ve already tuned the relevant settings, so I’m not sure much more can be done on that front. Perhaps try switching to a setup with multiple S3 buckets?

You’ll want to carefully test the migration procedure, of course.
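
To illustrate why spreading chunks over several buckets lowers the per-bucket request rate, here is a minimal sketch of hash-based sharding. It is not Loki's actual implementation: `pick_bucket`, the bucket names, and the key format are all made up for the example.

```python
import hashlib
from collections import Counter

# Hypothetical bucket names, for illustration only.
BUCKETS = ["loki-chunks-a", "loki-chunks-b", "loki-chunks-c", "loki-chunks-d"]


def pick_bucket(chunk_key: str, buckets=BUCKETS) -> str:
    """Map a chunk key to a bucket deterministically via a stable hash."""
    digest = hashlib.sha256(chunk_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]


# Simulate 10,000 made-up chunk keys and count writes per bucket.
counts = Counter(pick_bucket(f"fake/{i:08x}") for i in range(10_000))
for name in BUCKETS:
    print(name, counts[name])
```

Each bucket ends up with roughly a quarter of the writes. In a scheme like this the mapping depends on the bucket list, so changing the list would send lookups for existing chunks to a different bucket, which is one reason the migration needs careful testing.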