Today I noticed a bunch of errors like this in the loki-write pods:
org_id=fake msg="failed to flush" err="failed to flush chunks: store put chunk: SlowDown: A timeout exceeded while waiting to proceed with the request, please reduce your request rate\n\tstatus code: 503
Loki is writing to an S3 bucket, and even though the Loki datasource in Grafana seems to be working, there is a timeout when retrieving the labels.
First you should find out whether you are indeed being rate limited by S3. The SlowDown message usually means S3 is rate limiting you, but it's still good to confirm.
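One quick way to gauge how often this is happening is to count the SlowDown errors in the write path's recent logs. This is a sketch: the pod selector and namespace are assumptions that depend on how Loki was deployed, and the sample log line below is just there to demonstrate the filter.

```shell
# On Kubernetes you would pipe the real pod logs, e.g. (selector is an assumption):
#   kubectl logs -l app.kubernetes.io/component=write -n loki --since=1h | grep -c 'SlowDown'
# Demonstrating the filter on a sample line from the error above:
log='level=error org_id=fake msg="failed to flush" err="store put chunk: SlowDown: A timeout exceeded, status code: 503"'
printf '%s\n' "$log" | grep -c 'SlowDown'
```

A steadily climbing count during normal ingest is a strong hint the object store is pushing back, rather than a one-off network blip.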
If your Loki cluster is indeed rate limited by S3, here are a couple of things I can think of that might help:
Try to write fewer files, less frequently, by tweaking the target chunk size, chunk idle period, and max chunk age.
Try using multiple S3 buckets as chunk storage.
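The first suggestion maps to a few settings in the ingester block of the Loki config. The values below are illustrative placeholders, not recommendations; tune them against your own ingest volume:

```yaml
# Sketch of the ingester settings that control how often chunks are flushed.
ingester:
  chunk_idle_period: 30m      # flush a stream's chunk after this long with no new logs
  max_chunk_age: 2h           # hard cap on how long a chunk can stay open
  chunk_target_size: 1572864  # cut a chunk once it reaches ~1.5 MB compressed
```

Raising these means fewer, larger objects written to S3, at the cost of logs sitting longer in ingester memory before they are flushed.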
As for labels, you should find out why the query is timing out. I doubt it's rate limiting: S3 allows 5500 GET requests per second, and simply reading index files for labels should not come close to that.
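To isolate whether the timeout comes from Grafana or from Loki itself, you can hit Loki's label endpoint directly. The host and port are assumptions; point them at your loki-read or query-frontend service:

```shell
# Query the label API directly, bypassing Grafana, e.g.:
#   curl -s --max-time 30 "http://<loki-read>:3100/loki/api/v1/labels"
# This is the same endpoint the Grafana datasource uses for labels:
echo "/loki/api/v1/labels"
```

If this call is slow too, the problem is between Loki and its index storage rather than in Grafana.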
The S3 bucket is an object store configured on a FreeNAS installation, and I see no setting that would rate limit it. Is it better to write larger chunks less often or smaller chunks more often? And how do I configure multiple buckets as chunk storage?
I would double-check FreeNAS. There has to be rate limiting somewhere, but I've never used it, so maybe I am just wrong.
It's better to write reasonably sized chunks less often. This is a consideration for any distributed storage and computing system (think Hadoop and MapReduce). You don't want to write too many small files, or you incur too much network overhead; but you also don't want to write files that are too large, or you waste bandwidth (a certain percentage of each chunk read is never needed, either because it's filtered out or because the number of returned results is limited). I find Loki's defaults reasonable, but if you run into rate limiting, it's up to you to figure out your sweet spot.
If you look under aws_storage_config you'll see that bucketnames can be a comma-separated list of bucket names. I haven't used this myself, so I can't say for certain, but if you want to migrate smoothly from a single bucket to multiple buckets, you probably need to create a new schema period and specify the multi-bucket configuration there. If you aren't running a production cluster and are OK with losing all data, then you can probably just slap it on.
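Roughly, that would look like the sketch below. The bucket names, endpoint, date, and index store are all placeholders I made up; only the key names (bucketnames, s3forcepathstyle, schema_config) come from Loki's storage configuration:

```yaml
# Sketch: comma-separated bucketnames plus a new schema period for the cutover.
storage_config:
  aws:
    endpoint: http://freenas-host:9000   # placeholder for your S3-compatible endpoint
    bucketnames: loki-chunks-1,loki-chunks-2
    s3forcepathstyle: true
schema_config:
  configs:
    - from: 2024-01-01                   # placeholder: date the new period takes effect
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h
```

Chunks written before the new period's start date would still be read from the old configuration, which is what makes the migration smooth.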
It looks like it was indeed a problem with the S3 server. For some reason its ingest capability was limited and it was not responding to queries. After a restart everything came back to normal.