Hi all,
We are having an issue with our Mimir deployment in production. We've set out_of_order_time_window to 20 minutes, but some Alloy agents running on different clusters have started sending outdated metrics. As a result, we are seeing a high volume of errors in both the ingester and the distributor.
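For context, this is roughly how the out-of-order window is configured on our side (a minimal sketch of the per-tenant limits section; only the window value is from our setup):

```yaml
# Mimir per-tenant limits: allow out-of-order samples up to 20m old
limits:
  out_of_order_time_window: 20m
```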
The errors on the ingesters look like: "the sample has been rejected because another sample with a more recent timestamp has already been ingested and this sample is beyond the out-of-order time window of 20m (err-mimir-sample-timestamp-too-old)".
These errors are causing the distributor pods to OOM and restart continuously. Increasing the memory limit doesn't resolve the problem; memory eventually fills up again.
I've tried increasing out_of_order_time_window, but I'm still receiving metrics older than the threshold. Could you advise on how to resolve this issue and prevent it from happening again? Also, how can we protect the distributor from being overwhelmed by outdated data?
P.S.: I want to make sure I am safeguarding my distributor deployment. I know I can set the Alloy agent's sample_age_limit so it doesn't send outdated data.
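If I'm reading the Alloy docs correctly, sample_age_limit is set per remote_write endpoint inside the queue_config block; something like this (the endpoint URL is a placeholder for our actual Mimir push endpoint):

```alloy
prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir-distributor:8080/api/v1/push"

    queue_config {
      // Drop samples older than this before sending; "0s" disables the limit.
      sample_age_limit = "20m"
    }
  }
}
```

Is aligning this with the server-side out_of_order_time_window the recommended way to keep stale samples from reaching the distributor in the first place?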