I have been seeing a problem in our Loki install (Kubernetes, using the loki-distributed Helm chart) where the memcachedChunks container will quickly run out of memory, which causes logs to fail to be ingested.
The set operation results in:
level=error ts=2021-10-04T15:35:46.721335152Z caller=memcached.go:235 msg="failed to put to memcached" name=chunks err="server= memcache: unexpected response line from \"set\": \"SERVER_ERROR out of memory storing object\\r\\n\""
level=error ts=2021-10-04T15:35:58.30206562Z caller=memcached.go:235 msg="failed to put to memcached" name=chunks err="server= memcache: unexpected response line from \"set\": \"SERVER_ERROR Out of memory during read\\r\\n\""
Currently we have the following settings:
cpu: 500m
memory: 19073Mi
enabled: true
- -m 18000
- -I 32m
All memcached config settings are set to:
batch_size: 100
parallelism: 100
expiration: 30m
split_queries_by_interval is set to 15m and align_queries_with_step is set to True.
chunk_target_size is 1536000
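To put the numbers above in proportion, here is a rough back-of-the-envelope sketch in Python. It assumes that memcached counts `-m` in MiB and that `chunk_target_size` is in bytes; the variable names are mine, and the chunk count is only an upper bound because it ignores per-item overhead and slab fragmentation.

```python
# Values quoted in the post above.
CONTAINER_LIMIT_MIB = 19073      # memory: 19073Mi
MEMCACHED_M_MIB = 18000          # -m 18000 (assumed to be counted in MiB)
MAX_ITEM_MIB = 32                # -I 32m
CHUNK_TARGET_BYTES = 1_536_000   # chunk_target_size (assumed to be bytes)

MIB = 1024 * 1024

# Memory left to the container outside the cache itself.
headroom_mib = CONTAINER_LIMIT_MIB - MEMCACHED_M_MIB

# Size of one full chunk, for comparison with the item size limit.
chunk_mib = CHUNK_TARGET_BYTES / MIB

# Upper bound only: ignores per-item overhead and slab fragmentation.
max_chunks = (MEMCACHED_M_MIB * MIB) // CHUNK_TARGET_BYTES

print(f"headroom outside the cache: {headroom_mib} MiB")
print(f"target chunk size: {chunk_mib:.2f} MiB (item limit: {MAX_ITEM_MIB} MiB)")
print(f"upper bound on cached chunks: {max_chunks}")
```

A full chunk of about 1.46 MiB is larger than memcached's default 1 MiB item limit, which is presumably why `-I` was raised here.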
It looks to me as though chunks are filling up the memcached server and are never purged. This also stops logs from being ingested, which seems odd to me, since I thought this cache was a query cache meant to speed up retrieval. Why does the memcached server running out of memory cause logs to not be ingested, and what would be the recommendation here? Also, I was under the impression that memcached evicts older entries when a new one would push it past its memory limit. Is that not the case?
Hi @stdiluted, we have also deployed the loki-distributed Helm chart (including memcached) in our Kubernetes clusters and are seeing these error messages.
The loki-distributed Helm chart actually defines multiple memcached instances, and the one used by the ingesters is different from the one used for queries.
loki-distributed-memcached-chunks-1 memcached >29 SERVER_ERROR Out of memory during read
loki-distributed-ingester-1 ingester level=error ts=2022-02-22T12:03:00.672944713Z caller=memcached.go:224 msg="failed to put to memcached" name=chunks err="server=redacted:11211: memcache: unexpected response line from \"set\": \"SERVER_ERROR Out of memory during read\\r\\n\""
loki-distributed-ingester-1 ingester level=error ts=2022-02-22T12:03:00.842328067Z caller=memcached.go:224 msg="failed to put to memcached" name=chunks err="server=redacted:11211: memcache: unexpected response line from \"set\": \"SERVER_ERROR Out of memory during read\\r\\n\""
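To see which memcached error dominates and how often it occurs, the ingester output can be tallied with a few lines of Python. This is only a sketch: the function name `server_error_reason` and the regular expression are my own, and the two sample lines are the ones pasted above.

```python
import re
from collections import Counter

# The two ingester lines from above (the server address is redacted there).
LINES = [
    r'loki-distributed-ingester-1 ingester level=error ts=2022-02-22T12:03:00.672944713Z caller=memcached.go:224 msg="failed to put to memcached" name=chunks err="server=redacted:11211: memcache: unexpected response line from \"set\": \"SERVER_ERROR Out of memory during read\\r\\n\""',
    r'loki-distributed-ingester-1 ingester level=error ts=2022-02-22T12:03:00.842328067Z caller=memcached.go:224 msg="failed to put to memcached" name=chunks err="server=redacted:11211: memcache: unexpected response line from \"set\": \"SERVER_ERROR Out of memory during read\\r\\n\""',
]

# Capture the text after SERVER_ERROR up to the next backslash or quote.
SERVER_ERROR = re.compile(r'SERVER_ERROR ([^\\"]+)')


def server_error_reason(line):
    """Return the memcached SERVER_ERROR text of a failed put, else None."""
    if 'msg="failed to put to memcached"' not in line:
        return None
    match = SERVER_ERROR.search(line)
    return match.group(1).strip() if match else None


# Count failed puts per error reason.
counts = Counter(
    reason for reason in map(server_error_reason, LINES) if reason
)
print(counts)
```

Fed a longer log capture, the same counter would separate "Out of memory during read" from "out of memory storing object", the other variant reported in this thread.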
Could it be that your log uploads are too slow? If so, the chunks would pile up faster than they can be removed and thus create memory issues in memcached.