Any tips about Loki Ingester chunk de-duplication when RF=3 or more?

Hi. We are using Grafana loki-distributed.

  • Loki Version : 2.9.4
  • Storage : tsdb single store (S3)
  • RF : 3
  • Ingesters : 3
  • No Cache (No Memcached config)

Recently, I found that the Loki ingester does not de-duplicate chunks well when the replication factor (RF) is greater than 1.

When the ingester fails to de-duplicate chunks and flushes the duplicates to S3, there is a capacity problem, because unnecessary data piles up in S3 even though the querier de-duplicates at query time. I also think this puts considerable load on the querier, because it has to fetch all of the data and check whether each piece is a duplicate.

1. Current Situation

Here are my chunk flush bytes and chunk de-duplicated bytes:

As you can see from the picture, deduplication is not performing well in the ingester.

So I researched the deduplication process in the docs and source code.

2. Test with the memcached chunk cache enabled

For the test, I set up a loki-distributed environment and enabled the chunk memcached.

    - -m 5000
    - -I 10m
    - -vvv


      {{- if .Values.memcachedChunks.enabled }}
          writeback_size_limit: 4GB
          writeback_buffer: 20000
          enabled: false
          consistent_hash: true
          addresses: dnssrv+_memcached-client._tcp.{{ include "loki.memcachedChunksFullname" . }}.{{ .Release.Namespace }}.svc.{{ }}

But it was not effective.

The chunk de-duplicated bytes are still very low.
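For comparison, my understanding from the docs is that the chunk cache block should end up looking roughly like this. This is only a sketch; the address below is a placeholder for my environment, and the values are illustrative:

```yaml
chunk_store_config:
  chunk_cache_config:
    memcached:
      # batching of chunk lookups against memcached
      batch_size: 256
      parallelism: 10
    memcached_client:
      consistent_hash: true
      # placeholder SRV address; in my cluster this would resolve to the
      # memcached-chunks pods
      addresses: dnssrv+_memcached-client._tcp.loki-memcached-chunks.loki.svc.cluster.local
      timeout: 500ms
```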

And these are the memcached metrics.

In conclusion, I don't know why deduplication doesn't work well.

Please tell me how to make deduplication perform properly in the ingester.

We don’t run Loki with replication factor of 3, so please take my comment with a grain of salt.

Deduplication based on chunk hash is tricky, because it relies on the chunks from all ingesters being identical. But given that ingesters start at different times, and that chunks are cut based on rough size, they most likely aren't going to be the same. Loki deals with this by cutting chunks based on time instead, with an additional setting for the minimum utilization of the expected chunk size. See How Loki Reduces Log Storage | Grafana Labs for a good explanation.

So make sure sync_period and sync_min_utilization are configured accordingly. Also double-check your chunk size and chunk idle period to make sure chunks aren't being written too often. If it's still not working, share your Loki config as well.
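For example, something along these lines in the ingester block (a sketch; the values are illustrative, not a recommendation):

```yaml
ingester:
  # Cut chunks at synchronized time boundaries across all ingesters,
  # so the replicas of a stream produce identical chunks (identical
  # hashes can then be de-duplicated at flush time).
  sync_period: 15m
  # Only honor a sync boundary once the chunk has reached 50% of the
  # target size, to avoid flushing lots of tiny chunks.
  sync_min_utilization: 0.5
  # Keep these large enough that chunks aren't cut early for other reasons:
  chunk_idle_period: 30m
  max_chunk_age: 2h
  chunk_target_size: 1572864
```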

Thanks for replying, @tonyswumac. I really appreciate it.

First i have a question.

Question #1 ) You said that you are not running Loki with RF=3. In that case, what RF are you using?

Even though the Loki team recommends RF=3, if the deduplication problem can't be resolved, I think it would be more advantageous to operate with RF=1.

And this is my test results

Test #1 )

      sync_period: 15m
      sync_min_utilization: 0.5


chunk_idle_period, max_chunk_age, chunk_target_size, etc. = default settings

#      chunk_retain_period: 1m
#      chunk_idle_period: 2h
#      chunk_target_size: 1536000
#      max_chunk_age: 2h
#      chunk_block_size: 262144
#      chunk_encoding: snappy

  • Stored Bytes : 3.3GiB
  • Deduped Bytes : 850MiB

Because it's RF=3, I expected Deduped Bytes to be roughly twice Stored Bytes, e.g. around 2~4GiB of deduped data for ~2GiB stored.

Deduplication is handled a little better now, but still not as well as I expected.
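My rough arithmetic, assuming the Stored Bytes panel counts post-dedup flushed data and Deduped Bytes counts the discarded duplicates:

```shell
# Rough expectation under RF=3: every chunk is ingested by 3 ingesters,
# so with perfect dedup only 1 of every 3 written bytes should be stored
# and the other 2 should be discarded as duplicates.
# Observed: stored = 3.3 GiB, deduped = 0.85 GiB  ->  written ~= 4.15 GiB.
awk -v stored=3.3 -v deduped=0.85 -v rf=3 'BEGIN {
  written = stored + deduped
  printf "ideal stored: %.1f GiB, ideal deduped: %.1f GiB\n",
         written / rf, written * (rf - 1) / rf
}'
# -> ideal stored: 1.4 GiB, ideal deduped: 2.8 GiB
```

So the observed 850MiB of deduped bytes is far below the ~2.8GiB I would expect if the replicas' chunks were identical.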

Test #2 )

      sync_period: 15m
      sync_min_utilization: 0.2

      chunk_retain_period: 1m
      chunk_target_size: 30000000
      chunk_idle_period: 2h
      max_chunk_age: 2h
      chunk_block_size: 262144
      chunk_encoding: snappy

  • Stored Bytes : 2.5GiB
  • Deduped Bytes : 640MiB

I expected that deduplication would not perform well if chunks get flushed before sync_period (e.g. 15m) because chunk_target_size is reached, so I raised chunk_target_size quite a lot.

But this also didn’t have much effect.

You might try posting your question in the community slack as well, perhaps someone more knowledgeable than I or a developer could chime in.

We use only replication factor 1. The primary driver for us is cost. Obviously it’s against recommendation, but we feel that with simple scalable mode and WAL, and that we very rarely lose a node to the point that WAL becomes unrecoverable, it’s an acceptable risk.

It helped me a lot. Thank you so much!

I’m also considering changing to RF=1

Other than the WAL, are there any additional tips or know-how for ensuring that as little chunk data as possible is lost when an ingester restarts?

I’d also recommend not scaling down the writer containers automatically. Because there is only one copy of logs at any given time, we decided to not scale down writers automatically, and only do it manually when needed so that we can ensure that the chunks are flushed properly.

Thanks for replying.

As far as I understand, the procedure when restarting a Loki writer would be, for example:

1. Pause log transmission from the log shipper (e.g. Fluent Bit, Promtail, or a Kafka consumer, …)
2. Then force-flush the chunks
3. Then restart the Loki writer

Is my understanding correct?
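For step 2, I'm thinking of something like this. It's only a sketch: LOKI_ADDR is a placeholder for my ingester's address, and as far as I know /flush and /ingester/shutdown are the relevant ingester endpoints:

```shell
#!/bin/sh
# Sketch of force-flushing an ingester before restarting a writer.
# LOKI_ADDR is a placeholder; adjust for the real environment.
LOKI_ADDR="${LOKI_ADDR:-http://loki-ingester-0:3100}"

# DRY_RUN=1 only prints the commands instead of hitting a live cluster.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Force-flush all in-memory chunks to object storage:
run curl -s -X POST "$LOKI_ADDR/flush"

# Or flush and shut the ingester down in one call, right before restart:
run curl -s -X POST "$LOKI_ADDR/ingester/shutdown"
```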

If it is correct, I would like to change the Log Pipeline as follows.

Log shipper → Kafka → Kafka consumer (e.g. an HTTP connector?) → Loki (with RF=1)

Thanks to you, a lot of my questions have been answered!