Historical data not immediately queryable

The issue
I’m working on getting historical data into Loki. But when I enter some historical loglines (let’s say from 2021-12-09) into the DB, they do not show up immediately when I query for them using e.g. logcli.
After lots of fiddling, I’ve found that I can at least force them to show up by adding a final logline in the same stream with the timestamp of now().

My setup

  • Docker, running Loki latest (2.4.1), with Grafana as dash, and logcli (latest) for barebones querying.
  • Linux Mint 20.2

How to reproduce

  1. Enter a couple of loglines with a timestamp sometime in the past (eg 2021-12-09T00:00:00.000000+00:00), with the static label ‘application=hist_test’
  2. Verify with logcli that the label value ‘hist_test’ is available: logcli labels application
  3. See that no loglines are available: logcli query '{application="hist_test"}' --since 200h
  4. Enter one more logline with timestamp from now (at the moment, that would be something like 2021-12-14T14:58:14.488133+00:00)
  5. Verify that this last logline is now available, but the older ones not yet: logcli query '{application="hist_test"}' --since 200h
  6. Wait a couple of seconds (10? 20? 60? I don’t know yet).
  7. Verify that all loglines are now available: logcli query '{application="hist_test"}' --since 200h
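For anyone who wants to script steps 1 and 4, here is a minimal Python sketch that builds a request body for Loki's HTTP push endpoint (/loki/api/v1/push). The URL, the port and the helper names (to_ns, build_payload, push) are my assumptions for illustration, not part of the original setup; adjust them to your own Docker port mapping.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: Loki listens on localhost:3100; change this to match your setup.
LOKI_PUSH_URL = "http://localhost:3100/loki/api/v1/push"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ns(ts: datetime) -> str:
    """Convert a timezone-aware datetime to the nanosecond epoch string Loki expects."""
    delta = ts - EPOCH  # raises TypeError for naive datetimes, which is intended
    # Integer arithmetic avoids float rounding at nanosecond scale.
    seconds = delta.days * 86400 + delta.seconds
    return str(seconds * 10**9 + delta.microseconds * 1000)


def build_payload(labels: dict, entries: list) -> dict:
    """Build a push body for a single stream from (datetime, line) pairs."""
    values = [[to_ns(ts), line] for ts, line in entries]
    return {"streams": [{"stream": labels, "values": values}]}


def push(payload: dict, url: str = LOKI_PUSH_URL) -> int:
    """POST the payload to Loki and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Step 1: a historical logline with the static label application=hist_test.
historical = build_payload(
    {"application": "hist_test"},
    [(datetime(2021, 12, 9, tzinfo=timezone.utc), "[INFO] Historical log line")],
)

# Step 4: one more logline stamped with the current time.
current = build_payload(
    {"application": "hist_test"},
    [(datetime.now(timezone.utc), "[INFO] this is my testing line")],
)
```

Calling push(historical) and then push(current) sends both requests; nothing is sent until push is called.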

Repeated querying has shown me that sometime during the waiting period (step 6), the last logline disappears again, only to show up later together with the older loglines.

home$ ./logcli query '{application="hist_test"}' --since 500h
2021-12-14T14:26:42+01:00 {} [INFO] this is my testing line
home$ ./logcli query '{application="hist_test"}' --since 500h
2021-12-14T14:26:42+01:00 {} [INFO] this is my testing line
home$ ./logcli query '{application="hist_test"}' --since 500h
home$ ./logcli query '{application="hist_test"}' --since 500h
home$ ./logcli query '{application="hist_test"}' --since 500h
<snip>
home$ ./logcli query '{application="hist_test"}' --since 500h
home$ ./logcli query '{application="hist_test"}' --since 500h
home$ ./logcli query '{application="hist_test"}' --since 500h
home$ ./logcli query '{application="hist_test"}' --since 500h
2021-12-14T14:26:42+01:00 {} [INFO] this is my testing line
2021-12-09T01:00:00+01:00 {} [INFO] 127.0.0.1:59640 - - [09/Dec/2021:00:00:00 +0000] "Historical log line"
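
To put a number on the waiting period instead of re-running logcli by hand, the repeated querying above could be automated roughly like this. It is only a sketch: it assumes Loki's HTTP query_range endpoint on localhost:3100, and the helper names (count_lines, fetch, poll) are made up for this example.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: Loki listens on localhost:3100; change this to match your setup.
LOKI_QUERY_URL = "http://localhost:3100/loki/api/v1/query_range"


def count_lines(response: dict) -> int:
    """Count the loglines across all streams in a query_range response."""
    streams = response.get("data", {}).get("result", [])
    return sum(len(stream.get("values", [])) for stream in streams)


def fetch(selector: str, hours: int, url: str = LOKI_QUERY_URL) -> dict:
    """Run one range query covering the last `hours` hours."""
    now_ns = time.time_ns()
    params = urllib.parse.urlencode({
        "query": selector,
        "start": now_ns - hours * 3600 * 10**9,
        "end": now_ns,
        "limit": 1000,
    })
    with urllib.request.urlopen(f"{url}?{params}") as resp:
        return json.load(resp)


def poll(fetch_fn, interval_s: float = 5.0, attempts: int = 60, sleep=time.sleep) -> list:
    """Call fetch_fn repeatedly; return (elapsed_seconds, line_count) samples."""
    samples = []
    start = time.monotonic()
    for attempt in range(attempts):
        elapsed = round(time.monotonic() - start, 1)
        samples.append((elapsed, count_lines(fetch_fn())))
        if attempt < attempts - 1:
            sleep(interval_s)
    return samples
```

Something like poll(lambda: fetch('{application="hist_test"}', 500)) then gives a list showing when the count drops to zero and when all loglines come back.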

I have the idea that this might have to do with max_chunk_size, max_chunk_age, out of order data, etc… But not sure yet. Anybody that can shed any light on this, please let me know.

I have done a small test, and it feels like it is tied to the out-of-order window (which is half an hour by default).
I entered like 1000 historical log lines into the DB, and they only showed up half an hour later.

Should I expect this, or is this a bug?

Hello, I had the same issue, and luckily I found your post, which helped me. I think it’s about chunk_idle_period, which is 30m by default. I changed that to 1m, and my historical logs are now queryable after 1 minute and some seconds. I hope it helps!

Hi Raurora, I’m glad you found a workaround, but I’m afraid it is not a real solution. Yes, the delay appears to be tied to chunk_idle_period, but this should not be the case.

  1. When I ingest log data from today, this data is immediately queryable. This is what I would expect: the log data is already in memory, so it should be available for querying. Compressing it into chunks is only a mechanism for Loki to get data out of memory and onto disk or Object Storage, thus greatly reducing the memory footprint.

  2. Setting your chunk_idle_period to 1m will result in a lot of small data files. Depending on how fast logs are flowing in, the chunks will end up much smaller than what is configured in chunk_target_size. And that will impact your querying performance.

So, even though you have found a way to circumvent the issue, it will cause other problems.

In my case, since I still have lots of data flowing in, I don’t care too much about running a bit behind the clock.