Hello all,
I wonder if anyone might be able to help me figure out how to backfill old log data into Loki. I have read the article on ingesting out-of-order logs as well as the relevant Loki documentation. I think that I have my instance configured so that it will accept older logs (as long as they are in a new stream).
I got to the point where I believe that I successfully inserted the old logs into Loki, but I could never see them in a query.
I was getting an error before I added a unique label to make a new stream. After adding the unique label, this error went away:
Oct 07 17:48:40 loki.waas.lab loki[35899]: level=warn ts=2024-10-07T17:48:40.461638229Z caller=grpc_logging.go:76 method=/logproto.Pusher/Push duration=90.166µs msg=gRPC err="rpc error: code = Code(400) desc = entry with timestamp 2024-09-13 00:05:33.839976 +0000 UTC ignored, reason: 'entry too far behind, entry timestamp is: 2024-09-13T00:05:33Z, oldest acceptable timestamp is: 2024-10-07T16:48:19Z',\nuser 'fake', total ignored: 1 out of 1 for stream: {cv_sp_num=\"1\", host=\"bloader.waas.lab\", job=\"splog-to-loki\", service_name=\"splog-to-loki\", site=\"ztl\", source=\"splog\", subsystem=\"cv\", system=\"field\"}"
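For context, the backfill push I am doing boils down to something like the sketch below. It just uses the standard /loki/api/v1/push JSON format; the host/port, the log line, and the backfill_batch label are placeholders (backfill_batch stands in for the unique label I added to force a new stream), so this is an illustration rather than my exact loader.

import json
import urllib.request
from datetime import datetime, timezone

# Placeholder endpoint; substitute the real Loki host and port.
LOKI_PUSH_URL = "http://loki.waas.lab:3100/loki/api/v1/push"

# Labels for the backfill stream. "backfill_batch" stands in for the unique
# label that makes Loki start a new stream instead of appending to the live
# one (which rejects entries older than its newest accepted timestamp).
labels = {
    "job": "splog-to-loki",
    "source": "splog",
    "host": "bloader.waas.lab",
    "backfill_batch": "2024-09-13",
}

# Loki expects each entry timestamp as a string of nanoseconds since the
# Unix epoch, so convert the old UTC timestamp accordingly.
old_entry_time = datetime(2024, 9, 13, 0, 5, 33, tzinfo=timezone.utc)
ts_ns = str(int(old_entry_time.timestamp()) * 1_000_000_000)

payload = {
    "streams": [
        {
            "stream": labels,
            "values": [[ts_ns, "example backfilled log line"]],
        }
    ]
}

request = urllib.request.Request(
    LOKI_PUSH_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Loki answers 204 No Content when it accepts the pushed entries.
with urllib.request.urlopen(request) as response:
    print(response.status)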
Once I got past that and Loki apparently accepted the logs, I expected to see them in a query after a few hours passed (max_chunk_age). However, when I checked after 12 hours, I could not see the log entries in my query.
Any ideas or things I should check or try?
Thanks,
B. J. Potter
If you saw a 204 when ingesting your old logs, then it should’ve been OK. Assuming your cluster is working correctly, it’s hard to say what the problem is. A couple of things I’d check:
- Do you see other logs around the same time frame?
- What is your configured value for query_ingesters_within?
- If you ingest the logs without the old timestamps (with another unique label, of course, for testing purposes), does it work?
- After ingesting the old logs, do you notice additional chunk files being written to your backend storage covering the time frame of your old logs?
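For that last check, something along these lines is a quick way to see whether any new chunk files get written right after you run the backfill (just a sketch: it assumes filesystem object storage and that chunks live under /data/loki/chunks, so adjust the path to whatever your storage_config actually points at).

import os
import time

# Assumed chunk directory for filesystem object storage; change this to the
# directory configured in your storage_config.
CHUNK_DIR = "/data/loki/chunks"

# Collect chunk files modified in the last hour.
cutoff = time.time() - 3600
recent = []

for root, _dirs, files in os.walk(CHUNK_DIR):
    for name in files:
        path = os.path.join(root, name)
        mtime = os.path.getmtime(path)
        if mtime >= cutoff:
            recent.append((mtime, path))

# Print them newest first with their modification times.
for mtime, path in sorted(recent, reverse=True):
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)), path)

If nothing new shows up after the chunks should have been flushed (max_chunk_age / chunk_idle_period), that would point at the entries never making it to storage.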
- There is a gap in the time frame I am trying to fill and I never see anything in the gap. Normal new logs are still coming in though.
- It’s not in my configuration, so I assume the default of 3 hours.
- Do you mean changing the timestamps to the current time? I haven’t tried that, but I can.
- I haven’t monitored that. I suspect maybe the additional chunk files are not being written.
I’ll get a test system set up and try these suggestions, thank you.
Hi,
When you query the old logs you ingested, did you make sure to include the extra label you had to add to get around the ingestion error?
I didn’t include that label specifically, but the logs should have come up given the other labels and the time range in my query. I can try adding it explicitly, though.
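The check I have in mind is roughly this (a sketch against the query_range API; the endpoint, the time range, and the backfill_batch label are placeholders standing in for my actual backfill stream):

import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Placeholder endpoint and selector; backfill_batch stands in for the unique
# label that was added to get around the ingestion error.
LOKI_QUERY_URL = "http://loki.waas.lab:3100/loki/api/v1/query_range"
SELECTOR = '{job="splog-to-loki", backfill_batch="2024-09-13"}'

def to_ns(dt):
    # Loki accepts start/end as nanosecond Unix timestamps.
    return str(int(dt.timestamp()) * 1_000_000_000)

params = {
    "query": SELECTOR,
    "start": to_ns(datetime(2024, 9, 12, tzinfo=timezone.utc)),
    "end": to_ns(datetime(2024, 9, 14, tzinfo=timezone.utc)),
    "limit": "100",
    "direction": "backward",
}

with urllib.request.urlopen(LOKI_QUERY_URL + "?" + urllib.parse.urlencode(params)) as response:
    result = json.load(response)

# Each returned stream carries its label set plus [timestamp, line] pairs.
for stream in result["data"]["result"]:
    print(stream["stream"])
    for ts, line in stream["values"]:
        print("   ", ts, line)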
I got this to work after reproducing the setup in a new test bed environment!
I set it up with new VMs (one for Loki, one for Grafana, and one for the loader). I am using the same process to generate the logs for a couple of different dates. I did the backfill on 2024-11-04, using log data first from 2024-10-28 and then from 2024-09-01. This eventually worked, but I am surprised at how long it took before I could see the data in Grafana. I did the original backfill at 2024-11-04T20:06 and checked for results the next day at 2024-11-05T16:00, and did not see anything at all in Grafana. I expected to at least see the 2024-10-28 entries, since they would have been the newest. This morning, 2024-11-06T14:00, I was able to see both the new and the old entries!
I did see some new files eventually, but they were not there a day later. Specifically, I saw these new tsdb files:
/data/loki/
├── compactor
├── rules
├── store
│   ├── index
│   ├── tsdb-cache
│   │   ├── index_19966
│   │   │   └── fake
│   │   ├── index_19967
│   │   │   └── fake
│   │   │       └── 1730751845475664815-compactor-1725148960500-1725149241501-ea45c87f.tsdb
│   │   ├── index_19968
│   │   │   └── fake
│   │   ├── index_20024
│   │   │   └── fake
│   │   │       └── 1730751845464284853-compactor-1730159574268-1730159868342-4f02f5f7.tsdb
Is there something I should adjust in my Loki configuration to optimize the way Loki writes these?
Thanks for the help, I hope to replicate this in our production system when our lab comes back up!