At work I sometimes get bug reports for issues that occurred a few days (or up to 2 weeks) ago. I often face the same challenge: parsing through hundreds of (archived) files to find the correct log file and spot where the bug happened (personal record: 10 .log + 150 .log.gz files from 10 instances of a web service, up to 4 GB in total).
Alloy + Loki help me tons, because they let me search for the right keywords without manually unzipping lots of .log.gz files in different subfolders. The problem is that I sometimes have to wait a long time until Alloy has parsed the right time range for me (I think the longest I ever had to wait was 1h to 1h30m).
I could look for certain file patterns in local.file_match and use the ignore_older_than argument, but sometimes I need to look for other occurrences on other days.
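For reference, a minimal sketch of what I mean, assuming the logs live under a placeholder path like /var/log/importer:

```alloy
// ignore_older_than skips files whose modification time is older than
// the given duration, so archives from earlier days are never ingested.
local.file_match "importer" {
  path_targets      = [{"__path__" = "/var/log/importer/*.log*"}]
  ignore_older_than = "168h" // 7 days; has to be raised when hunting older bugs
}
```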
My Question
What is the best practice to deal with hundreds of log and archived log files in Alloy?
- I use a docker-compose setup that also scrapes metrics from Loki and Alloy.
- I would like to accelerate log ingestion without excluding too many log files.
- Archived log files follow the pattern *.log-*.gz (per-day log rotation, e.g. data-importer.log-20250313.gz); see the sketch after this list.
- I use one .alloy config file per web service (because the services often use different log patterns).
- I want to keep manual per-situation adjustments, e.g. tweaking ignore_older_than, to a minimum.
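For the .gz archives specifically, loki.source.file can read compressed files in place via its decompression block, so nothing has to be unpacked on disk. A minimal sketch, using the same placeholder path as above; note that live files and archives need separate source components, because decompression applies to every file a component matches:

```alloy
local.file_match "importer_live" {
  path_targets = [{"__path__" = "/var/log/importer/*.log"}]
}

local.file_match "importer_archive" {
  path_targets = [{"__path__" = "/var/log/importer/*.log-*.gz"}]
}

// Live logs are tailed as plain text.
loki.source.file "importer_live" {
  targets    = local.file_match.importer_live.targets
  forward_to = [loki.write.default.receiver]
}

// Archives are decompressed in memory while being read.
loki.source.file "importer_archive" {
  targets    = local.file_match.importer_archive.targets
  forward_to = [loki.write.default.receiver]

  decompression {
    enabled       = true
    initial_delay = "30s" // grace period in case a file is still being rotated
    format        = "gz"
  }
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```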
Why not divide the files into groups and run many instances of Alloy in parallel?
The only thing you'd need to watch out for is the order of logs: you can't write to a log stream if newer logs already exist in it, so you'd have to give each Alloy instance a unique label so they don't conflict with each other.
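One way to do that is the external_labels argument of loki.write, which stamps a label onto every stream an instance pushes. A minimal sketch; the label name and value are just examples:

```alloy
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
  external_labels = {
    alloy_instance = "importer-group-a", // unique value per Alloy instance
  }
}
```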
@tonyswumac I’m not worried about the “newer logs” issue, as Alloy adds the “filename” label, which makes the stream unique already (except for log rotation, where importer.log turns into importer.log-20250313.gz). But I like the idea of grouping the logs and running multiple Alloy instances.
Label by date? I thought TSDBs were already efficient at date-range searches?
I want to keep compressed files compressed. Some of them are already 200 MB in size; imagine how big they would be uncompressed on my filesystem.
Thanks for your tips. I will definitely take the grouping and parallel-instances approach to heart. The log server already creates groups and sub-groups by server and instance.
It will be a pain to copy-and-paste some configs and then mount the specific folder names, but that's better than letting one instance deal with 150 log files by itself.
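The copy-and-paste part can be avoided by parameterizing a single config through environment variables, assuming an Alloy version whose standard library includes sys.env. LOG_DIR and INSTANCE_LABEL below are hypothetical variable names that each docker-compose service would set differently:

```alloy
// One shared .alloy file reused by every instance; only the environment differs.
local.file_match "group" {
  path_targets = [{"__path__" = sys.env("LOG_DIR") + "/*.log*"}]
}

loki.source.file "group" {
  targets    = local.file_match.group.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
  external_labels = {
    alloy_instance = sys.env("INSTANCE_LABEL"), // keeps streams distinct per instance
  }
}
```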