Stage.regex doesn't seem to be working

I’ve been trying to ingest my HTTP logs from HAProxy. I’ve set up a regex that should be able to parse them and create separate labels for all the fields. I’m sure I’m dumb and have missed something, but here is an example of the HAProxy log:

2025-02-10T21:20:16.231041-07:00 ark-lb-02 haproxy[10698]: 10.69.200.80:37206 [10/Feb/2025:21:20:16.217] httpsfront~ Emby/ark-emby-02 0/0/0/12/13 200 31118 - - ---- 19/18/0/0/0 0/0 {node|emby.mysticturtles.com} "GET /Shows/1489112/Seasons HTTP/1.1"

Here is my config.alloy

  livedebugging {
    enabled = true
  }

  logging {
    level  = "debug"
    format = "logfmt"
  }

  local.file_match "haproxy_files" {
    path_targets = [{"__path__" = "/var/log/haproxy.log"}]
    sync_period  = "5s"
  }

  loki.source.file "log_scrape_haproxy" {
    targets    = local.file_match.haproxy_files.targets
    forward_to = [loki.process.extract_haproxy_logs.receiver]
    tail_from_end = true
  }

  loki.process "extract_haproxy_logs" {
    stage.regex {
      expression = `^(?P<timestamp>[\d\-T\:\.]+-[\d\:\+]+)\s+(?P<hostname>[\w\-\d]+)\s+(?P<process>[\w\-\d]+)\[(?P<pid>\d+)\]:\s+(?P<client_ip>[\d\.]+):(?P<client_port>\d+)\s+\[(?P<request_time>[^\]]+)\]\s+(?P<frontend>[\w\-~]+)\s+(?P<backend>[\w\-\/]+)\s+(?P<timing>(?:-?\d+\/)+-?\d+)\s+(?P<status_code>-?\d+)\s+(?P<response_size>\d+)\s+(?P<unknown1>[-\w]*)\s+(?P<unknown2>[-\w]*)\s+(?P<tcp_flags>[\w\-]+)\s+(?P<connection_info>[\d\/]+)\s+(?P<queue_info>[\d\/]+)\s+{(?P<host_url>[^}]+)}\s+"(?P<http_method>\w+)\s+(?P<url>[^\s]+)\s+(?P<http_version>HTTP\/[\d\.]+)"`
    }
    
    forward_to = [loki.write.grafana_loki.receiver]
  }

  loki.write "grafana_loki" {
    endpoint {
      url = "URL_FOR_LOKI"
    }
  }

I do see the logs inside Loki, but I only see the filename label.

I have been testing the regex at https://re2js.leopard.in.ua/ and it does seem to match and capture all the fields correctly.

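After stage.regex you need to use stage.labels to actually set labels. stage.regex only puts the named capture groups into the extracted data map; it does not create labels by itself. See the loki.process page in the Grafana Alloy documentation.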
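As a minimal sketch of that fix, the loki.process block from the question could look like the following. The regex is unchanged from the original post; the choice of which captures to promote (frontend, backend, status_code, http_method) is only an assumption for illustration, picked because those fields take a small set of values.

```alloy
loki.process "extract_haproxy_logs" {
  // Unchanged from the question: fills the extracted data map with the
  // named capture groups, but does not set any labels on its own.
  stage.regex {
    expression = `^(?P<timestamp>[\d\-T\:\.]+-[\d\:\+]+)\s+(?P<hostname>[\w\-\d]+)\s+(?P<process>[\w\-\d]+)\[(?P<pid>\d+)\]:\s+(?P<client_ip>[\d\.]+):(?P<client_port>\d+)\s+\[(?P<request_time>[^\]]+)\]\s+(?P<frontend>[\w\-~]+)\s+(?P<backend>[\w\-\/]+)\s+(?P<timing>(?:-?\d+\/)+-?\d+)\s+(?P<status_code>-?\d+)\s+(?P<response_size>\d+)\s+(?P<unknown1>[-\w]*)\s+(?P<unknown2>[-\w]*)\s+(?P<tcp_flags>[\w\-]+)\s+(?P<connection_info>[\d\/]+)\s+(?P<queue_info>[\d\/]+)\s+{(?P<host_url>[^}]+)}\s+"(?P<http_method>\w+)\s+(?P<url>[^\s]+)\s+(?P<http_version>HTTP\/[\d\.]+)"`
  }

  // Promote selected extracted values to labels. An empty string means
  // "use the extracted value with the same name as the label".
  // Which fields to promote is an illustrative assumption, not a requirement.
  stage.labels {
    values = {
      frontend    = "",
      backend     = "",
      status_code = "",
      http_method = "",
    }
  }

  forward_to = [loki.write.grafana_loki.receiver]
}
```

Any capture group not listed under values stays in the extracted data only and is not attached to the log stream as a label.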

Also, I would recommend not turning values such as request time and client IP into labels.

Thanks for the help there. I had misunderstood and thought the regex itself would create the labels.

Can I ask why you would recommend against the client IP? I can understand the request time.

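See the Label best practices page in the Grafana Loki documentation.

For fields with effectively random (high-cardinality) values, such as client IP, you should parse the logs at query time with LogQL instead of turning those fields into labels.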