Labels cannot be deleted with loki.process

Hello, I have this loki.process:

loki.process "syslog" {
        forward_to = [loki.write.syslog.receiver]

        stage.logfmt {
          mapping = {src_ip = "src_ip"}
        }

        stage.logfmt {
          mapping = {dst_ip = "dst_ip"}
        }

        stage.logfmt {
          mapping = {dst_port = "dst_port"}
        }

        stage.logfmt {
          mapping = {log_subtype = "log_subtype"}
        }

        stage.drop {
          source = "msg"
          value  = "failed to decode logfmt"
        }

        // Extract src components
        stage.regex {
          expression = "src=(?P<src_ip>\\d+\\.\\d+\\.\\d+\\.\\d+):(?P<src_port>\\d+)(?::(?P<src_if>[^:\\s]+))?(?::(?P<src_fqdn>[^:\\s]+))?"
        }
      
        // Extract dst components
        stage.regex {
          expression = "dst=(?P<dst_ip>\\d+\\.\\d+\\.\\d+\\.\\d+):(?P<dst_port>\\d+)(?::(?P<dst_if>[^:\\s]+))?(?::(?P<dst_fqdn>[^:\\s]+))?"
        }
      
        // Remove src=... and dst=... from the message
        stage.replace {
          expression = "src=[^ ]+"
          replace    = ""
        }
      
        stage.replace {
          expression = "dst=[^ ]+"
          replace    = ""
        }

        stage.drop {
          expression = "dst=[^ ]+"
        }
        /*
        // Set labels
        stage.labels {
          values = {
            src_ip    = "",
            dst_ip    = "",
            src_port  = "",
            dst_port  = "",
            src_if    = "",
            dst_if    = "",
            src_fqdn  = "",
            dst_fqdn  = "",
          }
        }
      */

        // Drop empty labels
      stage.match {
        selector = "{src_if=\"\"}"
          stage.label_drop {
            values = ["src_if"]
          }
      }
    
      stage.match {
        selector = "{dst_if=\"\"}"
          stage.label_drop {
            values = ["dst_if"]
          }
      }
    
      stage.match {
        selector = "{src_fqdn=\"\"}"
          stage.label_drop {
            values = ["src_fqdn"]
          }
      }

      stage.match {
        selector = "{dst_fqdn=\"\"}"
          stage.label_drop {
            values = ["dst_fqdn"]
          }
      }
      
      //stage.match {
      //  selector = "{dst=~\".+\"}"
      //  action   = "drop"
      //}

/*
      stage.match {
        selector = "{dst_fqdn=\"\"}"
          stage.label_drop {
            values = ["dst_fqdn"]
          }
      }
      
      stage.match {
        selector = "{src_if=\"\"}"
        action   = "drop"
      }
    
      stage.match {
        selector = "{dst_if=\"\"}"
        action   = "drop"
      }
    
      stage.match {
        selector = "{src_fqdn=\"\"}"
        action   = "drop"
      }
    
      stage.match {
        selector = "{dst_fqdn=\"\"}"
        action   = "drop"
      }
      
*/

      stage.label_drop {
        values = ["src", "dst", "src_if", "dst_if", "src_fqdn", "dst_fqdn", "src_ip", "dst_ip", "dst_port", "log_subtype"]
      }

I have tried in vain to delete the labels dst and src, and even dropping empty labels such as dst_if does not work. I have tried several approaches without success.
Is it possible that stage.structured_metadata, although it comes last, is processed before the delete stages, so that the deletion has no effect?

Here is an example log (everything works so far except the deletion):
Jan 3 13:45:36 192.168.5.1 id=firewall sn=000SERIAL time="2007-01-03 14:48:06" fw=1.1.1.1 pri=6 c=262144 m=98 msg="Connection Opened" n=23419 src=2.2.2.2:36701:WAN dst=1.1.1.1:50000:WAN proto=tcp/50000

  1. I don’t see structured_metadata in your config, did you perhaps crop it?
  2. Are you trying to drop the dst and src labels, or are you trying to drop logs?

Tony, thank you for your quick reply.
Yes, you were right, something was cut off incorrectly.
The task is actually simple.
After parsing dst and src with regex, the dst and src labels from the log line should be deleted, along with the empty labels that the regex parsing creates, e.g. dst_if or dst_fqdn.
How can I achieve this?

I have tried different variants without success. One could think that loki.process does not run the stages in order, and that the values are transferred to stage.structured_metadata immediately after they are extracted, before they are actually deleted. That is just a guess.

loki.process "syslog" {
        forward_to = [loki.write.syslog.receiver]

        /*
        stage.logfmt {
          mapping = {src = "src"}
        }

        stage.logfmt {
          mapping = {dst = "dst"}
        }
        */

        stage.logfmt {
          mapping = {src_ip = "src_ip"}
        }

        stage.logfmt {
          mapping = {dst_ip = "dst_ip"}
        }

        stage.logfmt {
          mapping = {dst_port = "dst_port"}
        }

        stage.logfmt {
          mapping = {log_subtype = "log_subtype"}
        }

        stage.drop {
          source = "msg"
          value  = "failed to decode logfmt"
        }

        // Extract src components
        stage.regex {
          expression = "src=(?P<src_ip>\\d+\\.\\d+\\.\\d+\\.\\d+):(?P<src_port>\\d+)(?::(?P<src_if>[^:\\s]+))?(?::(?P<src_fqdn>[^:\\s]+))?"
        }
      
        // Extract dst components
        stage.regex {
          expression = "dst=(?P<dst_ip>\\d+\\.\\d+\\.\\d+\\.\\d+):(?P<dst_port>\\d+)(?::(?P<dst_if>[^:\\s]+))?(?::(?P<dst_fqdn>[^:\\s]+))?"
        }
      
        // Remove src=... and dst=... from the message
        stage.replace {
          expression = "src=[^ ]+"
          replace    = ""
        }
      
        stage.replace {
          expression = "dst=[^ ]+"
          replace    = ""
        }

        stage.drop {
          expression = "dst=[^ ]+"
        }
        /*
        // Set labels
        stage.labels {
          values = {
            src_ip    = "",
            dst_ip    = "",
            src_port  = "",
            dst_port  = "",
            src_if    = "",
            dst_if    = "",
            src_fqdn  = "",
            dst_fqdn  = "",
          }
        }
      */

        // Drop empty labels
      stage.match {
        selector = "{src_if=\"\"}"
          stage.label_drop {
            values = ["src_if"]
          }
      }
    
      stage.match {
        selector = "{dst_if=\"\"}"
          stage.label_drop {
            values = ["dst_if"]
          }
      }
    
      stage.match {
        selector = "{src_fqdn=\"\"}"
          stage.label_drop {
            values = ["src_fqdn"]
          }
      }

      stage.match {
        selector = "{dst_fqdn=\"\"}"
          stage.label_drop {
            values = ["dst_fqdn"]
          }
      }
      
      //stage.match {
      //  selector = "{dst=~\".+\"}"
      //  action   = "drop"
      //}

/*
      stage.match {
        selector = "{dst_fqdn=\"\"}"
          stage.label_drop {
            values = ["dst_fqdn"]
          }
      }
      
      stage.match {
        selector = "{src_if=\"\"}"
        action   = "drop"
      }
    
      stage.match {
        selector = "{dst_if=\"\"}"
        action   = "drop"
      }
    
      stage.match {
        selector = "{src_fqdn=\"\"}"
        action   = "drop"
      }
    
      stage.match {
        selector = "{dst_fqdn=\"\"}"
        action   = "drop"
      }
      
*/

      stage.label_drop {
        values = ["src", "dst", "src_if", "dst_if", "src_fqdn", "dst_fqdn", "src_ip", "dst_ip", "dst_port", "log_subtype"]
      }
        
        // Add structured metadata
        stage.structured_metadata {
          values = {
            src_ip   = "src_ip",
            dst_ip   = "dst_ip",
            dst_port = "dst_port",
            src_if   = "src_if",
            dst_if   = "dst_if",
            dst_fqdn = "dst_fqdn",
            src_fqdn = "src_fqdn",
          }
        }
        
      }

I tried your configuration with the sample log provided, and it wasn’t working for me at all. Your configuration also seems to be a bit too complicated. Several things:

  1. You don’t actually need all the logfmt stages, because your sample log doesn’t have any of those fields (src_ip, dst_ip, dst_port, log_subtype).
  2. You already commented out the labels stage, so none of the label_drop stages after it do anything useful.
  3. You have stages that replace src and dst. My personal preference has always been not to alter the original logs unless there is a very good reason to, so I’d recommend against doing this.
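
To illustrate point 2, here is a minimal sketch, based on my understanding of how the stages interact rather than on a tested config. stage.regex and stage.logfmt only write to the shared extracted map, stage.label_drop only removes real labels, and stage.structured_metadata reads from the extracted map. The component name "sketch" is made up; loki.write.syslog.receiver is taken from your config.

```alloy
loki.process "sketch" {
  forward_to = [loki.write.syslog.receiver]

  // Writes src_ip and src_port into the extracted map. No labels are created here.
  stage.regex {
    expression = `src=(?P<src_ip>\d+\.\d+\.\d+\.\d+):(?P<src_port>\d+)`
  }

  // src_ip was never promoted to a label (there is no stage.labels),
  // so there is nothing for this stage to remove.
  stage.label_drop {
    values = ["src_ip"]
  }

  // Still finds src_ip in the extracted map and attaches it to the entry.
  // To keep a value out of structured metadata, leave it out of this list.
  stage.structured_metadata {
    values = {
      src_ip = "src_ip",
    }
  }
}
```

As far as I can tell, the stages do run in order; the drop simply targets labels that do not exist.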

This is what worked for me. I tested using the following sample logs (the second one is fabricated to test the fqdn and if regex groups):

Jan 3 13:45:36 192.168.5.1 id=firewall sn=000SERIAL time="2007-01-03 14:48:06" fw=1.1.1.1 pri=6 c=262144 m=98 msg="Connection Opened" n=23419 src=2.2.2.2:36701:WAN dst=1.1.1.1:50000:WAN proto=tcp/50000
Jan 3 13:45:36 192.168.5.1 id=firewall sn=000SERIAL time="2007-01-03 14:48:06" fw=1.1.1.1 pri=6 c=262144 m=98 msg="Connection Opened" n=23419 src=2.2.2.2:36701:WAN:some_fqdn dst=1.1.1.1:50000:WAN:some_fqdn proto=tcp/50000

Config:

loki.process "process_logs" {
  forward_to = [<FORWARDER>]

  stage.regex {
    expression = `src=(?P<src_ip>\d+\.\d+\.\d+\.\d+):(?P<src_port>\d+)(?::(?P<src_if>[^:\s]+))?(?::(?P<src_fqdn>[^:\s]+))?`
  }

  stage.regex {
    expression = `dst=(?P<dst_ip>\d+\.\d+\.\d+\.\d+):(?P<dst_port>\d+)(?::(?P<dst_if>[^:\s]+))?(?::(?P<dst_fqdn>[^:\s]+))?`
  }

  // Add structured metadata
  stage.structured_metadata {
    values = {
      src_ip   = "src_ip",
      dst_ip   = "dst_ip",
      dst_port = "dst_port",
      src_if   = "src_if",
      dst_if   = "dst_if",
      dst_fqdn = "dst_fqdn",
      src_fqdn = "src_fqdn",
    }
  }
}

Thank you for the good analysis.
I have one more thing to add. Apart from the example log, which comes from a SonicWall, there are also logs from Sophos firewalls that already contain exactly these fields, hence the mapping with logfmt. As already mentioned, parsing works for both log types (SonicWall and Sophos); only the deletion fails. Regardless of whether one should manipulate or delete the original log data (I agree with you on that), I would like to understand why these labels cannot be deleted.

device_name="SFW" timestamp="2025-04-17T15:37:34+0200" device_model="X" device_serial_id="1111111" log_id="010202601001" log_type="Firewall" log_component="Invalid Traffic" log_subtype="Denied" log_version=1 severity="Information" fw_rule_id="N/A" nat_rule_id="0" fw_rule_type="NETWORK" ether_type="IPv4 (0x0800)" src_ip="15.17.234.131" src_country="R1" dst_ip="18.19.8.73" dst_country="R1" protocol="TCP" src_port=40620 dst_port=5274 hb_status="No Heartbeat" message="Could not associate packet to any connection." app_resolved_by="Signature" app_is_cloud="FALSE" qualifier="New" log_occurrence="1"

Do you have a reliable way to differentiate between those two logs? If so, I’d recommend parsing just enough to determine which is which, setting a label, and then parsing them separately with stage.match.

For example:

<do some parsing to set firewall_type>

stage.match {
    selector = "{firewall_type=\"sonicwall\"}"

    <REST OF STAGES>
}

stage.match {
    selector = "{firewall_type=\"sophos\"}"

    <REST OF STAGES>
}

Thanks again for the help.
Unfortunately, there is no unique identifier in the log files of the different firewalls. To me this looks very much like a bug: regardless of whether it makes sense to manipulate the labels, it must be possible to delete them as in my example.
My example is really very straightforward and should not be a problem for Alloy.

I see in your other log there is a field device_name="SFW", can this be used as the differentiator?
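
If it can, the routing could look roughly like the sketch below. It assumes every Sophos line contains device_name= and no SonicWall line does, which I have only checked against the two samples in this thread. The firewall_type label and its values are names I made up for illustration.

```alloy
// Default every entry to sonicwall. This also gives the selector below a label to match on.
stage.static_labels {
  values = { firewall_type = "sonicwall" }
}

// Re-tag lines that contain device_name= as sophos.
stage.match {
  selector = "{firewall_type=\"sonicwall\"} |= \"device_name=\""

  stage.static_labels {
    values = { firewall_type = "sophos" }
  }
}

// Sophos lines are logfmt, so the fields can be mapped directly.
stage.match {
  selector = "{firewall_type=\"sophos\"}"

  stage.logfmt {
    mapping = {
      src_ip   = "src_ip",
      dst_ip   = "dst_ip",
      dst_port = "dst_port",
    }
  }
}

// SonicWall lines pack ip:port:if:fqdn into src= and dst=, so they need the regex.
stage.match {
  selector = "{firewall_type=\"sonicwall\"}"

  stage.regex {
    expression = `src=(?P<src_ip>\d+\.\d+\.\d+\.\d+):(?P<src_port>\d+)(?::(?P<src_if>[^:\s]+))?(?::(?P<src_fqdn>[^:\s]+))?`
  }

  stage.regex {
    expression = `dst=(?P<dst_ip>\d+\.\d+\.\d+\.\d+):(?P<dst_port>\d+)(?::(?P<dst_if>[^:\s]+))?(?::(?P<dst_fqdn>[^:\s]+))?`
  }
}
```

Note that firewall_type would be sent to Loki as a real label. With only two values that should be harmless, but a final stage.label_drop could remove it if you don’t want it.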

If not, you can simply pass both logs through both logfmt and regex stage, and the empty key won’t get set in structured metadata.

Config I used:

  stage.logfmt {
    mapping = {
      dst_ip   = "dst_ip",
      dst_port = "dst_port",
      src_ip   = "src_ip",
    }
  }

  stage.regex {
    expression = `src=(?P<src_ip>\d+\.\d+\.\d+\.\d+):(?P<src_port>\d+)(?::(?P<src_if>[^:\s]+))?(?::(?P<src_fqdn>[^:\s]+))?`
  }

  stage.regex {
    expression = `dst=(?P<dst_ip>\d+\.\d+\.\d+\.\d+):(?P<dst_port>\d+)(?::(?P<dst_if>[^:\s]+))?(?::(?P<dst_fqdn>[^:\s]+))?`
  }

  // Add structured metadata
  stage.structured_metadata {
    values = {
      src_ip   = "src_ip",
      dst_ip   = "dst_ip",
      dst_port = "dst_port",
      src_if   = "src_if",
      dst_if   = "dst_if",
      dst_fqdn = "dst_fqdn",
      src_fqdn = "src_fqdn",
    }
  }

If this is not your intent please share what outcome you’d like to see.

Hi Tony,
in your second example (SonicWall) the label dst_fqdn is present, but this is not always the case. If it is missing from the log, parsing with regex automatically generates an empty one. How do I deal with this?

There is nothing inherently wrong with sending an empty label to Loki. Loki will just ignore a label if it has an empty value.
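
If you would still rather not create the empty keys at all, one option is to split the optional parts into their own regex stages. This is a sketch I have not run. It relies on my understanding that a regex stage which does not match leaves the extracted map untouched, whereas an optional group that is skipped inside a matching expression is stored as an empty string.

```alloy
// Required parts: present in both SonicWall samples from this thread.
stage.regex {
  expression = `dst=(?P<dst_ip>\d+\.\d+\.\d+\.\d+):(?P<dst_port>\d+)`
}

// Matches only when an interface follows the port, so dst_if is set only then.
stage.regex {
  expression = `dst=\d+\.\d+\.\d+\.\d+:\d+:(?P<dst_if>[^:\s]+)`
}

// Matches only when an FQDN follows the interface, so dst_fqdn is set only then.
stage.regex {
  expression = `dst=\d+\.\d+\.\d+\.\d+:\d+:[^:\s]+:(?P<dst_fqdn>[^:\s]+)`
}
```

The same split would apply to the src expression.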