Parse fields in OPNsense (or pfSense) filterlog syslog

I’m new to Alloy/Loki, but I’ve gotten Alloy to ingest OPNsense filterlogs in syslog format and put them into Loki. The filterlog message format is comma-separated (no field names), and I would like to create some Grafana dashboards from this data. I’ve read the Alloy documentation and searched for an existing solution. Would some kind stranger please validate my pipeline approach or post their existing OPNsense/pfSense pipeline script for filterlogs?

My approach would be:

  1. stage.regex to extract the protocol (TCP/UDP messages have a different format from ICMP messages)
  2. stage.match on the protocol label, then the correct regex for each protocol to extract the fields I need (e.g., sourceIP, destinationIP, etc.)
  3. stage.geoip to get location data on IP addresses
  4. stage.structured_metadata to attach the fields I need to each log entry (without making them labels) for searching in Grafana
  5. stage.label_drop to get rid of any temporary labels
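A minimal sketch of steps 1, 2 and 5 as a single loki.process block, assuming the log line is the bare filterlog CSV body (which is what loki.source.syslog delivers as the line) and an IPv4 entry. Field positions are counted from the OPNsense filterlog description: the protocol name is field 17, and the source and destination IPs are fields 19 and 20. The names `proto`, `src`, `dst`, `srcport`, `dstport` and `icmptype`, and the `loki.write.default` component, are placeholders rather than anything the docs prescribe. The protocol has to be promoted to a label because stage.match only selects on labels.

```alloy
// Sketch of steps 1, 2 and 5 (IPv4 field layout assumed; names are placeholders).
loki.process "filterlog" {
  forward_to = [loki.write.default.receiver]

  // Step 1: skip the first 16 fields and capture the protocol name (field 17).
  stage.regex {
    expression = "^(?:[^,]*,){16}(?P<proto>[^,]*),"
  }

  // stage.match selects on labels, so promote proto to a temporary label.
  stage.labels {
    values = {
      proto = "",
    }
  }

  // Step 2a: TCP and UDP share the src, dst, srcport, dstport layout.
  stage.match {
    selector      = "{proto=~\"tcp|udp\"}"
    pipeline_name = "tcp_udp"

    stage.regex {
      expression = "^(?:[^,]*,){18}(?P<src>[^,]*),(?P<dst>[^,]*),(?P<srcport>[^,]*),(?P<dstport>[^,]*),"
    }
  }

  // Step 2b: for ICMP, the field after the addresses is the ICMP type.
  stage.match {
    selector      = "{proto=\"icmp\"}"
    pipeline_name = "icmp"

    stage.regex {
      expression = "^(?:[^,]*,){18}(?P<src>[^,]*),(?P<dst>[^,]*),(?P<icmptype>[^,]*)"
    }
  }

  // Step 5: remove the temporary label before the entry is written.
  stage.label_drop {
    values = ["proto"]
  }
}
```

If you're happy to keep the protocol as a low-cardinality label, you could simply leave out the stage.label_drop step. IPv6 entries use a different field layout, so in practice you would probably branch on the IP version (field 9) first.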

What you’ve got seems reasonable; there are dozens of ways to skin the cat. Could you please post a sample log file? Obfuscate if necessary.

Thanks for taking a look. Any tips or suggestions are appreciated. Here is a full sample TCP message:

<134>1 2024-10-27T08:49:34-07:00 hostname filterlog 58921 - [meta sequenceId="113299"] 12,,,7ca0bdbea8e636fba2e984923ed67866,igb0,match,block,in,4,0x0,,238,7685,0,none,6,tcp,40,79.1.1.1,108.1.1.1,54280,26203,0,S,579073575,,1024,,

Which data points do you want to scrape into Loki?

Option 1: ingest it as a raw log using the following config (I call it the lazy approach; it’s my fav):

logging {
  level  = "info"
  format = "logfmt"
}

loki.source.file "files" {
  targets    = [
    {__path__ = "/tmp/syslog.log", "color" = "pink"},
  ]
  forward_to = [loki.process.pfsense.receiver]
}

loki.process "pfsense" {
  forward_to = [loki.echo.debug.receiver, loki.write.default.receiver]

  stage.static_labels {
    values = {
      job = "pfsense",
    }
  }
}

loki.echo "debug" { }

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

Then use a LogQL query to parse it:

{job="pfsense"}
| pattern `<pri> <datetime> <hostname> <_> <id> `

`<_>` matches a field without extracting it as a label.

I like the echo component because you can vet things out and see how they are working instead of tweaking things blindly.
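One catch when debugging with echo: loki.echo shows the labels and the log line it receives, but values captured by stage.regex live only in loki.process's internal extracted map and never reach it. A debug-only sketch for checking whether a regex actually matched is to temporarily promote a capture to a label. The `debug_src` label and the component names here are hypothetical, and the label should be removed once the pipeline works, since IPs are high-cardinality.

```alloy
// Debug-only sketch: expose an extracted value as a label so loki.echo prints it.
// Component and label names are hypothetical placeholders.
loki.process "filterlog_debug" {
  forward_to = [loki.echo.filterlog_debug.receiver]

  // Capture the 19th comma-separated field (source IP in the IPv4 layout).
  stage.regex {
    expression = "^(?:[^,]*,){18}(?P<src>[^,]*),"
  }

  // If the regex did not match, src is not extracted and this label is not set.
  stage.labels {
    values = {
      debug_src = "src",
    }
  }
}

// Prints each entry it receives (labels and line) to Alloy's own log output.
loki.echo "filterlog_debug" { }
```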

Thanks for the suggestion, and especially the tip on echo. The data is coming in via a syslog feed (loki.source.syslog), and I can query based on datetime, etc. in Grafana. For the dashboards, I will need to query on a field like sourceIP (“79.1.1.1” in my sample record). From what I’ve learned about Loki, I don’t want that field as a label. My thinking was to use structured_metadata to attach “sourceIP=79.1.1.1” to each log entry, and then I could search on that field in Grafana.
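A sketch of that idea, assuming IPv4 TCP/UDP lines where the source and destination IPs are the 19th and 20th comma-separated fields; the `sourceIP`/`destinationIP` names and `loki.write.default` are placeholders. Structured metadata is stored alongside each log line rather than inside it and isn't indexed like a label, but LogQL can still filter on it, for example `{job="pfsense"} | sourceIP="79.1.1.1"`. As far as I know this needs a Loki 3.x setup with structured metadata enabled (schema v13).

```alloy
// Sketch: keep high-cardinality IPs as structured metadata instead of labels.
loki.process "filterlog_ips" {
  forward_to = [loki.write.default.receiver]

  // Skip 18 fields, then capture source (field 19) and destination (field 20).
  stage.regex {
    expression = "^(?:[^,]*,){18}(?P<sourceIP>[^,]*),(?P<destinationIP>[^,]*),"
  }

  // Empty values mean "use the extracted field with the same name".
  stage.structured_metadata {
    values = {
      sourceIP      = "",
      destinationIP = "",
    }
  }
}
```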

What did you learn that makes you not want that as a label?

Source IP can take a very large set of values. I thought labels should be limited to low-cardinality fields.

Here is what I would recommend:

  1. Parse as little as possible.
  2. I think it’s a good idea to separate ICMP from the other logs; this can be a label.
  3. Any IP should be structured metadata, not a label.
  4. Parsing the IP, looking up its GeoIP info, and storing that in structured metadata is a good idea.

The rest of the parsing should be done in Loki at query time. Of course, your logs don’t say which column is which, so you’ll have to hard-code the field positions in your queries a bit (the pattern parser should do the trick here; if not, the regexp parser certainly would).
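Recommendation 4 might look roughly like this. It's a sketch, assuming a MaxMind GeoLite2/GeoIP2 City database is available to Alloy (the `/etc/alloy/GeoLite2-City.mmdb` path is a placeholder) and the IPv4 field position for the source IP. stage.geoip reads the extracted `src` value and adds `geoip_*` fields to the extracted map; only the ones listed in stage.structured_metadata end up stored. Private addresses (e.g. your LAN side) won't resolve to a location.

```alloy
// Sketch: GeoIP enrichment stored as structured metadata (paths/names are placeholders).
loki.process "filterlog_geo" {
  forward_to = [loki.write.default.receiver]

  // Capture the source IP (field 19 in the IPv4 layout).
  stage.regex {
    expression = "^(?:[^,]*,){18}(?P<src>[^,]*),"
  }

  // Look up the extracted src value in a City database.
  stage.geoip {
    source  = "src"
    db      = "/etc/alloy/GeoLite2-City.mmdb"
    db_type = "city"
  }

  // Keep the IP plus a few lookup results; none of them become labels.
  stage.structured_metadata {
    values = {
      src                      = "",
      geoip_country_name       = "",
      geoip_city_name          = "",
      geoip_location_latitude  = "",
      geoip_location_longitude = "",
    }
  }
}
```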

Thank you @yosiasz and @tonyswumac. Alloy/Loki has so many options that it can be overwhelming. I really appreciate your suggestions and for pointing me in the right direction.

I have a similar configuration (filterlog entries sent via syslog), but I’m having problems parsing the second group of attributes (which varies depending on the IP version). The first set of attributes is extracted just fine, and I do see the labels I set from it. However, I’m unable to see any attributes pulled by the match blocks that follow that first set of labels. I suspect the issue is the selector, but I haven’t found a good way to debug that. Does anyone have any suggestions? Here’s the relevant section of my config:

// grab rsyslog data
loki.source.syslog "local" {
  forward_to = [loki.process.raw_syslog.receiver]

  listener {
    address  = "0.0.0.0:514"
    protocol = "udp"
  }

  relabel_rules = loki.relabel.syslog.rules
}

// build the relabel rules used by the source.syslog component
loki.relabel "syslog" {
  forward_to = [loki.write.local.receiver]

  rule {
    source_labels = ["__syslog_message_severity"]
    target_label  = "level"
  }
  rule {
    source_labels = ["__syslog_message_facility"]
    target_label  = "facility"
  }
  rule {
    source_labels = ["__syslog_message_hostname"]
    target_label  = "hostname"
  }
  rule {
    source_labels = ["__syslog_message_app_name"]
    target_label  = "application"
  }
}

loki.process "raw_syslog" {
  forward_to = [loki.write.local.receiver]

  stage.match {
    selector = "{application=\"filterlog\"}"
    pipeline_name = "filterlog_enrichment"

    // filterlog entries are comma delimited and difficult to decode by humans, repackage for easy reading
    // Ref: https://github.com/opnsense/ports/blob/master/opnsense/filterlog/files/description.txt

    // this first collection of attributes is common to all entries and always comes first
    // grab them and put the rest into a temporary "remainder" attribute
    stage.regex {
      expression = join (
        [
          "^(?<rulenr>\\w*)",
          "(?<subrulenr>\\w*)",
          "(?<anchorname>\\w*)",
          "(?<label>\\w*)",
          "(?<interface>\\w*)",
          "(?<reason>\\w*)",
          "(?<action>\\w*)",
          "(?<dir>\\w*)",
          "(?<ipversion>\\w*)",
          "(?<remainder>.*)$",
        ],
        ",",
      )
    }

    // add in labels from the previous steps to be used for stages below
    stage.labels {
      values = {
        action = "",
        dir = "",
        interface = "",
        ipversion = "",
        reason = "",
      }
    }

    // parse out IPv4 data
    stage.match {
      selector = "{ipversion=\"4\"}"
      pipeline_name = "IPv4 Processing"

      stage.regex {
        source = "remainder"
        expression = join (
          [
            "^(?<tos>\\w*)",
            "(?<ecn>\\w*)",
            "(?<ttl>\\w*)",
            "(?<id>\\w*)",
            "(?<offset>\\w*)",
            "(?<flags>\\w*)",
            "(?<protonum>\\w*)",
            "(?<protoname>\\w*)",
            "(?<length>\\w*)",
            "(?<src>\\w*)",
            "(?<dst>\\w*)",
            "(?<remainder>.*)$",
          ],
          ",",
        )
      }
    }

    // parse out IPv6 data
    stage.match {
      selector = "{ipversion=\"6\"}"
      pipeline_name = "IPv6 Processing"

      stage.regex {
        source = "remainder"
        expression = join (
          [
            "^(?<class>\\w*)",
            "(?<flow>\\w*)",
            "(?<hoplimit>\\w*)",
            "(?<protoname>\\w*)",
            "(?<protonum>\\w*)",
            "(?<length>\\w*)",
            "(?<src>\\w*)",
            "(?<dst>\\w*)",
            "(?<remainder>.*)$",
          ],
          ",",
        )
      }
    }

    stage.labels {
      values = {
        protoname = "",
        protonum = "",
      }
    }
  }
}

// send processed data
loki.write "local" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
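As far as I can tell the selector isn't the problem: stage.match checks the labels present at that point in the pipeline, which includes the `ipversion` label set by the preceding stage.labels. A more likely culprit is `\\w*` in the nested regexes. `\w` only matches letters, digits and underscore, so `(?<src>\\w*)` stops at the first `.` in an IPv4 address such as `79.1.1.1` (or the first `:` in an IPv6 address). Because the expression is anchored with `^`, the whole stage.regex then fails and extracts nothing, including `protoname` and `protonum`, which is why the final stage.labels has nothing to promote. Below is a sketch of a drop-in replacement for the two nested stage.match blocks using `[^,]*` (anything up to the next comma), assuming the rest of the config stays as posted; the closing stage.structured_metadata for `src`/`dst` is an optional addition, not part of the original config.

```alloy
    // Drop-in replacement for the two nested stage.match blocks (sketch).
    // [^,]* accepts dots and colons, so IPv4/IPv6 addresses no longer break the match.
    stage.match {
      selector      = "{ipversion=\"4\"}"
      pipeline_name = "IPv4 Processing"

      stage.regex {
        source     = "remainder"
        expression = join(
          [
            "^(?<tos>[^,]*)",
            "(?<ecn>[^,]*)",
            "(?<ttl>[^,]*)",
            "(?<id>[^,]*)",
            "(?<offset>[^,]*)",
            "(?<flags>[^,]*)",
            "(?<protonum>[^,]*)",
            "(?<protoname>[^,]*)",
            "(?<length>[^,]*)",
            "(?<src>[^,]*)",
            "(?<dst>[^,]*)",
            "(?<remainder>.*)$",
          ],
          ",",
        )
      }
    }

    stage.match {
      selector      = "{ipversion=\"6\"}"
      pipeline_name = "IPv6 Processing"

      stage.regex {
        source     = "remainder"
        expression = join(
          [
            "^(?<class>[^,]*)",
            "(?<flow>[^,]*)",
            "(?<hoplimit>[^,]*)",
            "(?<protoname>[^,]*)",
            "(?<protonum>[^,]*)",
            "(?<length>[^,]*)",
            "(?<src>[^,]*)",
            "(?<dst>[^,]*)",
            "(?<remainder>.*)$",
          ],
          ",",
        )
      }
    }

    // Optional: keep the high-cardinality addresses as structured metadata, not labels.
    stage.structured_metadata {
      values = {
        src = "",
        dst = "",
      }
    }
```

The first regex likely works only because those fields happen to consist of word characters in your entries; switching them to `[^,]*` as well would be more robust. Temporarily adding a `loki.echo` receiver to the `forward_to` list of `loki.process "raw_syslog"` is a quick way to confirm that the `protoname`/`protonum` labels now appear.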