Why is it impossible to normalize labels?

Hello,

I'm switching from Promtail to Alloy and was hoping everything would get easier, but it's quite the contrary. For now, Alloy is even harder to use than Promtail: no LLM is able to help with Alloy, because there is nowhere near enough documentation and discussion to train the models on, and Alloy does not log any errors at all… What are logs for if the most important errors are not logged in the first place? That makes debugging a nightmare. I've been working on my Alloy config for 6 days without success, and still nothing works. Sorry if my post is a little grumpy, but there are just so many things wrong.

OK, my biggest headache right now is that it is impossible to normalize level labels. I wanted to boil all my logs down to 5 canonical levels using loki.relabel. But because Go's RE2 engine does not support lookahead, and because you cannot skip the execution of rules, it's impossible to run a rule on “everything that was not previously matched”.
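For illustration, a minimal Go sketch of the lookahead limitation (Alloy is written in Go, and its regexes go through Go's RE2-based regexp package). The pattern is a hypothetical PCRE-style "anything except the canonical levels" catch-all, not something taken from the config; RE2 rejects it at compile time.

```go
package main

import (
	"fmt"
	"regexp"
)

// lookaheadPattern is a hypothetical PCRE-style catch-all: "any value that is
// not exactly one of the canonical levels", written with a negative lookahead.
const lookaheadPattern = `^(?!(?:debug|info|warning|error|critical)$).*`

// lookaheadError compiles the pattern with Go's RE2-based regexp package,
// which has no lookaround support, and returns the resulting error.
func lookaheadError() error {
	_, err := regexp.Compile(lookaheadPattern)
	return err
}

func main() {
	if err := lookaheadError(); err != nil {
		fmt.Println("rejected:", err)
	}
}
```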

I'm wondering why there isn't already a stage that does exactly that. Isn't that something practically everybody needs? It's one of the most important features of log collection in the first place: without standardized labels, no further processing can work.

Am I the only one who needs this?

Can you provide some example logs and what you are looking to achieve?

Hi, that's the thing: ALL logs, no matter which format, so no example is needed.

I want a stage that normalizes all possible log levels to a list of predefined levels. If you already have a “level” label with some content, this is possible, but if there is no label, or if the level is something strange, then you cannot create a catch-all stage that captures all those logs. That is impossible.
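A catch-all does not necessarily need lookahead. The Go sketch below models relabel-style semantics as assumed here (every rule runs, regexes are fully anchored) and writes into a scratch value that starts as "unknown". Each rule is matched against the original level and overwrites the scratch value, so anything no rule recognizes, including an empty or missing level, stays "unknown". The patterns are copied from the loki.relabel config in this post. Expressing the same idea in loki.relabel would presumably mean a temporary label (for example a hypothetical __tmp_level) set to "unknown" by a first rule, overwritten by the normalizing rules, copied into level by a last rule and then removed with a labeldrop rule; that translation has not been tested.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule models one "replace" rule: a fully anchored regex and the value it writes.
type rule struct {
	re          *regexp.Regexp
	replacement string
}

// anchored wraps a pattern in ^(?:...)$, modelling the full-match behaviour
// assumed for relabel regexes.
func anchored(pattern string) *regexp.Regexp {
	return regexp.MustCompile(`^(?:` + pattern + `)$`)
}

// rules reuse the patterns from the "normalize_levels" block.
var rules = []rule{
	{anchored(`(?i)\s*(debug|dbg|trace|verbose)\s*`), "debug"},
	{anchored(`(?i)\s*(info|information|notice|informational)\s*`), "info"},
	{anchored(`(?i)\s*(warn|warning|wrn)\s*`), "warning"},
	{anchored(`(?i)\s*(err|error|fail|failure|severe|exception)\s*`), "error"},
	{anchored(`(?i)\s*(crit|critical|fatal|panic|emerg|alert)\s*`), "critical"},
}

// normalize runs every rule (none is skipped) against the ORIGINAL level and
// writes into a scratch value that starts as "unknown". Unrecognized or empty
// levels never match a rule, so they keep the default without any lookahead.
func normalize(level string) string {
	scratch := "unknown"
	for _, r := range rules {
		if r.re.MatchString(level) {
			scratch = r.replacement
		}
	}
	return scratch
}

func main() {
	for _, in := range []string{" WARN ", "Emerg", "", "weird"} {
		fmt.Printf("%q -> %s\n", in, normalize(in))
	}
}
```

Because the rules match the untouched original value and only ever write to the scratch value, the order of the five rules does not matter as long as their alias lists are disjoint.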

Here is my current loki.relabel and loki.process config:


//------------------------------------------------------
//normalize all incoming levels, but because this stage does not support an unknown value (no regex lookahead possible, no catch-all supported)
//we need to set "unknown" on ALL logs before they receive a different level
//------------------------------------------------------
loki.relabel "normalize_levels" {
  forward_to = [loki.write.local.receiver ]

  // normalize all known variants of debug
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(debug|dbg|trace|verbose)\\s*"
    target_label  = "level"
    replacement   = "debug"
    action        = "replace"
  }

  // normalize all known variants of info
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(info|information|notice|informational)\\s*"
    target_label  = "level"
    replacement   = "info"
    action        = "replace"
  }

  // normalize all known variants of warning
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(warn|warning|wrn)\\s*"
    target_label  = "level"
    replacement   = "warning"
    action        = "replace"
  }

  // normalize all known variants of error
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(err|error|fail|failure|severe|exception)\\s*"
    target_label  = "level"
    replacement   = "error"
    action        = "replace"
  }

  // normalize all known variants of critical / fatal
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(crit|critical|fatal|panic|emerg|alert)\\s*"
    target_label  = "level"
    replacement   = "critical"
    action        = "replace"
  }
}


//------------------------------------------------------
//if the level is set to unknown from a previous stage, try to find a level in the message
//------------------------------------------------------
loki.process "extract_levels" {
  forward_to = [loki.relabel.normalize_levels.receiver]

  //only process unknown or empty levels; if level contains something else, the entry is forwarded unchanged

  stage.match {
    selector = "{level=~\"unknown|^$\"}"

    // Nest the regex stage inside this match block.
    stage.regex {
      expression = "(?i).*?(?P<found_level>debug|trace|verbose|info|notice|warn|warning|wrn|err|error|fail|failure|crit|critical|fatal|panic|emerg|alert|exception).*"
    }
    stage.labels {values={found_level = "found_level"}}
    stage.match {
      selector = "{found_level=~\".+\"}"
      //assign the extracted level
      stage.labels { values = { level = "found_level" }}
    }
  }
}

Currently I set a static label on ALL input scrape processes, so every source starts with the static label “unknown”, which is overwritten if a level is found in the expression. This is currently about 90% accurate (because stage.multiline is very buggy), but you need to keep track of every input meticulously.
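One possible source of misassigned levels (an assumption, not something diagnosed in the post): the message fallback in the extract_levels block uses an unanchored ".*?(...)" alternation, so it also fires on keywords embedded in other words, and because RE2 prefers the earlier alternative at the leftmost position, "err" wins over "error". The Go check below runs the same expression next to a hypothetical variant that uses RE2's ASCII word boundary \b.

```go
package main

import (
	"fmt"
	"regexp"
)

const keywords = `debug|trace|verbose|info|notice|warn|warning|wrn|err|error|fail|failure|crit|critical|fatal|panic|emerg|alert|exception`

var (
	// Same shape as the stage.regex expression in the post.
	substringRe = regexp.MustCompile(`(?i).*?(?P<found_level>` + keywords + `).*`)
	// Hypothetical variant that only accepts whole words.
	wordRe = regexp.MustCompile(`(?i)\b(?P<found_level>` + keywords + `)\b`)
)

// extractLevel returns the found_level capture, or "" if nothing matched.
func extractLevel(re *regexp.Regexp, msg string) string {
	m := re.FindStringSubmatch(msg)
	if m == nil {
		return ""
	}
	return m[re.SubexpIndex("found_level")]
}

func main() {
	for _, msg := range []string{"Interrupted by user", "Disk error detected"} {
		fmt.Printf("%q: substring=%q word=%q\n", msg,
			extractLevel(substringRe, msg), extractLevel(wordRe, msg))
	}
}
```

With the substring version, "Interrupted by user" yields "err" (from "Int-err-upted"), while the word-boundary variant finds nothing there and returns "error" rather than "err" for "Disk error detected".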

As of now, I have 600 lines of Alloy config for my 17 input logs. That's almost more than with Promtail, even though Alloy's concept is totally different (and should simplify things).

Sorry for my rant, but I expected a normalize function to be THE most basic function of a log scraper, because log levels are THE most important part of a log message. I just don't understand it. Am I the only one who needs log levels? If I were developing Alloy, stage.relabel.normalize would have been one of the first priorities in the backlog.

It looks like you’ve fixed 90% of your problem; I’d say that’s pretty good. I will try to answer some of the concerns and questions in your post, but I don’t think I personally have an answer that’ll satisfy you. Hopefully others are more creative than I am.

To address your question, “Why is it impossible to normalize labels?”: I am not sure what exactly you expect here. No person or program can arbitrarily decide what should be considered “critical” or “warning”. For example, your configuration categorizes emerg as “critical”; how is Alloy supposed to know that? I could be wrong, but I don’t think any logging agent will do that for you automatically without some configuration. You’d have much better luck normalizing this at the source, i.e. enforcing a uniform level string in your log source.
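To make the point concrete: even syslog's standardized severities (RFC 5424 defines eight, numbered 0 Emergency through 7 Debug) cannot be folded into five levels without a policy decision. The Go sketch below encodes one such policy, chosen to match the grouping in the config above (emergency, alert and critical to "critical", notice to "info"); the names are illustrative, and another team might reasonably send alert to "error" instead.

```go
package main

import "fmt"

// syslogSeverityToLevel is ONE possible policy for folding the eight RFC 5424
// severities (index = numeric severity) into five canonical levels.
var syslogSeverityToLevel = [8]string{
	"critical", // 0 Emergency
	"critical", // 1 Alert
	"critical", // 2 Critical
	"error",    // 3 Error
	"warning",  // 4 Warning
	"info",     // 5 Notice
	"info",     // 6 Informational
	"debug",    // 7 Debug
}

// levelFromSeverity maps a numeric severity; out-of-range values are "unknown".
func levelFromSeverity(sev int) string {
	if sev < 0 || sev >= len(syslogSeverityToLevel) {
		return "unknown"
	}
	return syslogSeverityToLevel[sev]
}

func main() {
	for _, sev := range []int{0, 4, 5, 9} {
		fmt.Println(sev, levelFromSeverity(sev))
	}
}
```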

My personal recommendations would be:

  1. If a uniform level string is critically important to you, normalize at the source as much as you can and deal with the rest in your pipeline; that should hopefully reduce the lines of configuration needed.
  2. Or, depending on how you’ll use the level label, you can decide not to parse for the level at all in the logging pipeline and instead set that label at query time. It’s possible to craft a query that sets the level label using if/else logic at query time. It won’t look very pretty either, but at least it’ll be a few more lines of query string instead of 600 lines of pipeline configuration.
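As a rough illustration of the query-time option: LogQL's label_format uses Go's text/template syntax, so an if/else chain can be tried out in plain Go first. In the sketch below, ToLower is registered by hand to stand in for LogQL's built-in ToLower, and the level lists are copied from the config above; how the template is embedded in an actual LogQL query has not been tested here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// levelTemplate is an if/else chain in Go template syntax. The built-in eq
// accepts several arguments and is true if the first equals any of the rest.
const levelTemplate = `{{ $l := ToLower .level }}` +
	`{{ if eq $l "emerg" "alert" "crit" "critical" "fatal" "panic" }}critical` +
	`{{ else if eq $l "err" "error" "fail" "failure" "severe" "exception" }}error` +
	`{{ else if eq $l "warn" "warning" "wrn" }}warning` +
	`{{ else if eq $l "info" "information" "informational" "notice" }}info` +
	`{{ else if eq $l "debug" "dbg" "trace" "verbose" }}debug` +
	`{{ else }}unknown{{ end }}`

// tmpl registers ToLower (standing in for LogQL's function) before parsing.
var tmpl = template.Must(template.New("level").
	Funcs(template.FuncMap{"ToLower": strings.ToLower}).
	Parse(levelTemplate))

// queryTimeLevel renders the template for one log line's level label.
func queryTimeLevel(level string) (string, error) {
	var b strings.Builder
	err := tmpl.Execute(&b, map[string]string{"level": level})
	return b.String(), err
}

func main() {
	for _, level := range []string{"EMERG", "notice", "Warning", ""} {
		out, err := queryTimeLevel(level)
		fmt.Printf("%q -> %s (err=%v)\n", level, out, err)
	}
}
```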

What do I expect? Having the tools at hand to do this easily, instead of spending days on trial and error to find a hacky workaround that will take whoever reads the config a lot of time to understand.

A ready-to-use normalizer stage could be super easy for the user: I would just write a list of output levels and, for each one, any strings I want mapped to it. It could even default to well-known levels like the ones I used above (I used the most common levels, so my template is actually a good default for most use cases), so that the user just puts a “stage.level_normalize” into any block and is done.
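A rough Go sketch of what such a stage might do internally, assuming its configuration is a map from each output level to the strings that should map to it. LevelNormalizer, NewLevelNormalizer and defaultAliases are hypothetical names, not Alloy APIs; the default lists are the alias lists from the relabel config above.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultAliases: canonical level -> accepted spellings (hypothetical defaults).
var defaultAliases = map[string][]string{
	"debug":    {"debug", "dbg", "trace", "verbose"},
	"info":     {"info", "information", "informational", "notice"},
	"warning":  {"warn", "warning", "wrn"},
	"error":    {"err", "error", "fail", "failure", "severe", "exception"},
	"critical": {"crit", "critical", "fatal", "panic", "emerg", "alert"},
}

// LevelNormalizer is a hypothetical model of the proposed stage: a reverse
// lookup table from alias to canonical level, plus a fallback value.
type LevelNormalizer struct {
	lookup   map[string]string
	fallback string
}

// NewLevelNormalizer builds the reverse lookup table from the alias lists.
func NewLevelNormalizer(aliases map[string][]string, fallback string) *LevelNormalizer {
	n := &LevelNormalizer{lookup: make(map[string]string), fallback: fallback}
	for canonical, list := range aliases {
		for _, a := range list {
			n.lookup[strings.ToLower(a)] = canonical
		}
	}
	return n
}

// Normalize trims and lowercases the input, then looks it up; anything not in
// the table (including an empty or missing level) gets the fallback.
func (n *LevelNormalizer) Normalize(raw string) string {
	if c, ok := n.lookup[strings.ToLower(strings.TrimSpace(raw))]; ok {
		return c
	}
	return n.fallback
}

func main() {
	n := NewLevelNormalizer(defaultAliases, "unknown")
	for _, raw := range []string{" Warn ", "EMERG", "", "something strange"} {
		fmt.Printf("%q -> %s\n", raw, n.Normalize(raw))
	}
}
```

An exact-match table like this avoids regex ordering issues entirely, but it only normalizes a level that has already been extracted; finding a level inside free-text messages would still need a separate step.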

After many tries, I found a configuration that works for almost all logs. Quite the journey:

The first one is the general log-level normalizer:

//------------------------------------------------------
//normalize all incoming levels, but because this stage does not support an unknown value (no regex lookahead possible, no catch-all supported)
//we need to set "unknown" on ALL logs before they receive a different level
//------------------------------------------------------
loki.relabel "normalize_levels" {
  forward_to = [loki.write.local.receiver ]

  // normalize all known variants of debug
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(debug|dbg|trace|verbose)\\s*"
    target_label  = "level"
    replacement   = "debug"
    action        = "replace"
  }

  // normalize all known variants of info
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(info|information|notice|informational)\\s*"
    target_label  = "level"
    replacement   = "info"
    action        = "replace"
  }

  // normalize all known variants of warning
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(warn|warning|wrn)\\s*"
    target_label  = "level"
    replacement   = "warning"
    action        = "replace"
  }

  // normalize all known variants of error
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(err|error|fail|failure|severe|exception)\\s*"
    target_label  = "level"
    replacement   = "error"
    action        = "replace"
  }

  // normalize all known variants of critical / fatal
  rule {
    source_labels = ["level"]
    regex         = "(?i)\\s*(crit|critical|fatal|panic|emerg|alert)\\s*"
    target_label  = "level"
    replacement   = "critical"
    action        = "replace"
  }
}


//------------------------------------------------------
//if the level is set to unknown from a previous stage, try to find a level in the message
//------------------------------------------------------
loki.process "extract_message_levels" {
  forward_to = [loki.relabel.normalize_levels.receiver]

  //only process unknown or empty levels; if level contains something else, the entry is forwarded unchanged

  stage.match {
    selector = "{level=~\"unknown|^$\"}"

    // Nest the regex stage inside this match block.
    stage.regex {
      expression = "(?i).*?(?P<found_level>debug|trace|verbose|info|notice|warn|warning|wrn|err|error|fail|failure|crit|critical|fatal|panic|emerg|alert|exception).*"
    }
    stage.labels {values={found_level = "found_level"}}
    stage.match {
      selector = "{found_level=~\".+\"}"
      //assign the extracted level
      stage.labels { values = { level = "found_level" }}
    }
  }
}


//------------------------------------------------------
//if still no level is found, look for compound levels that would otherwise be wrongly assigned by the following extract-levels-from-message stage
//this is why there are 3 process stages before the normalizer: to account for the missing docker levels and for compounds,
//and only then do we look for common words
//------------------------------------------------------
loki.process "compound_levels" {
  forward_to = [loki.process.extract_message_levels.receiver]

  stage.match {
    selector = "{level=~\"unknown|^$\"}"

    // capture compound phrases
    stage.regex {
      expression = "(?i).*?(?P<found_level>(soft failure|no error)).*"
    }
    stage.labels {
      values = { found_level = "found_level" }
    }

    // map "soft failure" → warning
    stage.match {
      selector = "{found_level=~\"(?i)soft failure\"}" // case-insensitive, like the (?i) regex that captured it
      stage.static_labels {
        values = { level = "warning" }
      }
    }

    // map "no error" → info
    stage.match {
      selector = "{found_level=~\"(?i)no error\"}" // case-insensitive, like the (?i) regex that captured it
      stage.static_labels {
        values = { level = "info" }
      }
    }
  }
}




//------------------------------------------------------
//if the level is set to unknown from a previous stage, try to find an explicit "level" keyword (e.g. level=warn) in the message
//------------------------------------------------------
loki.process "extract_levels" {
  forward_to = [loki.process.compound_levels.receiver]

  stage.match {
    selector = "{level=~\"unknown|^$\"}"

    stage.regex {
      expression = "(?i)\\blevel[=:]?(?P<explicit_level>debug|trace|verbose|info|notice|warn|wrn|err|fail|crit|fatal|panic|emerg|alert|exception)"
    }
    stage.labels {
      values = { found_level = "explicit_level" }
    }
    stage.match {
      selector = "{found_level=~\".+\"}"
      //assign the extracted level
      stage.labels { values = { level = "explicit_level" }}
    }
  }
}
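The ordering argument in the compound_levels comment can be sketched in Go: the explicit "level=" keyword is tried first, then the compound phrases, then any level keyword. The function and type names are illustrative; the regexes are copied from the stages above, minus the leading ".*?" and trailing ".*", which do not change what gets captured. The raw value returned would then go through the normalizer.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// From the "extract_levels" block: an explicit level= / level: keyword.
	explicitRe = regexp.MustCompile(`(?i)\blevel[=:]?(debug|trace|verbose|info|notice|warn|wrn|err|fail|crit|fatal|panic|emerg|alert|exception)`)
	// From the "extract_message_levels" block: any level keyword anywhere.
	keywordRe = regexp.MustCompile(`(?i)(debug|trace|verbose|info|notice|warn|warning|wrn|err|error|fail|failure|crit|critical|fatal|panic|emerg|alert|exception)`)
)

// compoundPhrase pairs a multi-word phrase with the level it should produce.
type compoundPhrase struct{ phrase, level string }

// compounds mirrors the "compound_levels" block; it must run before the
// keyword pass, otherwise "no error" would be captured as "err".
var compounds = []compoundPhrase{
	{"soft failure", "warning"},
	{"no error", "info"},
}

// detectRawLevel tries the three fallbacks in pipeline order and returns the
// raw (not yet normalized) level, or "unknown" if nothing matched.
func detectRawLevel(line string) string {
	if m := explicitRe.FindStringSubmatch(line); m != nil {
		return m[1]
	}
	lower := strings.ToLower(line)
	for _, c := range compounds {
		if strings.Contains(lower, c.phrase) {
			return c.level
		}
	}
	if m := keywordRe.FindStringSubmatch(line); m != nil {
		return m[1]
	}
	return "unknown"
}

func main() {
	for _, line := range []string{
		"level=warn disk almost full",
		"backup finished, no error",
		"disk error on sda",
		"hello world",
	} {
		fmt.Printf("%q -> %s\n", line, detectRawLevel(line))
	}
}
```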

And here is a special part for nginx-access logs:

//-----------------------------------nginx-access--------------------------------
loki.process "nginx_access_logs" {
  forward_to = [loki.process.extract_levels.receiver]
  stage.static_labels {
    values = {
      job = "nginx_access",
      service = "nginx_access",
      level = "unknown",
    }
  }
  stage.regex { expression = `^(?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|[0-9a-fA-F:]+)\s\-\s(?P<user>[a-zA-Z0-9\-]+)\s\[(?P<ts>[^\]]+)\]\s+"(?P<request>[^"]+)"\s+(?P<status>\d+)\s+(?P<msg>.*)$` }
  stage.drop {
    source = "ts"
    // If 'ts' was not extracted, the log-line is not in standard format (likely a multiline log, and because stage.multiline does not work we have to drop it)
    expression = "^$"
  }
  stage.labels {
    values = {
      status = "status",
    }
  }
  // ----------------------------------------------------
  // STATUS CODE TO LOG LEVEL MAPPING
  // ----------------------------------------------------
  // Debug (fallback: 1xx or any non-standard status). Match stages run in order and do not know
  // what earlier stages matched; because ".*" matches every status, this stage has to come first
  // so that the more specific stages below can overwrite it.
  stage.match {
    selector = "{status=~\"1[0-9]{2}|^$|000|.*\"}" // Matches every status (".*"), so it acts as the default
    stage.static_labels { values = { level = "debug" } }
  }
  // Info (Success - 2xx and Redirection - 3xx)
  stage.match {
    selector = "{status=~\"2[0-9]{2}|3[0-9]{2}\"}" // Matches any 2xx (OK) or 3xx (Redirect) code
    stage.static_labels { values = { level = "info" } }
  }
  // Warning (Client Errors - 4xx)
  stage.match {
    selector = "{status=~\"4[0-9]{2}\"}" // Matches any 4xx code (e.g., 404, 403, 400)
    stage.static_labels { values = { level = "warning" } }
  }
  // Error (Any other 5xx that is not matched later from crit, and 499 (Client Closed Connection))
  stage.match {
    selector = "{status=~\"5[0-9]{2}|499\"}" // Matches any 5xx code (e.g., 501, 505) AND 499
    stage.static_labels { values = { level = "error" } }
  }
  // Critical (5xx, but only fatal server errors like 500/502/504)
  stage.match {
    selector = "{status=~\"500|502|503|504|507|508|511\"}" // Key server failure codes
    stage.static_labels { values = { level = "critical" } }
  }
  stage.timestamp {
    source = "ts"
    format = "02/Jan/2006:15:04:05 -0700"
    location = "Europe/Berlin"
  }
  //put all regex capture groups into the final message
  stage.template {
    source = "msg"
    template = "HOST: {{ .host }} | USER: {{ .user }} | REQUEST: {{ .request }} | MSG: {{ .msg }}"
  }
  stage.output { source = "msg" }