Hi,
I am wondering whether I can extract several fields from an XML log and forward them to Loki as labels. Here is a sample of the XML:
<setEntityDataFault xmlns="http://example.com/FakeService/v1" version="1" source="FakeInputModule" created="2025-04-01T05:02:56Z" code="ServerError" subcode="ReferenceNotFound" requestId="">
<reason>entity_set with key "id=XYZ123" not found in EntityData.contracts</reason>
<sourceMessage><![CDATA[
<setEntityDataRequest xmlns="http://example.com/FakeService/v1" created="2025-04-01T05:02:00" source="mock_load" version="1">
<entity entityId="FAKE001" type="PERSON" status="ACTIVE">
<attribute name="birthCountry" value="XX" />
<attribute name="birthDate" value="1990-01-01" />
<attribute name="city" value="FakeTown" />
<clearRelations />
<relation type="employment" refId="ABC123" validFrom="2022-01-01" validTo="2023-01-01" />
<relation type="assignment" refId="ZZZ999" validFrom="2022-01-01" />
</entity>
</setEntityDataRequest>
]]></sourceMessage>
</setEntityDataFault>
For example, I would like to extract reason as a label with the same name, and entityId as a label named ID. Is it possible to do that with an Alloy configuration? I tried to do it with regex but without success; I always see one huge XML log entry in Loki.
Are you trying to extract something then set as label, or are you trying to extract something and use it as the log message?
Please share the configuration that you’ve tried.
There are a bunch of regexes to parse; maybe that is the bottleneck:
loki.process "filter_logs" {
  forward_to = [loki.write.grafana_loki.receiver]

  stage.match {
    selector = "{filename=~\".+\"} |~ \"<setEntityDataFault.*(code=\\\"ServerError\\\"|subcode=\\\"ReferenceNotFound\\\")\""
    action   = "keep"
  }

  stage.regex {
    expression = "reason>(?P<reason>[^<]+)</reason>"
  }

  stage.regex {
    expression = "source=\\\"(?P<source>[^\"]+)\\\""
  }

  stage.regex {
    expression = "code=\\\"(?P<code>[^\"]+)\\\".*subcode=\\\"(?P<subcode>[^\"]+)\\\""
  }

  stage.regex {
    expression = "<entity entityId=\\\"(?P<entity_id>[^\"]+)\\\""
  }

  stage.regex {
    expression = "<!\\[CDATA\\[(?P<source_message>.*)</setEntityDataRequest>"
  }

  stage.template {
    source   = "error_type"
    template = "{{ .code }}::{{ .subcode }}"
  }

  stage.labels {
    values = {
      reason         = "reason",
      source         = "source",
      error_type     = "error_type",
      entity_id      = "entity_id",
      source_message = "source_message",
    }
  }

  stage.limit {
    rate  = 10
    burst = 50
  }
}
And each time, none of the labels I specified in stage.labels show up in the Loki UI.
OK, based on this useful read, Label best practices | Grafana Loki documentation, I think I should forget about source_message and entity_id (their values are too dynamic), as there might be thousands of them. I would still like to extract the rest and set them as labels.
Yes, according to best practice you don’t want to use labels that can potentially have unbounded values, because that creates high cardinality. But you can still use structured metadata if you wish to extract something from the logs; structured metadata does not contribute to cardinality.
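As a rough sketch of the structured metadata approach mentioned above: assuming the whole XML fault has already been joined into a single entry (for example by a multiline stage), a fragment like the following could sit inside a loki.process block. The capture names reason and entity_id come from the original configuration. It also assumes structured metadata is enabled on the Loki side, which I cannot tell from this thread.

```alloy
// Fragment for use inside a loki.process block, placed after the stage
// that merges the multi-line XML fault into one entry (assumption).

// Capture the text of the <reason> element.
stage.regex {
  expression = `<reason>(?P<reason>[^<]+)</reason>`
}

// Capture the entityId attribute of the embedded <entity> element.
stage.regex {
  expression = `<entity entityId="(?P<entity_id>[^"]+)"`
}

// Attach both values as structured metadata rather than labels,
// so their many distinct values do not create new streams.
stage.structured_metadata {
  values = {
    reason    = "reason",
    entity_id = "entity_id",
  }
}
```

If that works in your setup, the values should then be filterable after the stream selector, for example {source="FakeInputModule"} | entity_id="FAKE001".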
Anyway, based on your example log, this configuration worked for me. A couple of things to note:
- I am not sure if your logs come in as one big line or as multiple lines, so use the multiline stage accordingly.
- In your configuration you have a stage.match with the keep action; I am not sure what that’s for.
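On the stage.match point: as far as I understand the documentation, a stage.match block with action = "keep" and no nested stages passes every entry through unchanged, so it does not filter anything. If the intent was to forward only the fault entries, one possible sketch is to drop everything that does not match instead. The filename selector is taken from the original configuration; the drop_counter_reason value is just an illustrative name, and placing this after the multiline stage is an assumption, so that the filter sees the whole joined XML entry rather than its individual lines.

```alloy
// Fragment for use inside a loki.process block, after stage.multiline (assumption).
// Drops every entry that does not contain a <setEntityDataFault element.
stage.match {
  selector            = `{filename=~".+"} !~ "<setEntityDataFault"`
  action              = "drop"
  drop_counter_reason = "not_entity_data_fault" // hypothetical reason name
}
```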
test.log:
<setEntityDataFault xmlns="http://example.com/FakeService/v1" version="1" source="FakeInputModule" created="2025-04-01T05:02:56Z" code="ServerError" subcode="ReferenceNotFound" requestId="">
<reason>entity_set with key "id=XYZ123" not found in EntityData.contracts</reason>
<sourceMessage><![CDATA[
<setEntityDataRequest xmlns="http://example.com/FakeService/v1" created="2025-04-01T05:02:00" source="mock_load" version="1">
<entity entityId="FAKE001" type="PERSON" status="ACTIVE">
<attribute name="birthCountry" value="XX" />
<attribute name="birthDate" value="1990-01-01" />
<attribute name="city" value="FakeTown" />
<clearRelations />
<relation type="employment" refId="ABC123" validFrom="2022-01-01" validTo="2023-01-01" />
<relation type="assignment" refId="ZZZ999" validFrom="2022-01-01" />
</entity>
</setEntityDataRequest>
]]></sourceMessage>
</setEntityDataFault>
Alloy config:
loki.process "filter_logs" {
  forward_to = [loki.write.grafana_loki.receiver]

  // Join the lines of one XML fault into a single log entry.
  stage.multiline {
    firstline     = `^\<setEntityDataFault`
    max_wait_time = "10s"
  }

  // Extract the attributes of the root element into the extracted map.
  stage.regex {
    expression = `source=\"(?P<source>[^\"]+)\"`
  }

  stage.regex {
    expression = `code=\"(?P<code>[^\"]+)\"`
  }

  stage.regex {
    expression = `subcode=\"(?P<subcode>[^\"]+)\"`
  }

  // Combine code and subcode into a single error_type value.
  stage.template {
    source   = "error_type"
    template = "{{ .code }}::{{ .subcode }}"
  }

  // Promote only the low-cardinality values to labels.
  stage.labels {
    values = {
      source     = "source",
      error_type = "error_type",
    }
  }
}
result: