Syslog-ng through telegraf into Loki

I’m working on a very low-end system and was looking for ways to remove InfluxDB from my log pipeline, as it can be a bit heavy, especially on memory use.

I discovered Loki - ‘like prometheus but for logs’. Telegraf has a Loki output plugin, so I went for that.

Basically it works, I’m getting logs in my grafana dashboard, but they’re pretty messed up.

I get a ‘labels’ value, for example { "__name": "syslog", "appname": "grafana-server", "facility": "daemon", "host": "iota", "hostname": "iota", "severity": "info", "source": "127.0.0.1" }.

I also get a ‘Line’ value:
facility_code="3" severity_code="6" timestamp="1687865491000000000" procid="186713" message="logger=context userId=1 orgId=1 uname=admin t=2023-06-27T14:31:31.799826078+03:00 level=error msg="Request error" error="net/http: abort Handler"" version="1"

Now the very short documentation for this Loki output plugin recommends using ‘| logfmt’ in the LogQL query. This doesn’t work for most of these logs:
{ "Request": "", "__error__": "LogfmtParserErr", "__error_details__": "logfmt syntax error at pos 206 : unexpected '"'", "__name": "syslog", "appname": "grafana-server", "facility": "daemon", "facility_code": "3", "host": "iota", "hostname": "iota", "message": "logger=context userId=1 orgId=1 uname=admin t=2023-06-27T14:31:31.799826078+03:00 level=error msg=", "procid": "186713", "severity": "info", "severity_code": "6", "source": "127.0.0.1", "timestamp": "1687865491000000000" }

The message value contains its own msg="..." pair, and the unescaped inner quotes seem to break the parser.
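To make the failure concrete, here is a minimal Python sketch. It is not Loki's logfmt parser; the regex is only a stand-in for any reader that treats a quoted value as "everything up to the next double quote", and the function name `naive_quoted_value` is made up for illustration. It reproduces the truncated message value seen in the parsed output above.

```python
import re
from typing import Optional

# Sample line from this thread. The field values are wrapped in double quotes,
# but the quotes already inside `message` are not escaped.
line = (
    'facility_code="3" severity_code="6" timestamp="1687865491000000000" '
    'procid="186713" message="logger=context userId=1 orgId=1 uname=admin '
    't=2023-06-27T14:31:31.799826078+03:00 level=error msg="Request error" '
    'error="net/http: abort Handler"" version="1"'
)


def naive_quoted_value(key: str, text: str) -> Optional[str]:
    """Return the value of key="..." as a strict reader sees it:
    everything up to the next double quote."""
    match = re.search(rf'\b{re.escape(key)}="([^"]*)"', text)
    return match.group(1) if match else None


message = naive_quoted_value("message", line)
print(message)  # stops right after 'msg=', at the first inner quote
```

Whatever follows that first inner quote (`Request error" ...`) is no longer valid key=value syntax, which is consistent with the parser giving up with a syntax error.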

Alternatively, I could use the pattern parser to grab the message. However, the fields in the Line column are randomly ordered: sometimes the line begins with procid, other times with timestamp, and so on.

Is there any way I could sort these Lines to have the same pattern? If anyone has used the Loki output in telegraf I’d appreciate a nudge to get these logs sorted out.

  1. What does your actual log line look like?

  2. You may be sending data in the influx format, have you tried using json format? You can change the format by adding data_format = "json" to your output configuration.

I’m sending logs from syslog-ng into telegraf and from telegraf into loki.

Here’s some more example logs:

facility_code="3" severity_code="6" timestamp="1687940161000000000" procid="186713" message="logger=context userId=1 orgId=1 uname=admin t=2023-06-28T11:16:01.868027368+03:00 level=info msg="Request Completed" method=GET path=/api/live/ws status=-1 remote_addr=192.168.1.152 time_ms=2 duration=2.166476ms size=0 referer= handler=/api/live/ws" version="1"

version="1" facility_code="4" severity_code="5" timestamp="1687939459000000000" procid="1167135" message=" pi : TTY=pts/0 ; PWD=/home/pi ; USER=root ; COMMAND=/usr/bin/systemctl restart telegraf"

facility_code="3" severity_code="6" timestamp="1687938543000000000" procid="186713" message="logger=cleanup t=2023-06-28T10:49:03.411119477+03:00 level=info msg="Completed cleanup jobs" duration=2.789475ms" version="1"

The only interesting information in the line is the message, everything else is available in the labels.

As for your second point, I don’t think the loki output plugin has a data_format option. Telegraf didn’t give me any errors when I tried it, but the log line is still in string format.

Here are some very basic configs to replicate:

# syslog-ng.conf
destination telegraf_local { syslog("127.0.0.1" port(6514)); };
filter main { level(info..emerg); };
log { source(src); filter(main); destination(telegraf_local); };

# telegraf.conf
[[inputs.syslog]]
server = "tcp://:6514"

[[outputs.loki]]
domain = "http://127.0.0.1:3100"
endpoint = "/loki/api/v1/push"
timeout = "15s"
data_format = "json" ## doesn't appear to do anything

If that’s how your logs appear in Loki, you can use regexp to grab the message, then parse the message with logfmt, like so:

{SELECTOR} 
  | regexp `message="(?P<message>.*)"`
  | line_format "{{.message}}"
  | logfmt

I still think it may be beneficial for you to look into a real log agent, so that you can send clean logs to Loki by parsing out the message part and keeping only that.

Thanks for the reply again, I actually got that far on my own as well, but with multiple nested quotes in the message field it struggles to find the closing quote and leaves some extra stuff in the “message”. My regex abilities are unfortunately limited to searching for similar solutions and I got stuck there.

message="logger=context userId=1 orgId=1 uname=admin t=2023-06-28T19:12:34.26018977+03:00 level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=192.168.1.152 time_ms=3 duration=3.948619ms size=150 referer="http://iota:3000/d/d37763d3-b23c-4976-9bd0-6c2ca596753b/syslog-loki?editPanel=12&orgId=1" handler=/api/ds/query" version="1" facility_code="3" severity_code="6" timestamp="1687968754000000000" procid="186713"

turns into

logger=context userId=1 orgId=1 uname=admin t=2023-06-28T19:12:34.26018977+03:00 level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=192.168.1.152 time_ms=3 duration=3.948619ms size=150 referer="http://iota:3000/d/d37763d3-b23c-4976-9bd0-6c2ca596753b/syslog-loki?editPanel=12&orgId=1" handler=/api/ds/query" version="1" facility_code="3" severity_code="6" timestamp="1687968754000000000" procid="186713

with error:
"__error__": "LogfmtParserErr", "__error_details__": "logfmt syntax error at pos 344 : unexpected '\"'"

The message part should end at handler=/api/ds/query"
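The overshoot comes from the greedy `.*`: it consumes as much as it can and only backs off to the last double quote in the whole line. A small Python sketch of the same expression shows this; the sample is a shortened version of the line above (the referer URL and most key/value pairs are trimmed), and Python's `re` is used here only as a stand-in for the regex engine in Loki.

```python
import re

# Shortened version of the log line from this thread: `message` comes first,
# followed by the other fields.
line = (
    'message="logger=context level=info msg="Request Completed" '
    'referer="http://iota:3000/d/syslog-loki" handler=/api/ds/query" '
    'version="1" facility_code="3" severity_code="6" '
    'timestamp="1687968754000000000" procid="186713"'
)

# Same expression as in the query above: a greedy `.*` between two quotes.
greedy = re.search(r'message="(?P<message>.*)"', line)
captured = greedy.group("message")
print(captured)  # runs past handler=... and into the trailing fields
```

When `message` happens to be the last field in the line, the same expression captures exactly the message, which is why only some lines come out wrong.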

By a real log agent, do you mean something like promtail? I did try that, and while the logs themselves were clean, it seemed like I had to do a fair bit of manual setup to get to where I am with Telegraf’s default tags/labels.

I see what you mean. I don’t think there is a good solution for this, unfortunately, because the inner double quotes really should be escaped, otherwise it’s just not a valid format. You have two ways to fix this I’d say (maybe three):

  1. As suggested in my previous posts, look into a real log agent, so that you can parse the logs properly, keep the message part only, and turn the rest into labels, so you end up with clean logs in Loki.

  2. Figure out a way to escape the inner double quotes. I am not sure what sort of input you use, but the Grok input data format (see the Telegraf 1.9 documentation) may be able to do it.

  3. If you really have to use telegraf, you can also choose to output to a local file and then use a log agent to pick the logs up. This allows you to keep telegraf, but still be able to deal with parsing at ingestion.
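As a rough illustration of what "escaping the inner double quotes" could look like in a small pre-processing step, here is a hedged Python sketch. It is not a Telegraf or promtail feature; the function name `escape_message` and the list of side fields are assumptions based only on the sample lines in this thread (fields whose values are plain digits).

```python
import re

# Assumption: these are the only fields that appear next to `message`,
# and their values are always plain digits, as in the samples in this thread.
SIDE_FIELD = re.compile(
    r'(?:^| )(?:version|facility_code|severity_code|timestamp|procid)="\d+"'
)


def escape_message(line):
    """Drop the side fields and return message="..." with inner quotes escaped."""
    rest = SIDE_FIELD.sub("", line).strip()
    if not (rest.startswith('message="') and rest.endswith('"')):
        return None  # unexpected shape; let the caller decide what to do
    body = rest[len('message="'):-1]
    # Escape backslashes first so existing ones are not confused with new escapes.
    escaped = body.replace("\\", "\\\\").replace('"', '\\"')
    return f'message="{escaped}"'


sample = (
    'facility_code="3" severity_code="6" timestamp="1687938543000000000" '
    'procid="186713" message="logger=cleanup t=2023-06-28T10:49:03.411119477+03:00 '
    'level=info msg="Completed cleanup jobs" duration=2.789475ms" version="1"'
)
print(escape_message(sample))
```

Because the side fields are removed wherever they occur, the order of fields in the line does not matter. The obvious weakness is a message that itself contains something like `timestamp="123"`, which this sketch would strip as well.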

Actually, this regex might work for you:

message="(?P<message>[^\"]* msg=\"[^\"]*\"[^\"]* referer=\"[^\"]*\"[^\"]*)"

You’ll have to hardcode any potential occurrence of any key/value that may potentially have double quotes.

Yeah that does work, but only for those lines that have a msg and a referer. All the other lines just go blank. But yeah hardcoding it like that is not great.
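A possible way around hardcoding the quoted keys inside the message is to anchor on what comes after it instead: a lazy match that ends at a quote followed either by one of the other known fields or by the end of the line. The sketch below tries the idea in Python. The field list is an assumption taken from the samples in this thread, and `extract_message` is a made-up helper name. The expression only uses lazy quantifiers and non-capturing groups, which Go's RE2 syntax also supports, so it may carry over to a LogQL `regexp` stage, but that is untested here.

```python
import re

# Assumption: these are the only fields that can follow `message` in a line.
KNOWN_FIELDS = ("version", "facility_code", "severity_code", "timestamp", "procid")

# Lazy `.*?` stops at the first quote that is followed by ` <known field>="`
# or by the end of the line.
MESSAGE_RE = re.compile(
    r'message="(?P<message>.*?)"(?: (?:' + "|".join(KNOWN_FIELDS) + r')="|$)'
)


def extract_message(line):
    """Return the message value, or None if the line has no message field."""
    match = MESSAGE_RE.search(line)
    return match.group("message") if match else None


# Two sample lines from this thread: message in the middle, and message last.
middle = (
    'facility_code="3" severity_code="6" timestamp="1687938543000000000" '
    'procid="186713" message="logger=cleanup t=2023-06-28T10:49:03.411119477+03:00 '
    'level=info msg="Completed cleanup jobs" duration=2.789475ms" version="1"'
)
last = (
    'version="1" facility_code="4" severity_code="5" '
    'timestamp="1687939459000000000" procid="1167135" '
    'message=" pi : TTY=pts/0 ; PWD=/home/pi ; USER=root ; '
    'COMMAND=/usr/bin/systemctl restart telegraf"'
)

for sample in (middle, last):
    print(extract_message(sample))
```

Lines without msg= or referer= are handled the same way as lines with them, since nothing inside the message is matched by name.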

I’ll look into your suggestions, thanks for the help!

How can I clean up my logs before ingestion into Loki using promtail? Or do I have to put some log parser in between for that?

Right now the parser expression <date> <temp> <_> is working beautifully for us, but I’m not sure how performant that is.

Thanks

You can use the output stage to set your log string; see the ‘output’ stage page in the Grafana Loki documentation.

Generally I try to keep my logging pipeline as clean and lean as possible, and I try not to change the original log string. Loki’s performance is pretty good for parsing logs, especially if you have log splitting and the query frontend configured correctly for query distribution.
