Help understanding log ingestion and searching with Syslog>Promtail>Loki>Grafana

Hello folks,
I have an OpenStack/Ceph “mini cluster” at home. I’ve got Prometheus and Grafana running nicely with lots of metrics and dashboards. Next step is logs. My main driver is being able to quickly search firewall logs, and eventually other logs. I have a Sophos firewall but searching the logs in its UI is very slow.

So far, I’ve set up Sophos to send logs to syslog-ng, and syslog-ng to forward them to the Promtail syslog receiver.
The logs arrive in Loki/Grafana without problems, but the only labels I get are job and host. This makes searching by, e.g., a source IP label impossible.

- job_name: syslog
  syslog:
    listen_address: 0.0.0.0:1514
    listen_protocol: tcp
    idle_timeout: 60s
    label_structured_data: yes
    labels:
      job: "syslog"
  relabel_configs:
    - source_labels: ['__syslog_message_hostname']
      target_label: 'host'

What I’m trying to achieve is a dashboard log view where I can set filters for source IP, destination IP, protocol, port etc.

I can basically achieve this using logfmt e.g.:
{job="syslog"} | logfmt | log_type="Firewall" | src_ip=~"$src_ip" | dst_ip=~"$dst_ip" | dst_port=~"$dst_port"

So I guess my questions are:

  • Am I using the right tools for the job? It all seems a bit clunky.
  • Is logfmt the right way to do this or can I get promtail to ingest more syslog fields (structured data?) as native labels?
  • What’s the best way to do filters, just text box variables and lots of pipes in the query?

Edit
It also seems there is a limit of 5000 on the number of records returned, which means I can’t search for an event that happened in, e.g., the last 6 hours because there are far too many logs.

  1. You are correct: to filter by src_ip or dst_ip you need to parse the logs, and if your logs are in a format that logfmt can read, then yes, use logfmt.

  2. You can also use Promtail to parse the logs and extract information at ingestion time. I would recommend using structured metadata for something like IP addresses.

  3. You can change the limit in your data source configuration. I wouldn’t set this to be too big of a value though.
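
A minimal sketch of what points 1 and 2 could look like in the Promtail scrape config. It assumes the Sophos messages are key=value pairs (as the logfmt query above suggests) and that src_ip, dst_ip and dst_port from that query are the actual field names in the logs. It also assumes a Loki version with structured metadata enabled (`allow_structured_metadata` in `limits_config`, on a schema that supports it), so check that against your own setup.

```yaml
scrape_configs:
  - job_name: syslog
    syslog:
      listen_address: 0.0.0.0:1514
      labels:
        job: "syslog"
    relabel_configs:
      - source_labels: ['__syslog_message_hostname']
        target_label: 'host'
    pipeline_stages:
      # Parse the key=value message body and copy the selected fields
      # into Promtail's extracted map (empty value = same name as the key).
      - logfmt:
          mapping:
            src_ip:
            dst_ip:
            dst_port:
      # Attach the extracted fields as structured metadata rather than
      # as labels, so they do not create a new stream per IP address.
      - structured_metadata:
          src_ip:
          dst_ip:
          dst_port:
```

With the fields attached this way, a filter such as {job="syslog"} | dst_ip=~"$dst_ip" should work without the | logfmt step, while the stream labels stay limited to job and host.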

Now, as to the feeling of it being clunky: having used ElasticSearch before migrating to Loki, I had the same feeling at first. But in the end it’s just a different way of doing things. The thing that I grew to appreciate in Loki is how much freedom you have in terms of the data you send to it (as opposed to something like ElasticSearch). If you end up adding a structured metadata field for the IP, it’ll be easier, I’m sure.

I also had the same problem with the 5k limit at first, but it turned out not to be that big of a problem. Usually you don’t need more than 5k logs in your browser (you most certainly won’t be able to read them all at once). So, for me at least, it comes down to narrowing the search window and criteria to find what you are looking for, rather than having a lot of logs shown all at once.
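
For reference, the limit in point 3 is the "Maximum lines" setting on the Loki data source. If the data source is provisioned from a file rather than edited in the UI, a sketch might look like the following. The data source name and URL are placeholders, and 10000 is only an example value.

```yaml
apiVersion: 1
datasources:
  - name: Loki                 # placeholder name
    type: loki
    access: proxy
    url: http://loki:3100      # placeholder URL, adjust to your deployment
    jsonData:
      # Upper bound on log lines Grafana requests per query.
      maxLines: 10000
```

Loki enforces its own cap as well, via `max_entries_limit_per_query` in `limits_config` (5000 by default), so raising the Grafana value above that only helps if the Loki limit is raised too.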

Thanks Tony. I think I’m getting the hang of it now, but I’m still a bit confused by the 5000 limit. It appears to be the limit on items returned from the label query. Is that correct? For example, my Grafana query looks like this:

{job="syslog"} | logfmt | log_type="Firewall" | dst_ip=~"103.49.172.212"
And my time range is set for last 7 days.

My understanding is that this will return a maximum of 5000 records based on this part of the query: {job="syslog"}. The logfmt parser and the dst_ip filter are then applied to that set of results. If this is correct, then I’m not sure how to search for that IP address over the past 7 days.

Using the pipeline to convert dst_ip into a label might solve my searchability problem but would blow up the Loki index, possibly making it unusable? What am I missing here?

Hi folks, a quick update on where I got to, which might help other noobs like me.

I found no way around the issue described in my last post. I don’t doubt that Loki fits some use cases superbly well, but not mine.

I deployed an ELK stack using docker and it’s exactly what I needed. I’m ingesting 1M firewall logs per day and can search any field going back 30+ days in an instant.

For me a combination of Prometheus/Grafana for monitoring, and ELK for logging is just right.