Loki basic understanding questions

Morning!

I have been running Loki for a couple of years now. So far I just send all logs to promtail without much configuration and then do the logfmt parsing and filtering in Grafana. Performance is fairly poor: for example, viewing my firewall dashboard over any range longer than now()-6h to now() is all but impossible.

But just the other day I stumbled over promtail pipelines, and now I’m wondering whether my performance issues are related to my setup, and whether the basic idea of Loki is to have the feeder, i.e. promtail, do at least all the labeling.

One other thing I have problems with is that I can’t get the logs from my OpenWRT router into promtail via syslog, either via TCP or via UDP. I’m currently proxying those through an rsyslogd on the Loki host, but that’s ugly… :face_with_head_bandage: :brick: :brick: :brick:

-Stefan

But it is not clear what your use case is. For example, if you are searching for a needle in your logs, then bloom filters will help - Loki 3.0 release: Bloom filters, native OpenTelemetry support, and more! | Grafana Labs
Maybe your dashboard implementation is not very smart, e.g. it loads all the logs (which can be MB or GB of data) into Grafana just to draw a log count graph. With a simple LogQL metric query, Grafana would load only a few kB of data.
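The difference can be sketched in Python. This is a toy model with made-up timestamps, not Loki's implementation, and `count_per_bucket` is a hypothetical helper. The point is that a metric query aggregates on the server and returns one number per time bucket instead of shipping every log line to Grafana.

```python
from collections import Counter


def count_per_bucket(timestamps, bucket_seconds):
    """Return {bucket_start: number_of_lines} for fixed-width time buckets."""
    return dict(Counter(ts - ts % bucket_seconds for ts in timestamps))


# Made-up data for illustration: one log line every 2 seconds for one hour.
timestamps = list(range(0, 3600, 2))
buckets = count_per_bucket(timestamps, 300)

print(len(timestamps))  # 1800 raw lines a logs panel would have to load
print(len(buckets))     # 12 data points are enough for the count graph
```

With fixed 5-minute buckets, the graph needs 12 numbers for the hour, no matter how many lines were logged.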
So you just complained but didn’t provide any evidence of what your case is, so there is no way to advise (without guessing) how it can be improved.

The same goes for the syslog issue. No details: no tested promtail configuration, no info about the syslog format,…

Generally: provide a reproducible example of what you have.

I did not complain. My question was whether doing the labeling in a promtail pipeline at ingest time has performance advantages over doing it in a query at query time.

I wrote that I tried via TCP and UDP. I don’t know in which format OpenWRT sends the logs, or whether there are different syslog formats at all.

Snapping at users is not helpful!

The docs should be your first friend:

You have all config options there. Of course, if the configured options don’t match your syslog source, then it won’t work. But you always have error logs, which may help you find out why it’s not working.

If I had found a solution in the docs, I wouldn’t have posted here.
If there had been any errors in the logs, I would have solved the issue myself or posted the logs together with my post.

I have now solved the OpenWRT issue by dumping the default logd and switching to syslog-ng on the OpenWRT box.

My main question is still open, i.e. does it have performance advantages to do the labeling in a promtail pipeline on ingest vs. at query time?

I don’t believe there is any significant performance advantage, certainly not for querying logs. The main reason you’d want to limit labels is to reduce cardinality. As long as you are somewhat reasonable with label usage, I don’t think it’s particularly impactful either way. What labels do you apply to your logs, by the way?
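As a rough illustration of the cardinality point: each distinct combination of label values is a separate stream in Loki, so the stream count is the number of distinct label sets. A minimal Python sketch, with made-up sample entries, a hypothetical `client_ip` label and a hypothetical `count_streams` helper:

```python
def count_streams(entries, label_names):
    """Count distinct label-value combinations, i.e. distinct streams."""
    return len({tuple(e.get(name, "") for name in label_names) for e in entries})


# Made-up sample data for illustration only: 200 lines from 200 different IPs.
entries = [
    {"host": "router", "level": "warning", "client_ip": f"198.51.100.{i}"}
    for i in range(1, 201)
]

print(count_streams(entries, ["host", "level"]))               # 1
print(count_streams(entries, ["host", "level", "client_ip"]))  # 200
```

Labels with a small, bounded set of values keep the stream count low. A label whose value differs on almost every line creates almost one stream per line.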

Loki’s performance, especially query performance, comes from distribution. You want query splitting and a query frontend; with proper configuration I find the performance acceptable. How are you operating your cluster at the moment?

At the moment, I’m only adding a src_ip label and some geoip labels at ingest via a promtail pipeline:

- job_name: defiant
  syslog:
    listen_address: 0.0.0.0:5140
    listen_protocol: tcp
    idle_timeout: 120s
    label_structured_data: yes
    labels:
      job: "defiant"
  relabel_configs:
    - source_labels: ['__syslog_message_hostname']
      target_label: 'host'
    - source_labels: ['__syslog_message_severity']
      target_label: 'level'
    - source_labels: ['__syslog_message_facility']
      target_label: 'facility'
    - source_labels: ['__syslog_message_app_name']
      target_label: 'appname'
  pipeline_stages:

  - match:
      selector: '{facility="kern",level="warning"}'
      stages:
      - logfmt:
          mapping:
              src_ip: SRC
      - labels:
          src_ip:
      - geoip:
          db: "/etc/promtail/GeoLite2-City.mmdb"
          source: src_ip
          db_type: "city"
      - geoip:
          db: "/etc/promtail/GeoLite2-ASN.mmdb"
          source: src_ip
          db_type: "asn"
      - labeldrop:
          - geoip_postal_code
          - geoip_subdivision_code
          - geoip_subdivision_name
          - geoip_continent_code
          - geoip_continent_name
          - geoip_timezone

The firewall logs, which are my primary interest, are easily parsed via logfmt, which I currently do at query time:

 [179193.332628] reject wan in: IN=br-wan OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=103.193.80.46 DST=xxx.xxx.xxx.xxx LEN=40 TOS=0x00 PREC=0x00 TTL=241 ID=52666 PROTO=TCP SPT=58102 DPT=7779 WINDOW=53270 RES=0x00 SYN URGP=0
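To make the field names concrete, here is a minimal Python sketch of what logfmt-style key=value extraction yields for a line like the one above. It is not Loki's parser: `parse_kv` is a hypothetical helper, the sample line is shortened, and storing bare flags such as SYN as empty strings is an assumption of this sketch.

```python
def parse_kv(line: str) -> dict:
    """Split a kernel firewall log line into key=value fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:                  # KEY=VALUE; VALUE may be empty, e.g. "OUT="
            fields[key] = value
        elif token.isupper():    # bare uppercase flag such as SYN
            fields[token] = ""
    return fields


# Shortened version of the sample line above (MAC truncated, some fields dropped).
sample = (
    "[179193.332628] reject wan in: IN=br-wan OUT= MAC=xx:xx "
    "SRC=103.193.80.46 DST=xxx.xxx.xxx.xxx LEN=40 PROTO=TCP "
    "SPT=58102 DPT=7779 SYN URGP=0"
)
fields = parse_kv(sample)
print(fields["SRC"], fields["PROTO"], fields["DPT"])
```

The extracted keys (SRC, DPT, PROTO and so on) are what a pipeline mapping such as `src_ip: SRC` or a query-time filter would refer to.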

Actually, I moved Loki yesterday from my Xeon with a spinning-disk ZFS array to a dedicated RK3588 ARM64 system with NVMe SSDs, and it got much MUCH snappier! I wouldn’t have thought the difference would be that big, because the ZFS on the Xeon has SSD and RAM cache.
I haven’t thought about a cluster yet because, at least at the moment, the log volume is very low: mainly the firewall and 3 more Linux hosts with a few docker apps. I have been running Loki for a few years but only now have the time to really deep dive. My ultimate goal is to become familiar enough with Loki on my private infra that I can comfortably use it professionally.

I will look into cluster best practices. That is a good hint, thanks!