Hello,
So, I’m having some problems routing my logs through Alloy to Loki (in Grafana Cloud). Let me state my assumptions first and then follow with the config I’m using:
First of all, I like my logs to be structured and rich, and specifically to include high-cardinality fields like `requestId`. All my software carefully exports lots of useful information to logs via extra fields like that.
Ideally, I would just use all fields as Loki labels and have a string `message` field displayed as the message in Loki. However, as far as I know, Loki does not like a setup like that, with high-cardinality fields. So I’m putting my whole log entry into JSON and using the `| json` parser in Loki. That is pretty annoying, and I would like Loki to be better than that, but it is what it is. As far as I understand, it is also much less efficient: if I want to filter by `requestId`, for example, the query must load lots of data (example below). I know there is the new ‘structured metadata’ support in Loki, and it looks interesting, but is it even supported yet by Alloy’s `loki.write`?
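To illustrate, a typical query of mine currently looks roughly like this (the `service` label and the id value are made-up examples):

```logql
{service="checkout"} | json | requestId = "3f2c9a10"
```

As I understand it, Loki has to pull and parse every line in that stream before it can throw away the non-matching ones.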
So, when my software pushes directly to Loki, I see the logs, the JSON parsing works as described above, happy days.
But then, my production deployment consists of multiple containers, and there is no point in each and every container keeping a separate connection to Grafana Cloud. So, I started with otel-collector and configured Grafana OTLP connectivity. On top of that I’m also sending metrics and traces, so a single collector makes much more sense: it also does batching and memory limiting, and has the whole OTEL pipeline support, which is very nice.
That kind of worked, I think, but I did not test it very well: even with a few metrics (like five), Grafana Cloud was responding `429 Too Many Requests` to my otel-exporter. I haven’t seen any relevant setting around that, so my conclusion is that OTLP support in Grafana is just extremely bad.
But Grafana has this Alloy agent, which is widely recommended on their website, so it was a natural replacement for otel-collector. So, I made the change, and things are kind of working, with Alloy being the only part of my infra that talks to Grafana Cloud directly. I have this pipeline (basically following the tutorial from the Alloy docs): `loki.source.api` → `otelcol.receiver.loki` → memory limiter / batcher → `otelcol.exporter.loki` → `loki.write`. At first glance I thought my config was finished, but then I realized that all my logs were being wrapped in an extra `{ "body": "..." }`. Apparently `otelcol.receiver.loki` is ‘converting to OTLP format’, which basically means slapping JSON around everything; even if I export a plain string, it gets wrapped too. So now in Loki I have to use `| json | line_format "{{ .body }}" | json`, which is nonsensical.
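For reference, here is roughly what that pipeline looks like in my config (trimmed down; the port, endpoint URL and credentials are placeholders):

```alloy
loki.source.api "ingest" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 9999
  }
  forward_to = [otelcol.receiver.loki.default.receiver]
}

// Converts Loki entries to OTLP log records.
otelcol.receiver.loki "default" {
  output {
    logs = [otelcol.processor.memory_limiter.default.input]
  }
}

otelcol.processor.memory_limiter "default" {
  check_interval = "1s"
  limit          = "256MiB"
  output {
    logs = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  output {
    logs = [otelcol.exporter.loki.default.input]
  }
}

// Converts OTLP back to Loki push format; somewhere in this round
// trip my lines come out wrapped in { "body": ... }.
otelcol.exporter.loki "default" {
  forward_to = [loki.write.grafana_cloud.receiver]
}

loki.write "grafana_cloud" {
  endpoint {
    url = "https://logs-prod-000.grafana.net/loki/api/v1/push" // placeholder
    basic_auth {
      username = env("LOKI_USER")
      password = env("LOKI_API_KEY")
    }
  }
}
```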
For the time being, I introduced a workaround in my Alloy config so that `loki.source.api` writes directly to `loki.write` (see the snippet below), which avoids the double JSON nesting. But obviously I’m missing out on the OTLP pipeline stuff.
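That is, the source now forwards straight to the writer, skipping the OTLP round trip:

```alloy
loki.source.api "ingest" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 9999
  }
  // Straight to loki.write: no OTLP conversion, no extra { "body": ... } wrapper.
  forward_to = [loki.write.grafana_cloud.receiver]
}
```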
When I was just shipping my logs to Elasticsearch, none of this was happening: ES ingested the JSON and let me query it the way I wanted. Now that I’m trying to switch to Loki (basically to use Grafana Cloud in its entirety, since previously my code wasn’t using metrics/tracing much), I see a huge degradation, unfortunately.
Am I missing something / doing something wrong?
Maybe I should just ship the high-cardinality fields as Loki labels after all? The amount of data is not that big (<1 GB/day), and I do not query it that often.
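If I went down that road (or if structured metadata turns out to be usable), I imagine the Alloy side would look something like this; `loki.process` with `stage.json` plus `stage.labels` / `stage.structured_metadata` seems to exist for exactly this, but I haven’t tested it, and I don’t know whether `loki.write` actually ships the structured metadata variant:

```alloy
loki.process "enrich" {
  forward_to = [loki.write.grafana_cloud.receiver]

  // Pull requestId out of the JSON log line into the extracted map.
  stage.json {
    expressions = { requestId = "" }
  }

  // Option A: promote it to a real (high-cardinality!) label.
  stage.labels {
    values = { requestId = "" }
  }

  // Option B: attach it as structured metadata instead (Loki 3.x),
  // if that is supported end to end:
  // stage.structured_metadata {
  //   values = { requestId = "" }
  // }
}
```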
Is there a way to get a better experience here? Any suggestions / recommendations?