Hello,
So, I’m having some problems routing my logs through Alloy to Loki (in Grafana Cloud). Let me state my assumptions first and then follow with the config I’m using:
First of all, I like my logs to be structured and rich, and specifically to include high-cardinality fields like `requestId`. All my software carefully exports lots of useful information to logs via extra fields like that.
Ideally, I would just use all fields as Loki labels and have a string `message` field displayed as the message in Loki. However, as far as I know, Loki does not like a setup like that, with high-cardinality fields. So I’m putting my whole log entry into JSON and using the `| json` parser in Loki. That is pretty annoying, and I would like Loki to be better than that, but it is what it is. As far as I understand, it is also much less efficient: if I want to filter by `requestId`, for example, the query must load lots of data (example below). I know there is the new ‘structured metadata’ support in Loki, and it looks interesting, but is it even supported yet by Alloy’s `loki.write`?
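To illustrate, a typical query of mine currently looks roughly like this (the `service` label and the id value are made-up examples):

```logql
{service="checkout"} | json | requestId = "3f2c9a10"
```

As I understand it, Loki has to pull and parse every line in that stream before it can throw away the non-matching ones.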
So, when my software pushes directly to Loki, I see the logs, the JSON parsing works as described above, happy days.
But then, my production deployment consists of multiple containers, and there is no point in each and every container keeping a separate connection to Grafana Cloud. So, I started with otel-collector and configured Grafana OTLP connectivity. On top of that I’m also sending metrics and traces, so a single collector makes much more sense: it also does batching and memory limiting, and has the whole OTEL pipeline support, which is very nice.
That kind of worked, I think, but I did not test it very well: even with a few metrics (like five), Grafana Cloud was responding `429 Too Many Requests` to my otel-exporter. I haven’t seen any relevant setting around that, so my conclusion is that OTLP support in Grafana is just extremely bad.
But Grafana has this Alloy agent, which is widely recommended on their website, so it was a natural replacement for otel-collector. So, I made the change, and things are kind of working, with Alloy being the only part of my infra that talks to Grafana Cloud directly. I have this pipeline (basically following the tutorial from the Alloy docs): `loki.source.api` → `otelcol.receiver.loki` → memory limiter / batcher → `otelcol.exporter.loki` → `loki.write`. At first glance I thought my config was finished, but then I realized that all my logs were being wrapped in an extra `{ "body": "..." }`. Apparently `otelcol.receiver.loki` is ‘converting to OTLP format’, which basically means slapping JSON around everything; even if I export a plain string, it gets wrapped too. So now in Loki I have to use `| json | line_format "{{ .body }}" | json`, which is nonsensical.
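For reference, here is roughly what that pipeline looks like in my config (trimmed down; the port, endpoint URL and credentials are placeholders):

```alloy
loki.source.api "ingest" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 9999
  }
  forward_to = [otelcol.receiver.loki.default.receiver]
}

// Converts Loki entries to OTLP log records.
otelcol.receiver.loki "default" {
  output {
    logs = [otelcol.processor.memory_limiter.default.input]
  }
}

otelcol.processor.memory_limiter "default" {
  check_interval = "1s"
  limit          = "256MiB"
  output {
    logs = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  output {
    logs = [otelcol.exporter.loki.default.input]
  }
}

// Converts OTLP back to Loki push format; somewhere in this round
// trip my lines come out wrapped in { "body": ... }.
otelcol.exporter.loki "default" {
  forward_to = [loki.write.grafana_cloud.receiver]
}

loki.write "grafana_cloud" {
  endpoint {
    url = "https://logs-prod-000.grafana.net/loki/api/v1/push" // placeholder
    basic_auth {
      username = env("LOKI_USER")
      password = env("LOKI_API_KEY")
    }
  }
}
```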
For the time being, I introduced a workaround in my Alloy config so that `loki.source.api` writes directly to `loki.write` (see the snippet below), which avoids the double JSON nesting. But obviously I’m missing out on the OTLP pipeline stuff.
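That is, the source now forwards straight to the writer, skipping the OTLP round trip:

```alloy
loki.source.api "ingest" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 9999
  }
  // Straight to loki.write: no OTLP conversion, no extra { "body": ... } wrapper.
  forward_to = [loki.write.grafana_cloud.receiver]
}
```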
When I was just shipping my logs to Elasticsearch, none of this was happening: ES ingested the JSON and let me query it the way I wanted. Now that I’m trying to switch to Loki (basically to use Grafana Cloud in its entirety, since previously my code wasn’t using metrics/tracing much), I see a huge degradation, unfortunately.
Am I missing something / doing something wrong?
Maybe I should just ship the high-cardinality fields as Loki labels after all? The amount of data is not that big (<1 GB/day), and I do not query it that often.
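If I went down that road (or if structured metadata turns out to be usable), I imagine the Alloy side would look something like this; `loki.process` with `stage.json` plus `stage.labels` / `stage.structured_metadata` seems to exist for exactly this, but I haven’t tested it, and I don’t know whether `loki.write` actually ships the structured metadata variant:

```alloy
loki.process "enrich" {
  forward_to = [loki.write.grafana_cloud.receiver]

  // Pull requestId out of the JSON log line into the extracted map.
  stage.json {
    expressions = { requestId = "" }
  }

  // Option A: promote it to a real (high-cardinality!) label.
  stage.labels {
    values = { requestId = "" }
  }

  // Option B: attach it as structured metadata instead (Loki 3.x),
  // if that is supported end to end:
  // stage.structured_metadata {
  //   values = { requestId = "" }
  // }
}
```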
Is there a way to get a better experience here? Any suggestions / recommendations?