Hello all.
I’ve recently been encountering a weird issue that I’m not sure how to debug. I’m currently on the free Grafana Cloud tier, which I use to monitor a couple of machines in my homelab.
Nothing fancy, just some self-hosted applications. For the last couple of months my log usage has skyrocketed, and I burn through the 50 GB included in the free tier within a couple of days.
I went to check the grafanacloud-[instanceName]-usage-insights datasource with the following query:
{instance_type="logs"} |= "path=write"
and noticed an excessive number of errors. Here are two examples of the kind of entries I find:
caller=manager.go:49 component=distributor path=write insight=true msg="write operation failed" details="Ingestion rate limit exceeded for user XXXX (limit: 0 bytes/sec) while attempting to ingest '55' lines totaling '5908' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased" org_id=XXX
caller=manager.go:49 component=distributor path=write insight=true msg="write operation failed" details="entry for stream '{container=\"caddy\", instance=\"XXX\", job=\"integrations/docker\", service_name=\"caddy\", stream=\"stderr\"}' has timestamp too old: 2024-05-13T12:43:36Z, oldest acceptable timestamp is: 2024-05-26T12:51:39Z" org_id=XXXX
Now the entries seem pretty self-explanatory: it looks like my instances are generating logs too fast, some of the entries carry timestamps that are too old, and they get batched up and then rate limited.
However, I have absolutely no idea how to debug this to understand what is going on.
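The only thing I could think of so far is trying to quantify how often this happens. I assume a metric query along these lines against the same usage-insights datasource would count the rate-limit errors per hour (the filter string is just copied from the error above, so I may well be missing other variants):
sum(count_over_time({instance_type="logs"} |= "Ingestion rate limit exceeded" [1h]))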
I’m mainly running containers on both machines, nothing fancy; as you can see, one of them is Caddy, which I use as a reverse proxy to reach other internal containers.
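I was also wondering whether a query like the one below, run against my regular logs datasource, would show which container is producing the most lines (I’m assuming the container/instance labels from the Docker integration, as they appear in the error above), but I’m not sure this is the right approach:
sum by (container, instance) (count_over_time({job="integrations/docker"}[1h]))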
Any idea what I could try?