Looks like the endpoint struct will consume up to ~64 files instead of the 1 it should. This is due to an internal minimum capacity; there is an option to override it, so I'm working on that fix and a test now.
Hi there, @freak12techno from GitHub here.
As of now, do you need me to try anything to test things out? I'm not sure yet what else I can play with to help debug this and make it more stable.
(Also, feel free to ping me if you need something.)
Apologies, the last few weeks and the next few are spotty due to offsites, conferences, and vacation. I'm going to get a fix in this week; the PR is currently in review.
The PR is being merged as we speak; would love any feedback on it: Fix fake minimum capacity by mattdurham · Pull Request #1982 · grafana/alloy (https://github.com/grafana/alloy/pull/1982)
So, we ran some tests on our devices, and it's persisting data as expected: we disabled internet access on the device, waited for some time, shut it down, then re-enabled internet, and it works as expected. Appreciate that a lot, thanks!
One question @mattdurham: when should we expect a new release that includes these changes? Currently we're using a custom Docker image based on the one in the grafana/alloy repo, just to build from source, and it takes about an hour to build, so we'd prefer to use a pre-built image that includes the fix, if possible.
The Grafana Alloy README says a minor release is pushed every 6 weeks, and the latest 1.4.x release was at the end of September, so I assume the 1.5.x release should land really soon; please correct me if I'm wrong.
@mattdurham okay, we got some more info. We tested this on a few devices; one of them was offline for more than 3 days, and while it seems to have persisted all of that data, sending it takes ages. Here's the prometheus.write.queue config:
prometheus.write.queue "default" {
  endpoint "default" {
    url          = env("PROMETHEUS_HOST")
    bearer_token = env("PROMETHEUS_TOKEN")
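    // parallelism controls how many concurrent requests send data to the
    // endpoint; with 1, a long offline backlog drains one batch at a time.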
    parallelism  = 1
  }

  // Keep 1 week of data, in case it wasn't sent.
  // More on the WAL and its internals:
  // https://grafana.com/docs/alloy/latest/reference/components/prometheus/prometheus.remote_write/#wal-block
  ttl = "168h"

  persistence {
    batch_interval = "10s"
  }
}
I think that's because of the parallelism = 1 part, but there's a lack of documentation on it, and I think other people might run into it later as well. So I have 2 questions (a possible config tweak is sketched after them):
- How can we set it up so that it sends these metrics faster once the device is back online?
- Can we get better docs on this feature and how to configure it? It might also be that I missed them if they already exist.
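For the first question, here is a minimal sketch of the kind of tweak that should help, assuming raising parallelism in the endpoint block is the right lever (the value 4 is an arbitrary illustration, not a tested recommendation; everything else is unchanged from the config above):

prometheus.write.queue "default" {
  endpoint "default" {
    url          = env("PROMETHEUS_HOST")
    bearer_token = env("PROMETHEUS_TOKEN")
    // Illustrative value: several concurrent senders should drain a
    // multi-day backlog faster than a single serial one.
    parallelism  = 4
  }

  ttl = "168h"

  persistence {
    batch_interval = "10s"
  }
}

The trade-off is that each additional sender uses more bandwidth and memory on the device, so on constrained hardware it is worth testing a small increase first. Whether other endpoint arguments (batch sizing, flush timing) also affect drain speed is worth checking against the prometheus.write.queue reference docs for your Alloy version.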