Hi,
We have several devices running Alloy, managed via systemd. These devices are typically powered off using a physical power button. When the button is pressed, systemd detects the event and initiates a system shutdown.
However, we’ve observed that during this shutdown sequence, Alloy is no longer able to send logs to the remote endpoint. This causes us to lose messages (sometimes crucial ones) right as the system shuts down. My guess is that this is not an issue with Alloy itself but rather an incorrect or missing setting in the systemd unit.
It may be, for example, that DNS or network connectivity is terminated too early in the shutdown process. This behavior is reproducible with the grafana-alloy.service unit available in the Grafana repository.
If you install Alloy on bare metal, it’s controlled by systemd. On shutdown, systemd sends termination signals to all units, and without explicit dependencies the order in which that happens can’t really be controlled.
You can try setting a dependency (for example, have Alloy depend on your application, so that systemd stops your app before Alloy) and see if that helps.
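As a sketch of what that could look like (the unit name `my-app.service` is a placeholder for your application; `grafana-alloy.service` is the packaged unit mentioned above), a systemd drop-in declaring the ordering might be:

```ini
# /etc/systemd/system/my-app.service.d/ordering.conf
# Hypothetical drop-in: "my-app.service" stands in for your application.
[Unit]
# After= means my-app starts after Alloy and, because systemd stops units
# in the reverse of their start order, my-app is stopped BEFORE Alloy.
After=grafana-alloy.service
Wants=grafana-alloy.service
```

Run `systemctl daemon-reload` after adding the drop-in so systemd picks it up.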
We are quite certain our dependency chain is correct, but it looks like DNS, for example, is already unavailable while Alloy is still trying to send out logs.
You can test that hypothesis pretty easily by using an IP instead of a hostname in your Alloy configuration and seeing whether it sends all the logs you want during shutdown. If so, you could then consider adding your DNS service (systemd-resolved or whatever resolver you use) to your app’s and Alloy’s dependency chains.
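For the test, the change would look something like this in the write block of the Alloy configuration (the component name, URL, and IP below are placeholders, not real endpoints):

```river
// Hypothetical loki.write block; substitute your own endpoint and credentials.
loki.write "default" {
  endpoint {
    // A literal IP instead of a hostname takes DNS out of the picture
    // for this experiment. Note that TLS certificate verification may
    // fail against a raw IP, so this is only a diagnostic step.
    url = "https://203.0.113.10/loki/api/v1/push"
  }
}
```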
We’re using Grafana Cloud. It seems like the endpoint is load balanced, and it’s not happy if we pass it an IP directly? Or is this incorrect?
We also added the DNS resolver as a dependency, and that didn’t help either. We did, however, see some DNS resolution errors in Alloy.
Oh, if you are using Grafana Cloud, a raw IP probably won’t work.
Try adding one of the static IPs to the hosts file under the same DNS name; that should bypass DNS resolution. See if that fixes it. If it does, then you probably have dependencies elsewhere that you need to address. I still think this is rather unreliable, and it might be a good idea to look for other potential solutions.
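For example (both the IP and the hostname below are placeholders; look up the actual static IPs and push endpoint for your Grafana Cloud stack):

```
# /etc/hosts — pin one of the published static IPs to the push endpoint.
# Keeping the real hostname means TLS certificate verification still works.
203.0.113.10  logs-prod-000.grafana.net
```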
After some more investigation, it seems like NetworkManager starts shutting down the interface before Alloy is done sending its last batch. Not sure how I can prevent that from happening.
In general, trying to anticipate shutdown behavior is not ideal, but you could add NetworkManager to your dependency chain.
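A minimal sketch of that, assuming the packaged unit names (`grafana-alloy.service`, `NetworkManager.service`), would be a drop-in like:

```ini
# /etc/systemd/system/grafana-alloy.service.d/network.conf
# Ordering Alloy after NetworkManager means that on shutdown Alloy is
# stopped (and gets a chance to flush its final batch) before
# NetworkManager tears down the interfaces.
[Unit]
After=NetworkManager.service network-online.target
Wants=network-online.target
```

As above, run `systemctl daemon-reload` after adding the drop-in. Whether the final flush completes within the unit’s stop timeout is still not guaranteed, which is why anticipating shutdown behavior this way remains somewhat fragile.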