We have a central data centre (an operations centre, or OC) where we run our Grafana stack, and several geographically separated satellite data centres (DCs) that generate logs and metrics. The satellite DCs are connected to the operations centre by site-to-site VPNs. The VPN links will be up most of the time, but we can’t guarantee that any given link is up at any given moment.
The software in the satellite DCs runs in Kubernetes pods that have a short life (10 seconds to 1 hour). I’m trying to find a way to prevent loss of logs if the link to the operations centre goes down. Ideally, the logs in each DC should be cached outside of their respective pods so that the DC can continue to operate even if the link to the OC is lost. The short-lived Kubernetes pods should stay short-lived and should not have to wait for the link to the OC before terminating.
It looks to me like there is no out-of-the-box way for Promtail to send all logs from the short-lived pods to a long-lived log-caching pod in the DC, which would then forward the logs from all pods to the operations centre ingester. Is there a recommended architecture for this, or can someone suggest an option?