Hi all,
I’m looking for insights from the Grafana community: has anyone successfully deployed the LGTM stack across two or more on-premise data centres in a highly available (HA) fashion, and if so, how?
We have replicated S3 storage at both sites, with buckets that are writable from either site, and we would love to make use of it. We also have a dedicated monitoring Kubernetes cluster at each site (two clusters in total), each of which should be able to host the entire LGTM stack. I’m currently weighing up how to configure the S3 buckets (e.g. a single bucket per component, shared by both sites) and how to configure the data sources in Grafana (e.g. a single Mimir/Loki data source covering both data centres, or one data source per data centre).
Ideally we want the entire stack deployed at each site, with each site’s metrics available in the primary Grafana instance. Which Grafana instance is primary will be determined by a replicated CloudNativePG database: the primary Grafana site is the one whose CloudNativePG instance has write access. If the primary site fails, we will switch to the other site by manually failing over CloudNativePG so that the replica there becomes the new primary.
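To make the data source question more concrete, here is a rough Python sketch of the two layouts I am weighing, written out as Grafana data source provisioning structures. The hostnames, data source names, and the shared endpoint in option B are placeholders I made up for illustration, and the `/prometheus` suffix assumes Mimir's default Prometheus-compatible path is exposed unchanged by whatever sits in front of it.

```python
import json

# Hypothetical per-site gateway URLs; replace with the real endpoints.
SITES = {
    "dc1": "https://lgtm.dc1.example.internal",
    "dc2": "https://lgtm.dc2.example.internal",
}

# Grafana data source plugin type and URL suffix for each backend.
# Mimir is queried through the Prometheus data source type.
BACKENDS = {
    "mimir": ("prometheus", "/prometheus"),
    "loki": ("loki", ""),
    "tempo": ("tempo", ""),
}


def per_site_datasources(sites):
    """Option A: one data source per backend per data centre."""
    datasources = []
    for site, base_url in sorted(sites.items()):
        for backend, (ds_type, suffix) in BACKENDS.items():
            datasources.append({
                "name": f"{backend}-{site}",
                "type": ds_type,
                "access": "proxy",
                "url": base_url + suffix,
            })
    return {"apiVersion": 1, "datasources": datasources}


def shared_datasources(shared_url):
    """Option B: one data source per backend behind a single shared endpoint."""
    datasources = [
        {"name": backend, "type": ds_type, "access": "proxy", "url": shared_url + suffix}
        for backend, (ds_type, suffix) in BACKENDS.items()
    ]
    return {"apiVersion": 1, "datasources": datasources}


if __name__ == "__main__":
    # Print both layouts as JSON so they can be compared side by side.
    print(json.dumps(per_site_datasources(SITES), indent=2))
    print(json.dumps(shared_datasources("https://lgtm.example.internal"), indent=2))
```

Option A keeps each site's data separately addressable in dashboards. Option B assumes something (a shared bucket, or a load-balanced endpoint) already makes both sites' data visible from one URL, which is exactly the part I am unsure about.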
Can anyone see any issues with relying on a replicated bucket that has multi-site read/write enabled as the storage for two separate Mimir/Loki/Tempo deployments ingesting data at the same time?
I’m interested to hear everyone’s thoughts. Perhaps I am overcomplicating the whole setup and should take a simpler approach. I have read through much of the documentation, but I may have missed something that would have answered my question.
Apologies if this is the wrong place for this topic or if I have left out any useful information. I’m happy to provide more details if required.