I have a Grafana setup with more than 2,000 dashboards and a lot of alerts configured in Grafana, using Prometheus backed by an autoscale-ready Thanos as the data source. For now it is still a standalone setup, so a single Grafana instance both evaluates the alerts and serves dashboard access.
I plan to move this to an HA setup with a separate database and multiple Grafana servers, since the current instance gets sluggish when many people access it at once. I set up multiple Grafana servers and pointed them all at the same dashboard database.
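For reference, this is roughly what the shared-database part of grafana.ini looks like on each server. It is only a sketch, assuming MySQL as the external database; the host, database name, and credentials are placeholders, not real values:

```ini
; Identical on every Grafana server in the HA group.
; All values below are placeholders for illustration.
[database]
type = mysql
host = grafana-db.example.internal:3306
name = grafana
user = grafana
password = changeme
```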
The moment I started the second Grafana server, thanos-query began scaling up like crazy. My initial assumption is that this happens because of what the alerting documentation states:
Currently alerting supports a limited form of high availability. Since v4.2.0, alert notifications are deduped when running multiple servers. This means all alerts are executed on every server but alert notifications are only sent once per alert. Grafana does not support load distribution between servers.
This means both Grafana servers run the same alerting logic, so the alert query load on thanos-query is doubled compared to the single-server setup.
My question: in an HA setup, is it possible to configure only one grafana-server instance to do the alerting, while the other instances only handle displaying dashboards?
Or do you have a better suggestion for implementing an HA setup with lots of alerts?
Thank you so much !!!