What is the recommended setup for HA Prometheus + Grafana? For Grafana itself it sounds easy: sticky sessions and a shared HA database. But what should be done about the Prometheus data source?
Prometheus itself supports HA only by duplicating the installation.
Should I set up a different data source for each Grafana, i.e. Prom A for Grafana A and Prom B for Grafana B? Would that even be possible with the shared SQL database?
Or should I put the Prometheus API behind a load balancer? Will this work when Prometheus A and B almost always have slightly different data (because scrapes on A and B are not synchronized in any way)? The only option that occurs to me is not to use round robin on the Prometheus balancer, but to use one of the servers as a hot backup…
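To make the hot-backup idea concrete, here is a minimal sketch in Python of the selection logic a balancer would apply: always use the first healthy server in priority order, and only fall through to the backup when the primary fails its check. The host names are placeholders, and the default check assumes Prometheus's `/-/healthy` endpoint is reachable from wherever this runs. A real deployment would more likely express this in the load balancer's own configuration.

```python
import urllib.request


def http_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/-/healthy answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/-/healthy", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts and HTTP error statuses.
        return False


def pick_backend(backends, is_healthy=http_healthy):
    """Return the first healthy backend in priority order, or None.

    `backends` is ordered: primary first, hot backup(s) after it.
    Unlike round robin, the backup is only used while the primary is down,
    so users do not flip between two slightly different data sets.
    """
    for url in backends:
        if is_healthy(url):
            return url
    return None


# Hypothetical host names, for illustration only.
PROMETHEUS_BACKENDS = ["http://prom-a:9090", "http://prom-b:9090"]
```

Calling `pick_backend(PROMETHEUS_BACKENDS)` returns Prom A while it is healthy and Prom B otherwise.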
The only advice I've heard about HA for Prometheus is this:
In Grafana, you would just create two data sources, one for Prom A and one for Prom B. You could create a data source template variable to include on each dashboard allowing you to quickly switch between the two data sources. Here is an example with a data source template variable.
Hello Daniel, there are >50 dashboards connected to Prom A. How would you recommend switching the data source from Prom A to Prom B in case of an outage, or simply maintenance, of Prom A?
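One way to avoid editing 50+ dashboards by hand is to rewrite the exported dashboard JSON in bulk. The Python sketch below assumes each dashboard has been exported to a JSON model and that panels and template variables reference the data source by name in a `datasource` field; both are assumptions about your Grafana version, so check an exported dashboard first. The function name is made up for this example.

```python
def switch_datasource(dashboard, old, new):
    """Return a copy of a dashboard model with `old` replaced by `new`.

    Walks the whole structure and rewrites every `datasource` field whose
    value equals `old`. Panels bound to other data sources, or to the
    default (null), are left untouched.
    """
    def walk(node):
        if isinstance(node, dict):
            return {
                key: (new if key == "datasource" and value == old else walk(value))
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(dashboard)
```

You would load each exported file with `json.load`, pass it through `switch_datasource(model, "Prom A", "Prom B")`, and import the result back into Grafana.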
Thanks
Regards,
Oleg
If we are willing to live with slightly different data sets from time to time, does Grafana support sticky sessions? I.e. passing info about the user session to the data source?
Thanks for the info! I am thinking about a setup where multiple Prometheus instances are proxied behind nginx (or any load balancer) and configured as a single data source in Grafana. In this case, I'm wondering if Grafana can pass information about the user session to a data source configured for Proxy access, so that we could use sticky sessions at the Prometheus load balancer.
No, not automatically. If you proxy the request through Grafana, the session cookies are stripped out.
Can you explain more about this:
I was thinking that you would have some rule in your nginx config that would connect a group of users to a Prometheus instance. E.g. all users with an IP address that ends with 1 go to Prometheus instance A.
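As a rough illustration of that rule, here is the mapping written out in Python rather than nginx config. It is only a sketch: it assumes IPv4 addresses, uses the last octet modulo the number of instances (a generalisation of "ends with 1 goes to A"), and the instance names are placeholders.

```python
def instance_for_client(client_ip, instances):
    """Map an IPv4 client address to one Prometheus instance.

    The same address always lands on the same instance, so a given user
    keeps seeing one consistent data set.
    """
    last_octet = int(client_ip.rsplit(".", 1)[-1])
    return instances[last_octet % len(instances)]


# Hypothetical instance names, for illustration only.
INSTANCES = ["prom-a", "prom-b"]
```

With two instances this sends even last octets to `prom-a` and odd ones to `prom-b`. Note that it only helps if the balancer sees the end user's address rather than Grafana's.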
How were you thinking of passing information to Prometheus?
I was just looking into this issue myself as well. The datasource template variable would work, but as others said, I didn't want to have to go resetting the variable on all the dashboards. One thing we noticed, though, is that you can set which datasource you want as the default. And in the metrics themselves, you can set the datasource to be "default" without having to define any other template variables. That seemed like it might be the solution until I realized that all the other template variables we are using do not have the option to use "default" as the datasource and, again, require a datasource template variable.
If the template variables could also use our "default" datasource, that would seem to solve this issue of high availability with Prometheus when you just want to have a single default among several Prometheus sources.
We are trying to go with a setup where both Grafana and Prometheus are proxied by nginx: the end user has a single URL that proxies 3 Grafana instances. Grafana is set up with a single data source, but that data source has multiple Prometheus instances proxied behind it.
What we hope to achieve is that requests from a single user session are distributed across Grafana instances, but are served from the same Prometheus DB. To achieve this, we were hoping that the proxy request from Grafana to the Prometheus datasource contains some information that identifies the original client; for example X-FORWARDED-FOR headers, or even better, session info. We could then use this identifying info in the 2nd nginx instance (the one proxying Prometheus) to implement sticky sessions based on the end user, rather than the Grafana instance that proxies the request.
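A sketch of what the second proxy would do with that header, written in Python for clarity. It assumes the `X-Forwarded-For` value actually reaches the Prometheus-side proxy and that its first entry is the original client; the function names and backend list are invented for this example.

```python
import hashlib


def client_from_xff(header_value):
    """Return the first (original client) entry of X-Forwarded-For, or None."""
    first = header_value.split(",")[0].strip()
    return first or None


def sticky_backend(header_value, backends):
    """Pick a backend from a stable hash of the original client address.

    Falls back to the first backend when the header is missing or empty.
    A cryptographic hash is used only because it is stable across
    processes and restarts, unlike Python's built-in hash() for strings.
    """
    client = client_from_xff(header_value or "")
    if client is None:
        return backends[0]
    digest = hashlib.sha256(client.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

Because the choice depends only on the end user's address, it stays the same no matter which of the Grafana instances forwarded the request.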
Currently we don't forward headers/cookies to downstream systems. But we added support for this in Grafana 5.0 (currently in alpha, available as nightly builds).
In 5.0 you can whitelist headers that you want to keep for downstream systems.
A bit of a late response, but I have run into this same issue of an unclear HA Prometheus setup. To solve it I created promxy (https://github.com/jacksontj/promxy), which is an aggregating HTTP proxy for Prometheus. This allows you to have a single datasource in Grafana and let promxy do all the scatter/gather to the various downstream Prometheus hosts for you. This gets you some additional benefits for HA and cross-shard aggregation, in addition to dropping the requirement for N datasources. I actually wrote a fairly detailed post (https://github.com/jacksontj/promxy/blob/master/MOTIVATION.md) about why I created it in the first place. I hope it can be of help to others also looking for HA Prometheus.
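To give a feel for the scatter/gather idea, here is a toy Python sketch of the "gather" half: combining the answers of two replicas for the same series so that a gap in one is filled from the other. This is not promxy's actual implementation, just an illustration. It assumes both replicas answered the same range query with the same step, so their timestamps line up.

```python
def merge_series(*replicas):
    """Merge (timestamp, value) samples from several replicas of one series.

    For each timestamp the first replica that has a sample wins; later
    replicas only fill in timestamps that are still missing. The result
    is sorted by timestamp.
    """
    merged = {}
    for samples in replicas:
        for ts, value in samples:
            merged.setdefault(ts, value)
    return sorted(merged.items())
```

If Prom A missed a few scrapes during a restart, merging its result with Prom B's gives Grafana one continuous series instead of a graph with a hole in it.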