Recommended HA setup

Hi,

What is the recommended setup for HA Prometheus + Grafana? With Grafana itself it sounds easy: sticky sessions and a shared HA database. But what should I do with the Prometheus datasource?

Prometheus itself supports HA only by duplicating the installation.

  1. Should I set up a different datasource for each Grafana? That is, Prom A for Grafana A and Prom B for Grafana B. Would that even be possible with the shared SQL database?

  2. Or should I put the Prometheus API behind a load balancer? Will this work when Prometheus A and B almost always have slightly different data (because scrapes on A and B are not synchronized in any way)? The only possibility that occurs to me is not to use round robin on the Prometheus balancer, but to use one of the servers as a hot backup…

  3. Is there any other way? :slight_smile:
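
For option 2, the hot-backup idea can be sketched in a few lines of Python. This is only an illustration of the selection logic a balancer would apply, not a replacement for one. The backend URLs are made up, and the check assumes a Prometheus 2.x server that exposes the `/-/healthy` endpoint.

```python
import urllib.request
from typing import Callable, Optional, Sequence


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the Prometheus at base_url answers its health endpoint.

    Assumes the server exposes /-/healthy (Prometheus 2.x does).
    """
    try:
        with urllib.request.urlopen(f"{base_url}/-/healthy", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False


def pick_backend(
    backends: Sequence[str],
    check: Callable[[str], bool] = is_healthy,
) -> Optional[str]:
    """Return the first healthy backend, in priority order.

    The first entry is the primary; later entries are hot backups that
    are only used while everything before them is down.
    """
    for url in backends:
        if check(url):
            return url
    return None


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    print(pick_backend(["http://prom-a:9090", "http://prom-b:9090"]))
```

Because every query goes to the primary while it is up, users never see the small differences between A and B except at the moment of failover.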

Thanks!

The only advice I’ve heard about HA for Prometheus is this:

https://prometheus.io/docs/introduction/faq/#can-prometheus-be-made-highly-available

In Grafana, you would just create two data sources, one for Prom A and one for Prom B. You could create a data source template variable to include on each dashboard allowing you to quickly switch between the two data sources. Here is an example with a data source template variable.
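
As a rough sketch of what that looks like in the dashboard JSON, the Python below adds a variable of type `datasource` to a dashboard dict and points every panel at it. It assumes the Grafana 5-style layout with a top-level `panels` list (older dashboards nest panels under `rows`), and the variable name `datasource` is just a choice made for this example.

```python
import copy


def add_datasource_variable(dashboard: dict, var_name: str = "datasource") -> dict:
    """Return a copy of a dashboard dict whose panels use a data source variable."""
    result = copy.deepcopy(dashboard)

    # Template variables live under templating.list in the dashboard JSON.
    variables = result.setdefault("templating", {}).setdefault("list", [])
    if not any(v.get("name") == var_name for v in variables):
        variables.insert(0, {
            "type": "datasource",
            "name": var_name,
            "label": "Data source",
            "query": "prometheus",  # offer every data source of this plugin type
        })

    # Point each top-level panel at the variable instead of a fixed name.
    for panel in result.get("panels", []):
        panel["datasource"] = f"${var_name}"
    return result


if __name__ == "__main__":
    dash = {"title": "Example", "panels": [{"id": 1, "datasource": "Prom A"}]}
    print(add_datasource_variable(dash)["panels"][0]["datasource"])
```

The resulting dict could then be saved back through the UI or the dashboard API. Other template variables on the dashboard would need the same treatment separately.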

P.S. Here is the GitHub issue on HA for Prometheus.

Hello Daniel, there are more than 50 dashboards connected to Prom A. How would you recommend switching the datasource from Prom A to Prom B in case of an outage, or simply maintenance, of Prom A?
Thanks
Regards,
Oleg

I can think of two ways:

  • Use a data source template variable in all your dashboards so users can switch from Prom A to Prom B if Prom A does not work.
  • With the Grafana API, change the data source url to Prometheus B.
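
A minimal sketch of the second option, using the data source endpoints of the Grafana HTTP API (`GET /api/datasources/name/:name` and `PUT /api/datasources/:id`). The Grafana URL, API key, data source name, and Prometheus URLs are placeholders; treat this as a starting point rather than a tested tool.

```python
import json
import urllib.parse
import urllib.request


def with_new_url(datasource: dict, new_url: str) -> dict:
    """Return a copy of a data source definition pointing at new_url."""
    updated = dict(datasource)
    updated["url"] = new_url
    return updated


def _call(grafana_url: str, api_key: str, method: str, path: str, body=None) -> dict:
    """Send one authenticated JSON request to the Grafana HTTP API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        f"{grafana_url}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def repoint_datasource(grafana_url: str, api_key: str, name: str, new_url: str) -> dict:
    """Fetch the data source by name and write it back with a new URL."""
    quoted = urllib.parse.quote(name, safe="")
    current = _call(grafana_url, api_key, "GET", f"/api/datasources/name/{quoted}")
    return _call(
        grafana_url, api_key, "PUT",
        f"/api/datasources/{current['id']}",
        with_new_url(current, new_url),
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own before running.
    print(with_new_url({"id": 1, "url": "http://prom-a:9090"}, "http://prom-b:9090"))
```

Since the dashboards reference the data source by name, none of them need to change when the URL behind that name is swapped.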

If we are willing to live with slightly different data sets from time to time, does Grafana support sticky sessions? I.e. passing info about the user session to the data source?

Thanks!
James

What do you mean by “does Grafana support sticky sessions?” It is possible to use sticky sessions with a load balancer instead of using the db for sessions. See: http://docs.grafana.org/tutorials/ha_setup/#user-sessions

Thanks for the info! I am thinking about a setup where multiple Prometheus instances are proxied behind nginx (or any load balancer) and configured as a single data source in Grafana. In this case, I’m wondering if Grafana can transfer information about the user session to the datasource configured for Proxy access, so that we could use sticky sessions at the Prometheus load balancer.

Cheers,
James

Grafana uses cookies for sessions so if sticky sessions are enabled in your load balancer then it should work. Does that answer your question?

Ok so Grafana passes user cookies to its data source?

No, not automatically. If you proxy the request through Grafana, the session cookies are stripped out.

Can you explain more about this:

I was thinking that you would have some rule in your nginx config that would connect a group of users to a Prometheus instance. E.g. all users with an IP address that ends with 1 go to Prometheus instance A.

How were you thinking of passing information to Prometheus?

I was just looking into this issue myself as well. The datasource template variable would work, but as others said, I didn’t want to have to go resetting the variable on all the dashboards. One thing we noticed, though, is that you can set which datasource you want as the default. And in the metrics themselves, you can set the datasource to be “default” without having to define any other template variables. That seemed like it might be the solution until I realized that all the other template variables we are using do not have the option to use “default” as the datasource and, again, require a datasource template variable.

If the template variables could also use our “default” datasource, that would seem to solve this issue of high availability with prometheus when you just want to have a single default among several prometheus sources.

We are trying to go with a setup where both Grafana and Prometheus are proxied by nginx: the end user has a single URL that proxies 3 Grafana instances. Grafana is set up with a single data source, but that data source has multiple Prometheus instances proxied behind it.

What we hope to achieve is that requests from a single user session are distributed across Grafana instances, but are served from the same Prometheus DB. To achieve this, we were hoping that the proxy request from Grafana to the Prometheus datasource contains some information that identifies the original client; for example X-FORWARDED-FOR headers, or even better, session info. We could then use this identifying info in the 2nd nginx instance (the one proxying Prometheus) to implement sticky sessions based on the end user, rather than the Grafana instance that proxies the request.
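
To make the idea concrete, here is a small Python sketch of the routing decision the second proxy would make. It assumes an `X-Forwarded-For` header identifying the end user actually reaches that proxy, which is exactly the open question here. The backend URLs and the `fallback_key` parameter are invented for the example.

```python
import hashlib
from typing import Mapping, Sequence


def sticky_backend(
    headers: Mapping[str, str],
    backends: Sequence[str],
    fallback_key: str = "",
) -> str:
    """Pick a backend so that the same client always gets the same one.

    The client is identified by the first address in X-Forwarded-For;
    fallback_key is used when the header is missing or empty.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    client = forwarded.split(",")[0].strip() or fallback_key

    # A stable hash keeps the mapping identical across proxy restarts.
    digest = hashlib.sha256(client.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    pool = ["http://prom-a:9090", "http://prom-b:9090"]  # hypothetical
    print(sticky_backend({"X-Forwarded-For": "10.0.0.7"}, pool))
```

With this kind of mapping it does not matter which Grafana instance proxied the request, only who the original client was.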

Cheers,
James

Currently we don’t forward headers/cookies to downstream systems. But we added support for this in Grafana 5.0 (currently in alpha, available as nightly builds).

In 5.0 you can whitelist headers that you want to keep for downstream systems.

Yes, Grafana cannot set a default variable value, and most importantly, you cannot use variables in alerts!

A bit of a late response, but I have run into this same issue of unclear HA Prometheus setup. To solve it I created promxy (https://github.com/jacksontj/promxy), which is an aggregating HTTP proxy for Prometheus. This allows you to have a single datasource in Grafana and let promxy do all the scatter/gather to the various downstream Prometheus hosts for you. This gets you some additional benefits for HA and cross-shard aggregation, in addition to dropping the requirement for N datasources. I actually wrote a fairly detailed post (https://github.com/jacksontj/promxy/blob/master/MOTIVATION.md) about why I created it in the first place. I hope it can be of help to others also looking for HA Prometheus :slight_smile:
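
To give a feel for the gather step, here is a deliberately simplified Python sketch that merges the samples two replicas return for the same series, so a gap in one is filled from the other. It is not promxy's actual algorithm, and it assumes the samples come from range queries with the same start and step, so timestamps line up across replicas.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp, value)


def merge_replica_series(*replicas: Iterable[Sample]) -> List[Sample]:
    """Merge samples of one series from several replicas.

    For each timestamp the value from the earliest-listed replica that
    has it wins; later replicas only fill timestamps that are missing.
    """
    merged = {}
    for samples in replicas:
        for ts, value in samples:
            merged.setdefault(ts, value)
    return sorted(merged.items())


if __name__ == "__main__":
    prom_a = [(0, 1.0), (15, 2.0), (45, 4.0)]   # missed the sample at t=30
    prom_b = [(15, 2.5), (30, 3.0), (45, 4.0)]  # started late, missed t=0
    print(merge_replica_series(prom_a, prom_b))
```

A real proxy also has to fan out the query, match series by their labels, and cope with one replica timing out.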
