The issue I’m trying to resolve: our Grafana server is accessible to hundreds of staff. Some of them create large dashboards (20+ panels) with complicated queries and set the auto-refresh interval to 10 seconds.
Shortly afterwards, Prometheus or Graphite becomes overloaded and dies.
I’m looking for a solution to this.
Possible solutions:
- Use nginx rate limiting in front of Prometheus/Graphite and limit requests per IP. Currently, all requests come from the same IP (the Grafana server), so ideally we would limit per user instead. Is such a thing possible?
- Globally enforce a minimum auto-refresh interval. For example, no dashboard could auto-refresh more often than once per minute.
- Anything else?
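To make the per-user idea concrete, here is a minimal sketch of a token-bucket limiter keyed by user name rather than by IP. It assumes the proxy in front of Prometheus/Graphite can see which Grafana user issued each request (for example via a request header carrying the user name; whether Grafana can be configured to send one is an assumption to verify). `PerUserLimiter` and its parameters are hypothetical names, not part of Grafana or nginx.

```python
import time


class PerUserLimiter:
    """Token-bucket rate limiter keyed by user name (hypothetical helper)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens refilled per second, per user
        self.burst = burst         # maximum tokens a user can accumulate
        self.clock = clock         # injectable clock, useful for testing
        self.buckets = {}          # user -> (tokens, time of last update)

    def allow(self, user):
        """Return True if this user's request may pass, False if throttled."""
        now = self.clock()
        # A user seen for the first time starts with a full bucket.
        tokens, last = self.buckets.get(user, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[user] = (tokens - 1, now)
            return True
        self.buckets[user] = (tokens, now)
        return False
```

With `rate_per_sec=1.0` and `burst=2`, one user refreshing a heavy dashboard would be throttled after two back-to-back queries, while other users behind the same Grafana IP would be unaffected.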
Any advice is much appreciated!
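For the minimum-refresh idea, the rule could look roughly like the sketch below. It assumes a dashboard is available as a dict with a top-level `refresh` field holding strings such as `"10s"` or `"1m"` (or a falsy value when auto-refresh is off); `parse_interval` and `enforce_min_refresh` are hypothetical helpers, and how the corrected dashboard would be written back to Grafana is left open.

```python
import re

# Seconds per supported unit suffix.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text):
    """Convert a string such as '10s' or '1m' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def enforce_min_refresh(dashboard, minimum="1m"):
    """Return a copy of the dashboard whose refresh is at least `minimum`."""
    refresh = dashboard.get("refresh")
    # Auto-refresh disabled (missing, empty, or False): leave it untouched.
    if not refresh:
        return dict(dashboard)
    if parse_interval(refresh) < parse_interval(minimum):
        return {**dashboard, "refresh": minimum}
    return dict(dashboard)
```

For example, `enforce_min_refresh({"title": "big", "refresh": "10s"})` would return the same dashboard with `refresh` set to `"1m"`, while a dashboard already set to `"5m"` would pass through unchanged.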