Grafana dashboard best practice for large scale monitoring

fhalde · April 7, 2022, 2:52pm

We have Spark clusters with 100-200 nodes, and we plot several metrics for the executors and the driver.

We are not sure what the best way is to build a dashboard at this scale. Visualizing all 100-200 nodes and their executor stats doesn't surface problems because there is a lot of noise. It also slows the dashboard down tremendously.

What are some good practices for Grafana dashboards?

Visualize only the top K series?
Plot only anomalies? If so, how do we detect them?
How do we reduce noise?
How do we make the dashboard more performant?
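For the top-K and anomaly questions, here is a minimal Python sketch of both ideas applied to per-executor data that has already been fetched from Prometheus. The executor ids, the GC-time values, and the functions `top_k_by_peak` and `robust_outliers` are hypothetical. In PromQL the first idea roughly corresponds to `topk(k, ...)`, though in a range query `topk` is evaluated at every step, so the selected series can change across the graph. The outlier check uses the modified z-score (median and median absolute deviation), which is one common anomaly-detection choice among many, not the only one.

```python
import statistics


def top_k_by_peak(series: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k series with the highest peak over the window."""
    # Ranking once over the whole window keeps the chosen set stable,
    # unlike a per-step ranking where series can enter and leave the top K.
    return sorted(series, key=lambda sid: max(series[sid]), reverse=True)[:k]


def robust_outliers(latest: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return ids whose latest value is far from the fleet median."""
    values = list(latest.values())
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate barely moved by the outliers themselves.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Most nodes report exactly the median; treat any difference as unusual.
        return sorted(sid for sid, v in latest.items() if v != med)
    # Modified z-score; 3.5 is a commonly used cut-off.
    return sorted(
        sid for sid, v in latest.items()
        if abs(0.6745 * (v - med) / mad) > threshold
    )


# Hypothetical data: GC time (ms) for 200 executors over three scrapes.
series = {
    f"exec-{i}": [100 + (i % 5) * 2, 101 + (i % 5) * 2, 100 + (i % 5) * 2]
    for i in range(200)
}
series["exec-17"] = [100, 950, 130]  # one large spike, then back near normal
series["exec-42"] = [600, 650, 700]  # consistently high

latest = {sid: values[-1] for sid, values in series.items()}
print(top_k_by_peak(series, 2))  # ['exec-17', 'exec-42']
print(robust_outliers(latest))   # ['exec-17', 'exec-42']
```

Instead of 200 lines, only the two executors that deserve attention end up on the graph. The same filtering can be pushed into queries or alert rules so the dashboard never has to render the other series.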

We use Prometheus as the backend.

mattabrams · April 14, 2022, 12:26am

Welcome to the forum, @fhalde!

One way to make dashboards more performant is to use the special -- Dashboard -- data source.

It lets a new panel use the query results of another panel on the same dashboard. This avoids running duplicate queries when, for example, you visualize the same data in two different ways.
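As a rough illustration of that setup, the Python sketch below builds a two-panel dashboard model in which panel 2 reuses panel 1's results through the Dashboard data source instead of issuing its own query. The field names (`"type": "datasource"`, `"uid": "-- Dashboard --"`, `panelId` in the target) follow what recent Grafana versions appear to export for such panels, but treat them as an assumption and compare against an export from your own instance. The data source UID `my-prometheus`, the metric name, and the job label are hypothetical.

```python
import json

# Hypothetical Prometheus data source reference; use your own UID.
PROM_DS = {"type": "prometheus", "uid": "my-prometheus"}

source_panel = {
    "id": 1,
    "type": "timeseries",
    "title": "Executor heap used",
    "datasource": PROM_DS,
    "targets": [{
        "refId": "A",
        "datasource": PROM_DS,
        # Hypothetical metric and label names; depends on how Spark metrics are exported.
        "expr": 'sum by (instance) (jvm_heap_used_bytes{job="spark-executors"})',
    }],
    "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
}

# Assumed shape of a Dashboard data source reference.
DASHBOARD_DS = {"type": "datasource", "uid": "-- Dashboard --"}

reuse_panel = {
    "id": 2,
    "type": "table",
    "title": "Executor heap used (table view)",
    "datasource": DASHBOARD_DS,
    # No "expr": this target only points at panel 1, so Prometheus is queried once.
    "targets": [{"refId": "A", "datasource": DASHBOARD_DS, "panelId": source_panel["id"]}],
    "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8},
}

dashboard = {"title": "Spark executors", "panels": [source_panel, reuse_panel]}
print(json.dumps(dashboard, indent=2))
```

Generating the model in code is optional; the point is that only panel 1 carries a PromQL expression, so adding more views of the same data does not add load on Prometheus.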

fhalde · April 14, 2022, 9:19am

This one is amazing! Beautiful tip! Thank you.

Apart from this, do you think the general recommendation would be to reduce max data points? We sometimes look at dashboards over 24h+ time ranges. I don't know how best to set a value that adjusts dynamically whenever the time range changes and that doesn't hide spikes in the graphs.

One way to always see spikes would be to use max everywhere.
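The trade-off can be made concrete with a little arithmetic. Grafana derives the query step (`$__interval`) roughly as the time range divided by max data points, bounded below by a minimum interval; it also rounds the result to a "nice" value, which this sketch omits. Once one plotted point covers several scrapes, the aggregation decides whether a spike survives. The Python sketch assumes a 15 s scrape interval and uses made-up samples; `interval_seconds` and `downsample` are hypothetical helpers, not Grafana APIs.

```python
from statistics import mean


def interval_seconds(range_s: float, max_data_points: int, min_interval_s: float = 15.0) -> float:
    """Approximate step: range / max data points, never below the minimum interval."""
    return max(range_s / max_data_points, min_interval_s)


def downsample(samples: list[float], bucket: int, agg) -> list[float]:
    """Collapse consecutive groups of `bucket` samples into one value each."""
    return [agg(samples[i:i + bucket]) for i in range(0, len(samples), bucket)]


step = interval_seconds(24 * 3600, 1000)  # 24h range, 1000 max data points
bucket = round(step / 15)                 # scrapes per plotted point at an assumed 15 s scrape
print(step)    # 86.4
print(bucket)  # 6

samples = [1.0] * 8 + [10.0] + [1.0] * 3  # one short spike in a single scrape
print(downsample(samples, bucket, mean))  # [1.0, 2.5]  -> spike diluted
print(downsample(samples, bucket, max))   # [1.0, 10.0] -> spike preserved
```

In PromQL the same idea is commonly written as `max_over_time(some_metric[$__interval])` (metric name hypothetical), so each point reports the maximum over its whole step instead of a single sample, and short spikes between steps are not skipped. With that in place, lowering max data points makes the dashboard lighter without hiding peaks, at the cost of hiding dips.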

paxistas · September 6, 2024, 2:41pm

Yeah, but it's not possible to edit the original query, so it's not that useful.

Related topics

Optimization grafana (Grafana) · 2 replies · 1163 views · October 27, 2017
Maximum number of Dashboards per Grafana install (Configuration, templating) · 4 replies · 3643 views · August 23, 2018
Grafana + Prometheus + SNMP_export + large amount of data (Grafana) · 1 reply · 2780 views · November 21, 2017
Using prometheus with Grafana · 0 replies · 302 views · March 12, 2019
Global status dashboard · 2 replies · 1015 views · January 13, 2021