We have Spark clusters with 100-200 nodes and we plot several executor and driver metrics.
We are not sure of the best way to build a dashboard at this scale. Visualizing stats for all 100-200 nodes and executors doesn't surface problems, because there is a lot of noise, and it also slows the dashboard down tremendously.
What are some good practices around grafana dashboards?
- Visualize using top K
- Plot only anomalies? How do we detect anomalies?
- How to reduce noise?
- How to make the dashboard more performant?
We use Prometheus in the backend
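For illustration, here is the kind of query we are considering for top K and for flagging outliers (the metric and label names below are just placeholders, not our exact setup):

```promql
# Show only the 5 executors with the highest heap usage,
# instead of plotting all 100-200 series
topk(5, jvm_memory_used_bytes{job="spark-executor", area="heap"})

# A simple anomaly heuristic: series whose current value deviates
# from their own 1h average by more than 3 standard deviations
(
  my_executor_metric
  - avg_over_time(my_executor_metric[1h])
) / stddev_over_time(my_executor_metric[1h]) > 3
```

One caveat with `topk` in a graph panel: it is evaluated independently at each step, so the set of series shown can change across the time range.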
welcome to the forum, @fhalde
one way to make dashboards more performant is to use the `-- Dashboard --` special data source: it lets you reuse the results from another panel in a new panel, which avoids duplicate queries when, say, you are visualizing the same data two different ways.
This one is amazing! Beautiful tip!! Thank you!
Apart from this, do you think the general recommendation would be to reduce max data points? We sometimes look at dashboards spanning 24h+ ranges. I'm not sure how best to set a value that adapts dynamically as the time range changes without hiding spikes in the graphs.
One way to always see spikes would be to use a max aggregation everywhere, so each downsampled point keeps the peak value inside its interval instead of averaging it away.
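A sketch of that, assuming a hypothetical metric name: Grafana's `$__interval` variable grows and shrinks with the selected time range and the panel's max data points, and `max_over_time` keeps the peak within each step, so short spikes survive even at 24h+ ranges.

```promql
# Downsample-safe: each rendered point is the maximum observed
# value within one Grafana step ($__interval), so brief spikes
# are preserved rather than averaged out
max_over_time(my_executor_metric[$__interval])
```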