Grafana + Prometheus + SNMP_export + large amount of data

Hello everyone!

I represent an ISP from Riga, Latvia.
I have recently installed Prometheus with Grafana and SNMP export. I am very surprised at how fast it queries and shows me data. But now I am wondering if this fits our network.

We have thousands of switches that I want to query for interface statistics, CPU, memory and maybe temperature. I think it will come to up to 100k time series.

I imagine it working this way:

  1. I get info about a newly installed switch, for example by querying an external DB. At this point I know its IP and model.
  2. Then I put the IP address into the appropriate device_list.yml file that is bound to a job in prometheus.yml. I can do it manually now and it works.
  3. Grafana/Prometheus sees a new device and automatically links it to a graph template.

Does this sound realistic? Any suggestions on how to accomplish this?

What if I generate hundreds or even thousands of dashboards in Grafana? E.g. a dashboard per switch with interface statistics graphs, up to 30 graphs per dashboard.

P.S.
I am currently struggling with drawing a delta on my graph. Any suggestions on how to compute the delta between two ifInOctets samples and create a Mbps graph in Grafana?

Thank you very much in advance!

Hi,

Lots of interesting questions. I’ll try to answer them.

I don’t think 100k time series is much. Prometheus performance is a matter of memory (RAM) for ingestion and queries, disk size, and your retention policy, i.e. how long data is stored before it is automatically removed. To verify your setup, hardware requirements and configuration you’ll need to test, test and test :)

1) and 2) should be fine. Prometheus supports file-based service discovery, which basically watches a file for changes.
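A minimal sketch of how steps 1) and 2) could be automated, assuming the switch list has already been fetched from the external DB. The switch records, the helper names and the file name device_list.json are hypothetical. The output follows the file-based service discovery format (a list of target groups with "targets" and "labels"), and the relabeling needed to route these targets through the SNMP exporter is not shown.

```python
import json
import os
import tempfile


def build_file_sd(switches):
    """Group switches by model into file_sd target groups."""
    groups = {}
    for sw in switches:
        groups.setdefault(sw["model"], []).append(sw["ip"])
    return [
        {"targets": sorted(ips), "labels": {"model": model}}
        for model, ips in sorted(groups.items())
    ]


def write_atomically(path, data):
    """Write to a temp file, then rename, so a half-written file is never read."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, path)


# Hypothetical records, standing in for rows from the external DB.
switches = [
    {"ip": "10.0.0.2", "model": "model-b"},
    {"ip": "10.0.0.1", "model": "model-a"},
    {"ip": "10.0.0.3", "model": "model-a"},
]
target_groups = build_file_sd(switches)
print(json.dumps(target_groups, indent=2))
# To publish: write_atomically("device_list.json", target_groups)
```

Attaching the model as a label here is what later makes it usable for filtering in Grafana.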

Regarding 3), this is not supported out of the box. It sounds like you want a dashboard per switch, in which case you need to find a way to automatically create a dashboard in Grafana whenever a new switch is added. However, are you aware of templating in Grafana? With it you can create one dashboard that automatically supports new switches as they are added. If your devices have more labels, you can create a template variable for each of them, such as instance and model. That gives you a dashboard where you look at one switch at a time. You can then build a few more dashboards that show, for example, aggregated data for multiple or all switches. It’s all up to your imagination.
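To make the templating idea concrete, here is a rough sketch that builds a single templated dashboard as a Python dict. The field names follow the Grafana dashboard JSON model as best understood here and should be checked against an export from your own Grafana version. The dashboard title, the panel type and the metric name ifInOctets are assumptions. label_values(metric, label) is the Prometheus data source's variable query.

```python
import json


def templated_dashboard(metric="ifInOctets"):
    """Build one dashboard with an 'instance' variable instead of one dashboard per switch."""
    return {
        "title": "Switch overview",  # hypothetical title
        "templating": {
            "list": [
                {
                    "name": "instance",
                    "type": "query",
                    # Populates the dropdown from the label values seen on the metric.
                    "query": f"label_values({metric}, instance)",
                    # Assumed meaning: 1 = re-run the query on dashboard load,
                    # so newly added switches show up.
                    "refresh": 1,
                }
            ]
        },
        "panels": [
            {
                "title": "Inbound traffic, $instance",
                "type": "graph",  # assumption: panel type names vary by version
                "targets": [
                    {"expr": f'rate({metric}{{instance="$instance"}}[5m]) * 8'}
                ],
            }
        ],
    }


dashboard = templated_dashboard()
print(json.dumps(dashboard, indent=2))
```

A second variable for model could be added the same way, provided the targets carry a model label.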

In terms of Grafana, the number of dashboards shouldn’t be a problem. You may want to use postgres as the database for Grafana instead of the default embedded sqlite3, but sqlite3. With 30 graphs per dashboard hitting Prometheus you may run into some performance issues. As I wrote earlier, you need RAM (how much is something you need to test). There are also other ways of scaling Prometheus for ingestion and queries if you look around.

You can collapse rows of graphs in Grafana dashboards. Collapsed rows don’t query Prometheus until you expand them, which can be a fair performance optimization in some cases.
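On the P.S. about the delta: a minimal sketch of the arithmetic behind a Mbps graph, assuming two ifInOctets counter samples taken a known number of seconds apart. The function name and the sample values are hypothetical. In PromQL the same idea is usually written with rate(), for example rate(ifInOctets[5m]) * 8 / 1e6, assuming the metric is exposed under that name.

```python
def mbps(prev_octets, curr_octets, interval_s, counter_bits=32):
    """Convert two octet counter samples into an average rate in Mbit/s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        # Assume the counter wrapped exactly once between the two samples.
        delta += 2 ** counter_bits
    # octets -> bits, per second, then scale to megabits
    return delta * 8 / interval_s / 1_000_000


# 7,500,000 octets in 60 s is 60,000,000 bits in 60 s, i.e. 1 Mbit/s.
print(mbps(1_000_000, 8_500_000, 60))
```

The wrap handling only covers a single wrap of a 32-bit counter between samples, so on fast links a shorter scrape interval or the 64-bit counters are the safer choice.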

Good luck

Marcus