I’m about to start a project that uses Prometheus, the OpenTelemetry Collector and Grafana to monitor an app in production. I’ve already used Grafana for other small projects, but this one comes with different requirements.
First, I’ve found it very difficult to version Grafana dashboards. I have to make sure that every developer who modifies a dashboard exports it and then pushes it to a GitLab repo. I couldn’t find any post here or any tutorial explaining the best practice for collaborating on dashboards while storing them outside Grafana. Some tools follow a “dashboard as code” approach, but I don’t understand how they can help me build a dashboard and version it at the same time.
Second, I would like to set up some kind of automated tests. Many times, I have had to add metrics to my app code, run the code to generate the corresponding Prometheus metric, and finally check in the dashboard that everything behaves correctly for every use case. That isn’t sustainable if the dashboard is going to live on and keep being modified. Let’s take an example. I have a metric that measures the number of HTTP requests made to the URL “/foo”: one counter for 200 responses and another counter for every other HTTP status code. My dashboard has a panel that displays the percentage of failed requests. How can I test that, if I generate a bad response for “/foo”, my dashboard will indeed display it the way I want? I’ve looked around, but I haven’t found any tool provided by Grafana or the community for this. I think this kind of test is important to ensure that you can trust the dashboard you have crafted.
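To make the example concrete, here is a minimal Python sketch of the expected value such a test would compare against. It assumes two hypothetical counters for “/foo”, `ok_total` (200 responses) and `failed_total` (any other status), and computes the failure percentage from their increase between two snapshots, i.e. failed / (ok + failed) * 100. It only checks the arithmetic the panel is supposed to perform, not what Grafana actually renders, and it ignores counter resets.

```python
from dataclasses import dataclass


@dataclass
class CounterSnapshot:
    """Values of the two hypothetical "/foo" counters at one instant."""
    ok_total: float      # hypothetical counter for 200 responses
    failed_total: float  # hypothetical counter for every other status code


def failure_percentage(before: CounterSnapshot, after: CounterSnapshot) -> float:
    """Percentage of failed requests between two snapshots.

    Applies failed / (ok + failed) * 100 to the increase of each counter
    over the window. Counter resets are not handled in this sketch.
    """
    ok = after.ok_total - before.ok_total
    failed = after.failed_total - before.failed_total
    total = ok + failed
    if total == 0:
        # No traffic in the window: this sketch reports 0%,
        # the real panel may show "No data" instead.
        return 0.0
    return 100.0 * failed / total


if __name__ == "__main__":
    before = CounterSnapshot(ok_total=0, failed_total=0)
    # Simulate 9 good responses and 1 bad response to "/foo".
    after = CounterSnapshot(ok_total=9, failed_total=1)
    print(failure_percentage(before, after))  # 10.0
```

What I’m missing is the other half: a way to assert that the dashboard panel shows this same value once the bad response has actually been generated.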
Any feedback from any kind of user will be appreciated, as these questions are key foundations for a new project.