I have set up the TIG stack (Telegraf, InfluxDB, Grafana) and configured Telegraf to collect metrics from multiple systems. Grafana is set up with a community dashboard to present the data. However, I have run into what seems to be a brick wall: template variables are not supported by alerting.
The Telegraf system metrics dashboard, like some others built for use with Telegraf, relies on a number of template variables to allow selection by host tag. This makes it impossible to use the queries tied to the panels in alerting. To get around this, I first tested adding a second, hidden query to one of the panels that uses wildcards instead of the variable. This leads to another problem: once one or more hosts have triggered the alert, no new alerts are sent for other hosts that enter the alert state while the first is still in its alert state. With 30+ servers this doesn’t seem practical.
With what I know so far, the only idea I can think of is making a copy of each dashboard for each host with hard-coded values, and creating the alerts on those copies. Is there a better way to do this?
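To make the per-host copy idea concrete, here is a minimal Python sketch of how the copies could be generated from an exported dashboard instead of by hand. It rests on several assumptions: that the variable is called `host` (the community dashboard may use a different name), that it appears in the JSON as `$host`, `${host}` or `[[host]]`, and that removing the `id` and `uid` fields is enough for each copy to import as a separate dashboard. The function names and the sample dashboard are made up for illustration, and alert rules would still need to be added to each copy.

```python
import re


def hardcode_variable(node, var_name, value):
    """Return a copy of a JSON-like structure with the template variable
    ($name, ${name} or [[name]]) replaced by a fixed value in every string."""
    name = re.escape(var_name)
    pattern = re.compile(
        r"\$\{" + name + r"\}|\[\[" + name + r"\]\]|\$" + name + r"\b"
    )
    if isinstance(node, str):
        # A function replacement keeps backslashes in `value` literal.
        return pattern.sub(lambda _match: value, node)
    if isinstance(node, list):
        return [hardcode_variable(item, var_name, value) for item in node]
    if isinstance(node, dict):
        return {
            key: hardcode_variable(item, var_name, value)
            for key, item in node.items()
        }
    return node  # numbers, booleans and None are left unchanged


def per_host_dashboards(template, var_name, hosts):
    """Build one hard-coded dashboard dict per host from a templated one."""
    result = {}
    for host in hosts:
        dashboard = hardcode_variable(template, var_name, host)
        dashboard["title"] = f"{template.get('title', 'dashboard')} - {host}"
        # Assumption: dropping id/uid avoids clashing with the original.
        dashboard.pop("id", None)
        dashboard.pop("uid", None)
        result[host] = dashboard
    return result


# Hypothetical, heavily trimmed stand-in for an exported dashboard.
template = {
    "id": 7,
    "uid": "abc123",
    "title": "Telegraf system",
    "panels": [
        {
            "title": "CPU idle",
            "targets": [
                {
                    "query": 'SELECT mean("usage_idle") FROM "cpu" '
                    'WHERE "host" =~ /^$host$/ AND $timeFilter '
                    "GROUP BY time($interval)"
                }
            ],
        }
    ],
}

dashboards = per_host_dashboards(template, "host", ["web01", "db01"])
```

Each entry in `dashboards` could then be written out with `json.dump` and imported. Other variables such as `$timeFilter` and `$interval` are left untouched, and host names containing regex metacharacters would need escaping before being placed inside a `/^...$/` match.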