I have an AWS ECS service with blue-green deployment. I have a Prometheus data source with an “up” metric that carries “instance” (the IP address of the ECS task) and “app” labels.
When we deploy a bug, the new task doesn’t stay up: it starts, runs for a few seconds (long enough to report up=1 a few times), dies, and another one starts in its place. I would like to use Grafana to be alerted about this situation.
Now, I’m trying to create a query that counts the distinct instances seen in the last few minutes. The target situation is: “There are 4 instances running now, but in the last 15 minutes there were 20 distinct instances running. Raise the alarm!”
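To make the target condition concrete, here is a rough Python sketch of the logic I have in mind. It is not a PromQL answer, only an illustration: the sample tuples, the function names, the 60-second "still running" cutoff and the 2x threshold are all made up for this example.

```python
def instance_churn(samples, now, window_s=900, staleness_s=60):
    """Count instances running now vs. distinct instances seen in the window.

    samples: iterable of (unix_ts, instance, up_value) tuples, a hypothetical
    stand-in for scraped "up" samples of one app.
    """
    # Instances that reported up=1 recently enough to be considered alive.
    running_now = {inst for ts, inst, up in samples
                   if up == 1 and 0 <= now - ts <= staleness_s}
    # Every instance that reported up=1 at least once in the last window_s seconds.
    seen_in_window = {inst for ts, inst, up in samples
                      if up == 1 and 0 <= now - ts <= window_s}
    return len(running_now), len(seen_in_window)


def should_alert(running, distinct, max_ratio=2.0):
    # Alert when far more instances were seen than are currently running.
    return distinct > max_ratio * running


# Made-up data: 4 healthy tasks plus 16 tasks that died within the last 15 minutes.
now = 1_000_000
stable = [(now - 10, f"10.0.0.{i}", 1) for i in range(4)]
crashed = [(now - 120 - 30 * i, f"10.0.1.{i}", 1) for i in range(16)]

running, distinct = instance_churn(stable + crashed, now)
print(running, distinct, should_alert(running, distinct))
```

With this made-up data the sketch reports 4 running and 20 distinct instances, which is exactly the case I want to be alerted about.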
I can easily count the running instances with the query count(up{app="my-app-name"})
but I have no idea how to count the distinct instances seen in the last 15 minutes.
Is there any way to do that?