Count running instances over time

I have an AWS ECS service with blue-green deployment. I have a Prometheus data source with the “up” metric, which carries “instance” (the IP address of the ECS task) and “app” labels.

When we deploy a bug, the new task fails to stay up: it starts, runs for a few seconds (long enough to report up=1 a few times), dies, and then another one starts. I would like to use Grafana to alert on this situation.

Now I’m trying to create a query that counts the distinct instances seen in the last few minutes. The target situation is: “There are 4 instances running now, but in the last 15 minutes there were 20 distinct instances running. Raise the alarm!”

I can easily count the currently running instances with the query count (up{app="my-app-name"})
but I have no idea how to count the distinct instances seen in the last 15 minutes.

Is there any way to do that?
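
One possible direction, as an untested sketch rather than a confirmed solution: count_over_time over a 15-minute range returns one series for every “up” series that had at least one sample in that window, so wrapping it in count should give the number of distinct series seen. This assumes each task produces exactly one “up” series per “instance” value; the threshold of 10 in the last expression is a made-up number for illustration.

```promql
# Distinct "up" series (one per task IP, by assumption) seen in the last 15 minutes
count(count_over_time(up{app="my-app-name"}[15m]))

# Same count, grouped by "instance" first in case other labels split the series
count(count by (instance) (count_over_time(up{app="my-app-name"}[15m])))

# Hypothetical alert condition: many more instances seen recently than are running now
count(count_over_time(up{app="my-app-name"}[15m])) - count(up{app="my-app-name"}) > 10
```

With the numbers from the example above (20 distinct instances in the window, 4 running now), the last expression would evaluate to 16 and exceed the illustrative threshold.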
