Counting Container Restarts and Alerting on Slack

Hello,

I’ve set up Grafana and Prometheus with Docker on an Ubuntu 22 server; my application runs with docker-compose on a separate server.

On the application server, I’m running three containers with the Docker restart policy “unless-stopped”, plus a container named “cadvisor-agent.” A few days ago one of my containers entered an “exited” state but, as expected under that restart policy, it did not stay stopped.

I’m looking for a PromQL query that counts container restarts. Specifically, I want to trigger an alert in my Slack channel when a container restarts more than three times within one minute.
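A minimal sketch of such an alert expression, assuming cAdvisor’s container_start_time_seconds gauge takes a new value each time Docker restarts a container, so that changes() over a window approximates the restart count. A 1-minute window only holds a few scrapes (four at a 15s scrape interval), so changes() can report at most about three changes there; the 5-minute window below is an assumption chosen so that “> 3” is reachable, and restarts that happen entirely between two scrapes are not counted. The Slack notification itself would be configured in Alertmanager or Grafana alerting, not in the query.

```promql
# Sketch: containers whose start time changed more than 3 times in 5 minutes.
# Assumes container_start_time_seconds is reset on every Docker restart;
# the job label comes from the queries in this post, the 5m window is an assumption.
changes(container_start_time_seconds{job="app1 cadvisor", name=~".+"}[5m]) > 3
```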

Please note that I’m not very familiar with PromQL, but here are some queries I’ve tried:

absent(container_last_seen{job="app1 cadvisor", instance="1.0.0.10:8088", name="app1-containerA-1", container_label_com_docker_compose_project="app1",container_label_com_docker_compose_project_config_files="/home/app1/docker-compose.yml",container_label_com_docker_compose_project_working_dir="/home/app1",container_label_com_docker_compose_service="containerA"}) >= 1
 
count by (instance, name) (count_over_time(container_last_seen{container_label_restartcount!="",job="app1 cadvisor", instance="1.0.0.10:8088", name="app1-containerA-1", container_label_com_docker_compose_project="app1",container_label_com_docker_compose_project_config_files="/home/app1/docker-compose.yml",container_label_com_docker_compose_project_working_dir="/home/app1",container_label_com_docker_compose_service="containerA"}[1m])) - 1 >= 1

absent(nonexistent{job="app1"})  # => {job="app1"}

count by (instance, name) (count_over_time(container_last_seen{name!="", container_label_restartcount!=""}[15m])) - 1 >= 5
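For comparison with the attempts above: count_over_time() returns the number of samples inside the range, which is driven mainly by the scrape interval, whereas changes() returns how many times a series’ value changed. A sketch contrasting the two for containerA, assuming container_start_time_seconds takes a new value on each Docker restart (the 15m window mirrors the last query above):

```promql
# Number of container_last_seen samples in the last 15 minutes:
# roughly 60 at a 15s scrape interval, driven by scraping rather than restarts.
count_over_time(container_last_seen{job="app1 cadvisor", name="app1-containerA-1"}[15m])

# Number of times the container's start timestamp changed in the same window;
# assuming it is reset on every Docker restart, this approximates the restart count.
changes(container_start_time_seconds{job="app1 cadvisor", name="app1-containerA-1"}[15m])
```

These are two separate queries; run each one on its own in the Prometheus expression browser or a Grafana panel.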