Alerting based on docker container health

Hi, I want to be able to receive alerts when containers enter an unhealthy state.

I can pull this information using Telegraf and InfluxDB; however, from what I understand, the alert rules will need a bit more finessing. Since it’s a table of data, I’ve found that you’re supposed to use expressions to reduce the data down to a single value, and then alert based on that. But I can’t get the expressions to pick up anything:

For example, I have this query that will return the status of a container (I guess in the amount of time it’s been healthy for?), but when trying to run an expression against it I get no data. I’ve tried the same when creating a visualisation and had the same result. What am I missing here?

That doesn’t look like a correct guess. That looks like a timestamp, so there is no value for health.

It is always a good idea to check the documentation of the tool you are using. My guess:



docker_container_health (container must use the HEALTHCHECK)

    tags:
        engine_host
        server_version
        container_image
        container_name
        container_status
        container_version
    fields:
        health_status (string)
        failing_streak (integer)

So your container may not have a HEALTHCHECK defined. If it has one, then health_status is a string, and Grafana can’t graph or evaluate a string, so the problems you mention are to be expected for a string field.

Write a query that returns a number, not a string. A blind guess, just an example to give you the idea, and very likely not valid syntax: SELECT COUNT(health_status) ... WHERE health_status == 'healthy' ... Note that "No data" is a problem in this case.
Be creative and test your query.
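
For illustration, the idea above might look roughly like this in valid InfluxQL. This is only a sketch: it assumes the measurement and field names from the Telegraf excerpt above and Grafana's `$timeFilter` and `$__interval` macros, and it has not been run against real data. Note that InfluxQL compares with a single `=`, not `==`.

```sql
-- Count the points per interval where the container reported 'healthy'.
-- The result is a number, so Grafana can graph it and evaluate it in an alert rule.
SELECT count("health_status")
FROM "docker_container_health"
WHERE "health_status" = 'healthy' AND $timeFilter
GROUP BY time($__interval), "container_name"
```

If no points match the WHERE clause, the query returns no series at all, which is the "No data" case mentioned above.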

Thank you for the guidance! With this in mind, I’ve come to this query:

SELECT count("health_status") FROM "docker_container_health" WHERE ("health_status"::field != 'healthy' AND "container_name"::tag =~ /^gitea-runner*/) AND $timeFilter GROUP BY time($__interval), "host"::tag, "container_name"::tag

I opted to look for non-healthy containers instead of healthy ones, because when a container stopped being healthy it would eventually fall out of the ‘healthy’ query. This seems to be satisfactory for my needs.
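
A possible alternative, not tested in this thread: the measurement listed earlier also has an integer field, failing_streak, which is already numeric and so would not need count() on a string. The sketch below assumes that field is populated as the Telegraf documentation describes and reuses the same container filter, written as a plain prefix match.

```sql
-- Highest number of consecutive failed health checks per interval.
-- A threshold such as "above 0" on the reduced value is an assumption, not a recommendation.
SELECT max("failing_streak")
FROM "docker_container_health"
WHERE "container_name"::tag =~ /^gitea-runner/ AND $timeFilter
GROUP BY time($__interval), "host"::tag, "container_name"::tag
```

One caveat: Docker only marks a container unhealthy after the configured number of consecutive failures, so failing_streak can be above zero while health_status still reads healthy.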
