How do I get all metrics (an accurate probe count) from Prometheus?

  • What Grafana version and what operating system are you using?
    9.4.3 on Ubuntu, using the Docker container

  • What are you trying to achieve?
    Use blackbox-exporter's probe_success metric to produce accurate downtime figures.

  • How are you trying to achieve it?
    I have a table panel that uses this query:
      probe_success{subdomain=~"$subdomain",job=~"$blackbox_jobs"}
    followed by a series of transformations that multiply the count of failed probes by the probe interval to produce downtime figures.
    I also feed the same query into a state timeline panel.
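    The downtime arithmetic those transformations perform can be sketched as below. This is a minimal illustration, assuming each probe_success sample is 1 (success) or 0 (failure) and the probe interval is 15 seconds; the function name is hypothetical, and in Grafana the same logic lives in transformations rather than Python.

```python
def downtime_seconds(probe_success_values: list[float], probe_interval_s: float) -> float:
    """Hypothetical helper: each 0 sample is one failed probe, assumed to
    account for one full probe interval of downtime."""
    failed_probes = sum(1 for v in probe_success_values if v == 0)
    return failed_probes * probe_interval_s


# 20 consecutive failed probes at a 15 s interval -> 300 s of downtime
print(downtime_seconds([1] * 10 + [0] * 20 + [1] * 10, 15))
```

    This only holds if every raw sample reaches the calculation; if some samples are dropped, the failed-probe count, and therefore the downtime, shrinks.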

  • What happened?
    When I widen the time range, the downtime figures become skewed. In the state timeline they increase; I assume this is because we aren't getting all the data back, only a summary that shows "down" for a whole interval when the target was down for only part of it.
    In the table they decrease; I assume this is because not every probe is returned, so the maths of 5 failures × 15 seconds no longer works when those 5 failures actually represent 20 failed probes.
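    The undercounting described above can be reproduced with a simplified model of a range query. This sketch assumes probes every 15 s, a query step of 60 s (Grafana widens the step as the time range grows), and that at each step timestamp only the most recent raw sample at or before it is returned; real Prometheus lookback behaviour has more detail than this. The helper names are hypothetical.

```python
from bisect import bisect_right


def raw_samples(total_s, interval_s, outage_start_s, outage_end_s):
    """One probe_success sample per probe interval: 0 during the outage, else 1."""
    return [(t, 0 if outage_start_s <= t < outage_end_s else 1)
            for t in range(0, total_s, interval_s)]


def sample_at_steps(samples, step_s, total_s):
    """Simplified range-query model: at each step timestamp keep only the most
    recent raw sample at or before it; samples between steps are never returned."""
    times = [t for t, _ in samples]
    picked = []
    for ts in range(0, total_s, step_s):
        i = bisect_right(times, ts) - 1
        if i >= 0:
            picked.append(samples[i][1])
    return picked


interval = 15
raw = raw_samples(3600, interval, 600, 900)  # one hour, 5-minute outage
stepped = sample_at_steps(raw, 60, 3600)     # what a 60 s step returns

true_downtime = sum(1 for _, v in raw if v == 0) * interval  # 20 failures
reported_downtime = stepped.count(0) * interval              # only 5 failures seen
print(true_downtime, reported_downtime)
```

    Multiplying the returned failure count by the 15 s probe interval gives 75 s instead of 300 s, because each returned point now stands in for four probes.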

  • What did you expect to happen?
    I'd like to either get all the data from Prometheus, so the output stays accurate however wide the time window becomes, or find another way to get accurate data into the table and the state timeline. If that isn't possible for the state timeline, I still badly need it for the table.
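    One possible approach for the table, offered as an untested sketch rather than a confirmed fix, is to let Prometheus count the failures over every raw sample in the window instead of counting returned points. Because probe_success is 0 or 1, count_over_time minus sum_over_time gives the number of failed probes regardless of the query step. In Grafana this would typically be an Instant query with the window set to the $__range variable; the Python below only builds that PromQL string and a Prometheus HTTP API instant-query URL. The helper names and the http://localhost:9090 address are assumptions.

```python
from urllib.parse import urlencode


def failed_probe_query(selector: str, window: str) -> str:
    """Hypothetical helper: count_over_time counts every raw sample in the window,
    sum_over_time of a 0/1 series counts the successes, so the difference is the
    number of failed probes."""
    return (f"count_over_time(probe_success{selector}[{window}])"
            f" - sum_over_time(probe_success{selector}[{window}])")


def instant_query_url(base_url: str, query: str) -> str:
    """Build a GET URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urlencode({"query": query})


query = failed_probe_query('{subdomain=~"$subdomain",job=~"$blackbox_jobs"}', "$__range")
print(query)
print(instant_query_url("http://localhost:9090", failed_probe_query('{job="blackbox"}', "1h")))
```

    The failed-probe count returned by such a query can then be multiplied by the probe interval as before. This does not change what the state timeline draws, since that panel still plots one point per step.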

Thanks to anyone who can help!