Find specific series that disappear, including metric name and labels

  • What Grafana version and what operating system are you using?

Grafana Cloud. Mostly irrelevant if we can solve this with PromQL or by adjusting our data model.

  • What are you trying to achieve?

Given series like this:

one_metric{svcslug="service_a",required_for_service_health="true",labels="I_want"} 0
two_metric{svcslug="service_a",required_for_service_health="true",other_labels="I_want"} 1
one_metric{svcslug="service_b",required_for_service_health="true",labels="I_want"} 0
three_metric{svcslug="service_b",required_for_service_health="true",other_labels="I_want"} 0

With values akin to Nagios (0=healthy, 1=warning, 2=critical)…

I can find any service with one or more unhealthy series for a dashboard or alert. Great! But… what about missing metrics?
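For reference, that part is something along these lines (a sketch, not the exact query - max just surfaces the worst Nagios-style value per service):

max by (svcslug) ({__name__=~".+",required_for_service_health="true"}) >= 1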

Detecting loss of signal on any series matching {__name__=~".+",required_for_service_health="true"} is what I am looking for. What if one_metric{svcslug="service_b",required_for_service_health="true",labels="I_want"} disappears? How do I identify it (the metric name and labels are all helpful; ideally I do not want to group by anything)?

I’ve seen a number of answers using count, offset, absent, etc., but none seem to work with this data model. I could certainly shift the data model, but I’m not sure that’s necessary.

  • How are you trying to achieve it?

PromQL, presumably. I’ve tried many of the variations on absent, count, offset, unless, etc., that you find when searching.
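For example, the pattern that turns up most often for a single, known metric is something like:

one_metric offset 1h unless one_metric

which returns any one_metric series that existed an hour ago but is gone now, with its full labels. The trouble with this data model is that the metric name is ignored in the default label matching, so generalising both sides to {__name__=~".+"} can collide whenever two different metrics carry identical label sets.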

I could in theory change the data model; this is mostly a POC. Perhaps adding a label that combines the svcslug with differentiators drawn from the labels I care about - I suspect I will have to go this route, but would really prefer to avoid it.

Apologies if this is something simple that I am just missing in my inexperience - thanks!


Ahh, gotcha - so in that case, I would need to know the labels I care about ahead of time, which unfortunately is not the case.

These sort of differ (there’s no common set of labels I can use; each check can have different labels), but I suppose we can add a primary-key-ish label composed of all the necessary values - I had just hoped to avoid that duplicate work/complexity.

The idea was basically to allow multiple check-based series to contribute to a service’s health, so we might have:

necessary_connection_status{host="REDACTED_server_running_check", required_for_service_health="true", svcslug="REDACTED_service_this_check_contributes_to", name="REDACTED_target_uri_or_hostname", connection="tcp-445"}

win_service_status{host="REDACTED_server_running_check", required_for_service_health="true", svcslug="REDACTED_service_this_check_contributes_to", name="REDACTED_some_service_name", status="Running", displayname="REDACTED_some_service_display_name"}

(Two among many other series that might contribute to one specific service. If all are 0, I know the service is healthy, at least according to these checks. If any are 1, we have the series name/labels to identify the likely issue. If the check itself stops reporting, though: no data.)

So something like metric_identifier="$svcslug::$metric_name::$other_values_from_labels_like_service_name", with different values per series (a service series might include service name and host; a connection-status series might include host, name, port/protocol, etc.), would be something we group on, if a bit… unwieldy.
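Concretely, the two series above might end up with identifiers along these lines (a sketch - the conn::/svc:: prefixes are just illustrative):

win_service_status → metric_identifier="svc::REDACTED_server_running_check::REDACTED_some_service_name"

necessary_connection_status → metric_identifier="conn::REDACTED_server_running_check::REDACTED_target_uri_or_hostname::tcp-445"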

Thanks!


I think the idea is correct. Please update us with your final result. IMHO Prometheus was not designed for this kind of task, so solutions tend to be tricky. You may hit some Prometheus limits.

EDIT: possibly spoke too soon - the commented-out series showed up in the output, but so did other series after a time. Reading up on the actual PromQL, I’m probably just holding it wrong at this point, given that my data generally has value 0. Will edit once sorted.


Thanks for the help (including your other answers that already had me pointed in this direction)! Not a huge fan of having to add this label, but it’s a static label per series, and we’re nowhere near the max-labels-per-series limit, so all good.

So to get this working:

(1) Come up with rough guidelines around a new label - let’s call it “metric_identifier”. E.g.:

Original:

necessary_connection_status{host="REDACTED_server_running_check", required_for_service_health="true", svcslug="REDACTED_service_this_check_contributes_to", name="REDACTED_target_uri_or_hostname", connection="tcp-445"}

So we add metric_identifier="conn::REDACTED_server_running_check::REDACTED_target_uri_or_hostname::tcp-445", which covers the labels needed to pinpoint that this is connectivity, from the server running the check to the target, on the specific protocol/port…
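With that in place, the full series becomes:

necessary_connection_status{host="REDACTED_server_running_check", required_for_service_health="true", svcslug="REDACTED_service_this_check_contributes_to", name="REDACTED_target_uri_or_hostname", connection="tcp-445", metric_identifier="conn::REDACTED_server_running_check::REDACTED_target_uri_or_hostname::tcp-445"}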

(2) Once that is in place, borrow the idea/snippet from that other answer:

group by (svcslug, metric_identifier) (
  {__name__=~".+",required_for_service_health="true"} offset 1h
  unless on(svcslug, metric_identifier)
  {__name__=~".+",required_for_service_health="true"}
) == 1

And after intentionally commenting out one of the series to test this, I get a hit!

{metric_identifier="conn::REDACTED_server_running_check::REDACTED_target_uri_or_hostname::tcp-445", svcslug="REDACTED_service_this_check_contributes_to"}

Thanks again - cheers!
