Dealing with multiple instances and TTL

Hello,

I have .NET desktop applications (WPF) that send metrics to the Prometheus Pushgateway. Prometheus scrapes the Pushgateway, and the data is then displayed in Grafana.

The issue is that each application can have several instances running, and metrics should be sent separately for each one. The instance ID is the first 5 characters of a GUID, so the number of possible values is huge: pushing “job=appName, instance=instanceId” to Pushgateway creates a new group for every instance, which in practice means an unbounded number of groups.

Each instance sends its activity status as a Gauge (0 or 1). Is there a better way to group the metrics, or is there another solution with a built-in TTL (time to live) that would automatically clean up instances that are no longer sending their status (have a value of 0)?
Another situation is a power outage or blue screen that kills my application before it has time to set its activity status to 0. In such cases, Pushgateway keeps the value at 1 and that group never vanishes.

Is this the right way to do this, or is there a pattern that I should use instead?

Thanks in advance!

I would use OpenTelemetry on the app side and push metrics in OTLP format directly into Prometheus (newer Prometheus versions support OTLP). Time series that stop receiving samples will disappear from Prometheus after a short while. Use an instance label and then aggregate metrics across all instances in your queries (if that makes sense). You may also report uptime, so you can visualise the lifetime of those instances, e.g.:
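
A rough sketch of the uptime idea in Python, not the original poster's example. It assumes each instance periodically reports its uptime in seconds and that an instance counts as gone once no sample has arrived for 5 minutes (Prometheus' default lookback window; adjust to your setup). The `Sample` type, the instance IDs, and the function names are made up for illustration.

```python
from dataclasses import dataclass

# Assumption: 5 minutes, matching Prometheus' default lookback window.
STALE_AFTER_SECONDS = 300.0


@dataclass(frozen=True)
class Sample:
    instance: str          # hypothetical short instance ID (first 5 GUID chars)
    timestamp: float       # Unix time at which the sample was received
    uptime_seconds: float  # uptime the app reported at that moment


def lifetimes(samples):
    """Return {instance: (started_at, last_seen)} derived from uptime samples."""
    result = {}
    for s in samples:
        started_at = s.timestamp - s.uptime_seconds
        first, last = result.get(s.instance, (started_at, s.timestamp))
        result[s.instance] = (min(first, started_at), max(last, s.timestamp))
    return result


def alive_instances(samples, now, stale_after=STALE_AFTER_SECONDS):
    """Instances whose latest sample is recent enough to still count as running."""
    return sorted(
        inst
        for inst, (_, last_seen) in lifetimes(samples).items()
        if now - last_seen <= stale_after
    )


samples = [
    Sample("a1b2c", 1000.0, 10.0),
    Sample("a1b2c", 1060.0, 70.0),
    Sample("f9e8d", 1030.0, 5.0),  # crashed: never reported again
]

print(lifetimes(samples))
print(alive_instances(samples, now=1350.0))
```

The point is that a crashed instance needs no explicit "status = 0" push: it simply stops reporting, and its last sample ages out of the window on its own.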
