Percent Uptime Graphs

pdn · April 1, 2023, 1:26am

Please refer to the attached picture.

So, in that picture, there are 3 labels (grayed out, but they are c, n and s). At each time point on the x axis (corresponding to Prometheus scrape intervals), each label has a metric value of 1 (up, green) or 0 (down, red). In this example, there are no down values.

What I would like: a graph/panel with 2 metrics for each label (6 in total for the 3 labels) that updates continuously based on the values in the graph I shared. The metrics are: #1) percent of time up (green) over a month, #2) percent of time up (green) over a year.

So it’s like a table with 3 rows (one per label: c, n, s) and 3 columns (1st column for the label, 2nd column for metric #1, 3rd column for metric #2).

How can I do that?

Appreciate any help/tip you can offer!
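Because every sample is either 0 or 1, the percent up over any window is just the mean of the samples in that window times 100. A minimal Python sketch of the requested table, assuming hypothetical in-memory sample lists and treating "a month" and "a year" as the last N samples (the window sizes in the usage lines are toy values, not real scrape counts):

```python
# Hypothetical 0/1 scrape results per label, oldest first (made-up data).
SAMPLES = {
    "c": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "n": [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],
    "s": [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
}


def percent_up(values):
    """Percent of samples equal to 1 (up): mean of the 0/1 values times 100."""
    return 100 * sum(values) / len(values)


def uptime_table(data, month_len, year_len):
    """Rows of (label, % up over last month_len samples, % up over last year_len samples)."""
    if month_len < 1 or year_len < 1:
        raise ValueError("window lengths must be at least 1 sample")
    return [
        (label, percent_up(values[-month_len:]), percent_up(values[-year_len:]))
        for label, values in sorted(data.items())
    ]


if __name__ == "__main__":
    for label, month, year in uptime_table(SAMPLES, month_len=5, year_len=10):
        print(f"{label}: {month:.1f}% (month), {year:.1f}% (year)")
```

In Prometheus the same number would come from avg_over_time over a [30d] or [365d] range, times 100. Note that Prometheus keeps 15 days of data by default, so a one-year window only works if retention has been configured long enough.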


grant2 · April 1, 2023, 11:32am

Which visualization type are you using? Status history?

pdn · April 1, 2023, 11:49am

Yes, status history. For the new percentage graphs, the panel type doesn’t matter, as long as we can see the % uptime.

grant2 · April 1, 2023, 12:18pm

@pdn
This is just a quick mockup, but the only way I see this working is to have 6 stat panels and a status history graph with 3 rows. I’m not sure how to do the % uptime calculation with Prometheus, but I would guess someone has done it already.

pdn · April 1, 2023, 12:35pm

Thank you @grant2, that’s exactly what I want to have.

If anyone has already done this, please share how you would do it with Prometheus data. I already have data coming in from Prometheus, either 1 (up) or 0 (down), at the predefined scrape interval.

grant2 · April 1, 2023, 12:51pm

Random Google searches reveal a lot of hits:

https://www.reddit.com/r/PrometheusMonitoring/comments/ior5kd/how_to_get_the_uptime_of_a_service_and_to_present/

and ChatGPT:
To calculate the percentage of uptime using a Prometheus query, you can use the up metric, which is a built-in metric in Prometheus that represents the health of a target (instance). The up metric has a value of 1 when the target is up and healthy, and 0 when the target is down. You can use the avg_over_time() function to get the average uptime over a certain time range.

Here’s a sample query to calculate the percentage of uptime for all instances over the last 1 hour:
avg(avg_over_time(up[1h])) * 100

This query will give you the average uptime percentage for all instances being monitored by Prometheus during the last hour. If you want to calculate the percentage of uptime for a specific instance or a group of instances, you can use the instance label in the query:
avg(avg_over_time(up{instance="your_instance_name"}[1h])) * 100

Replace your_instance_name with the actual value of the instance label (by default the scraped host:port) for the instance you want to calculate the uptime percentage for.

Similarly, if you want to calculate the percentage of uptime for a specific job or service, you can use the job label in the query:
avg(avg_over_time(up{job="your_job_name"}[1h])) * 100

Replace your_job_name with the actual name of the job or service you want to calculate the uptime percentage for.
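To make the structure of these queries concrete: avg_over_time() averages each series' samples inside the range, the label selector picks which series take part, and the outer avg() then averages those per-series results. A small Python mimic of that, assuming hypothetical series data and supporting only equality (=) label matchers:

```python
# Hypothetical `up` series: label set -> raw 0/1 samples inside the range window.
SERIES = {
    (("job", "api"), ("instance", "10.0.0.1:9100")): [1, 1, 1, 1],
    (("job", "api"), ("instance", "10.0.0.2:9100")): [1, 0, 1, 1],
    (("job", "db"), ("instance", "10.0.0.3:9100")): [0, 0, 1, 1],
}


def avg_over_time(samples):
    """Mean of one series' samples in the window (what avg_over_time computes per series)."""
    return sum(samples) / len(samples)


def matches(labels, selector):
    """Equality-only label matching, like up{job="api"}."""
    label_map = dict(labels)
    return all(label_map.get(name) == value for name, value in selector.items())


def uptime_percent(data, selector):
    """Mimic avg(avg_over_time(up{selector}[window])) * 100."""
    per_series = [avg_over_time(s) for labels, s in data.items() if matches(labels, selector)]
    if not per_series:
        return None  # no matching series: PromQL would return an empty result
    return sum(per_series) / len(per_series) * 100


if __name__ == "__main__":
    print(uptime_percent(SERIES, {}))               # all instances
    print(uptime_percent(SERIES, {"job": "api"}))   # one job
```

Because the outer avg() weights every series equally and collapses them into a single number, a per-target table would instead group the result, e.g. avg by (instance) (...), or by whichever label distinguishes your rows.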

pdn · April 1, 2023, 1:14pm

Wow, ChatGPT too?

I did Google it before, but I thought it might be quicker to post in the Grafana community, kind of cheating.
Thank you again @grant2! I will look more into it. But if anyone has already done this, I would love to see it!

pdn · April 1, 2023, 2:55pm

I struggled to create the top 2x3 grid of % values.

I don’t know how to do it; I’ve already tried many different things and panel types. It would be great to see an example with a detailed how-to. Many thanks in advance!

grant2 · April 1, 2023, 11:20pm

Those are 6 separate Stat panels.

pdn · April 1, 2023, 11:56pm

I think I got it, by creating each stat panel manually.

You had to enter the labels (“% uptime for C this month”, etc.) manually for each panel, right?

I was hoping the panels would pick up the labels automatically, but that doesn’t seem to be the case.

grant2 · April 2, 2023, 2:44am

You can create a dashboard variable (with values such as “C”, “N”, etc.) and reference it in the panel title with a $ prefix so the title reflects the selected value.

pdn · April 2, 2023, 3:23am

Thank you so much for your help and tips @grant2! I’ve learned a ton today. Have a great weekend!

pdn · April 2, 2023, 2:50pm

@grant2, I have one more question. I want to exclude NaN (null) data from the avg() calculation. ChatGPT suggested the query below, but running it gives this error. I googled it, but found no good hint. Any idea?

bad_data: 1:19: parse error: binary expression must contain only scalar and instant vector types

From ChatGPT:

avg(avg_over_time(api_response_time{service="my-service"}[5m] unless api_response_time{service="my-service"} == NaN))*100

pdn · April 15, 2023, 2:28pm

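It turns out that the avg_over_time() function already ignores missing data points: it averages only the samples that exist in the range, so there is no need for ‘unless’ at all. (The parse error came from applying ‘unless’, which only accepts instant vectors, to the range vector selected by [5m].) The query simply becomes:
avg(avg_over_time(api_response_time{service="my-service"}[5m])) * 100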

Just to provide an update.
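The distinction that matters here, sketched in Python under the assumption that a Grafana null corresponds to a scrape with no stored sample: a gap is simply absent from the range, so it never enters the average, while a stored NaN value is a real sample and, in plain float arithmetic as in this sketch, turns the whole average into NaN. The timeline below is hypothetical:

```python
import math

# Hypothetical per-scrape view of one series; None marks a gap with no sample
# (what Grafana shows as null).
TIMELINE = [1, 1, None, 1, 0, None, 1]


def avg_over_time(points):
    """Average only the samples that exist; gaps are not part of the range at all."""
    present = [p for p in points if p is not None]
    if not present:
        return None  # no samples in the window: no result
    return sum(present) / len(present)


if __name__ == "__main__":
    print(avg_over_time(TIMELINE))          # gaps ignored: 4 of 5 samples are 1
    print(avg_over_time([1, math.nan, 1]))  # a stored NaN sample propagates
```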

pdn · April 15, 2023, 2:39pm

@grant2, can you help me with the enhancement below? What panel type should I use, and what would the PromQL look like?

I’ve got my % uptime graphs up, working and looking great. I defined an ‘interval’ variable that the user can select to dynamically change the window for the % uptime values.

As an enhancement, I now want a small panel on top that shows the count of ‘down’ values over a compute interval selected by the user (e.g. 10d, 30d).

Thank you!
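One possible approach, assuming the series only ever holds 0 or 1 and that interval is the user-defined dashboard variable mentioned above: the number of down samples equals the number of samples minus their sum. In PromQL that could be written as count_over_time(up[$interval]) - sum_over_time(up[$interval]) and shown in a Stat panel, with up standing in for your own metric (an untested sketch). The same arithmetic in Python, on hypothetical data:

```python
# Hypothetical raw 0/1 samples of one series, oldest first, one per scrape.
SAMPLES = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]


def count_down(values, window):
    """Count down (0) samples among the last `window` samples."""
    if window < 1:
        return 0
    recent = values[-window:]
    # With only 0/1 values, zeros = number of samples - number of ones.
    return len(recent) - sum(recent)


if __name__ == "__main__":
    print(count_down(SAMPLES, 10))  # whole list
    print(count_down(SAMPLES, 4))   # last 4 samples only
```

This counts samples rather than time, so the result depends on the scrape interval; multiplying the count by the scrape interval gives a rough downtime duration instead.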

jsabat1 · January 22, 2024, 8:55am

Is this available for Windows Exporter?

Can I get all the previous uptimes every time we restart the server?

grant2 · January 24, 2024, 10:01am

Welcome @jsabat1

If you are using this exporter, then yes, I would say it can collect (in a table or otherwise) all the timestamps when the server was restarted. I do not use this exporter, but I am sure you can ask further questions on the author’s GitHub page.
