How to count the number of events in the selected time range

Hello everyone,

Hadoop/HBase emit the number of log entries per log level. I pull them via the Prometheus JMX Exporter and rewrite the metrics so that the log level becomes a label:

hadoop_log_total{level="Error",service="NodeManager",} 0.0
hadoop_log_total{level="Warn",service="NodeManager",} 2.0
hadoop_log_total{level="Info",service="NodeManager",} 72.0

Now I want to display in Grafana the number of info, warning and error entries per Hadoop node/process that occurred in the selected period.

It seems to me that I have to count the events in Grafana rather than in Prometheus, since I’m not interested in a rate but in the total difference between the last and the first counter value in the selected time frame. For this I’m using the following PromQL expression in the query field of the Grafana panel:

sum(hadoop_log_total) by (service, level)

I’m then using the transformation “Reduce fields with calculation range” plus the transformation “Labels to fields with Value field label ‘level’” and display this within a Table.

This generally works, but I ran into two problems:

  1. Does the “reduce range” transformation deal with counter resets?
  2. When the selected time range covers the startup phase of a process, the first N log level samples are null before the counter delivers the count of the first log entries (see screenshot). In this case Grafana’s range transformation ignores the null values, so a wrong range value is computed. Is there a way to tell Grafana to treat nulls as 0?

Best regards and thank you in advance
Udo

I will reply to my own post - maybe this will help someone.

Regarding

  • Does the “reduce range” Transformation deal with Counter resets?
    The answer is (of course) NO - it won’t.

I managed to change the queries to use the PromQL increase() function, but I’m not 100% happy with this approach because increase() extrapolates values, so one can get non-integer values.
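For anyone trying the same, a minimal sketch of such a query might look like the one below. It assumes the query is run as an instant query, that Grafana's built-in $__range variable is used so the window matches the selected dashboard time range, and that rounding to the nearest integer is acceptable. The exact query used in the panel is not shown here, so treat this as one possible form, not the definitive one.

```promql
# Increase of each counter over the selected dashboard range.
# increase() compensates for counter resets, but it extrapolates,
# so round() is used here to get whole numbers (may be off by one).
sum by (service, level) (
  round(increase(hadoop_log_total[$__range]))
)
```

With an instant query this returns one value per service/level pair, so the “Reduce” transformation should no longer be needed. Note that increase() only sees the growth between samples inside the window, so entries logged before the first scrape of a freshly started process may still be missed.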

Nevertheless, I’m still interested in the second question: is it possible to turn nulls into 0s?

+1 (me too!)

I want to do exactly the same thing you’re doing.

count(redfish_temperature_reading_celsius{health="Critical"})

When this returns no data, I want to display that there are 0 critical errors in the panel (graph on the dashboard).
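One common PromQL idiom for this case is to fall back to a constant with `or vector(0)`. The sketch below assumes the count is not grouped by any label; because count() without a grouping clause returns a single series with no labels, it lines up with the label-less vector(0).

```promql
# Returns the count when at least one series matches,
# otherwise falls back to a single sample with value 0.
count(redfish_temperature_reading_celsius{health="Critical"}) or vector(0)
```

This does not carry over directly to queries grouped by labels (such as service and level above), since vector(0) has no labels and would show up as an extra, unlabeled series instead of filling the gaps.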