Hello everyone,
Hadoop/HBase exposes the number of log entries per log level. I scrape these metrics with the Prometheus JMX Exporter and rewrite them so that the log level becomes a label:
hadoop_log_total{level="Error",service="NodeManager",} 0.0
hadoop_log_total{level="Warn",service="NodeManager",} 2.0
hadoop_log_total{level="Info",service="NodeManager",} 72.0
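To make the label structure explicit, here is a minimal Python sketch that parses the three sample lines above into a mapping from (service, level) to the counter value. It is a simplified parser written only for illustration. It assumes there are no escaped quotes in label values and no timestamps, and it is not the official Prometheus client parser.

```python
import re

# Simplified pattern for one labelled sample: name{labels} value
# (assumption: no escaped quotes inside label values, no timestamps)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Return (metric_name, labels_dict, value) for one exposition line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    name, label_text, value = match.groups()
    # The trailing comma before '}' is harmless: findall simply skips it
    labels = dict(LABEL_RE.findall(label_text))
    return name, labels, float(value)


SCRAPE = """\
hadoop_log_total{level="Error",service="NodeManager",} 0.0
hadoop_log_total{level="Warn",service="NodeManager",} 2.0
hadoop_log_total{level="Info",service="NodeManager",} 72.0
"""

counts_by_level = {}
for line in SCRAPE.splitlines():
    name, labels, value = parse_sample(line)
    counts_by_level[(labels["service"], labels["level"])] = value

print(counts_by_level)
```

Because the level is now a label instead of part of the metric name, a single metric (`hadoop_log_total`) can be grouped or split by `level` in queries.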
Now I want to display in Grafana, per Hadoop node/process, the number of info messages, warnings, and errors that occurred in the selected time range.
It seems to me that I have to count the events in Grafana rather than in Prometheus, since I’m not interested in a rate but in the total difference between the last and the first counter value in the selected time range. For this I’m using the following PromQL expression in the query field of the Grafana panel:
sum(hadoop_log_total) by (service, level)
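The difference between "last minus first" and a reset-aware count matters as soon as a process restarts inside the selected range. The Python sketch below contrasts the two on a hypothetical counter series. The reset rule (any drop in value is treated as a restart from 0) is the same rule Prometheus applies in `increase()` and `rate()`. However, this sketch omits Prometheus's extrapolation to the range boundaries, so it is only an approximation of those functions.

```python
def last_minus_first(values):
    """Difference between the last and first counter value (no reset handling)."""
    return values[-1] - values[0]


def reset_aware_increase(values):
    """Sum of step-wise increases; any drop is treated as a counter reset.

    After a reset the counter restarts near 0, so the new value itself is
    counted as increase. No extrapolation to the range edges is done.
    """
    total = 0.0
    for previous, current in zip(values, values[1:]):
        if current >= previous:
            total += current - previous
        else:  # counter reset (e.g. the process restarted)
            total += current
    return total


# Hypothetical Info counter samples; the process restarts between 80.0 and 5.0
info_samples = [60.0, 72.0, 80.0, 5.0, 12.0]
print(last_minus_first(info_samples))      # -48.0
print(reset_aware_increase(info_samples))  # 32.0
```

Without a restart, both functions return the same value. With a restart, "last minus first" can even become negative.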
I then apply the “Reduce” transformation (mode “Reduce fields”, calculation “Range”), followed by the “Labels to fields” transformation with the value field label set to ‘level’, and display the result in a Table panel.
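As a rough model of what this pipeline does to the query result, the following Python sketch reduces each (service, level) series and then pivots `level` into columns, giving one row per service. It assumes that the Range calculation is "max minus min over the non-null values". To my understanding this matches how Grafana defines Range, but it is a simplification, not Grafana's actual code. The series data is hypothetical.

```python
from collections import defaultdict


def range_of(values):
    """Max minus min over the non-null samples of one series (nulls skipped)."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return max(present) - min(present)


def build_table(series):
    """Reduce each (service, level) series, then pivot 'level' into columns.

    series: {(service, level): [sample, ...]}
    returns: {service: {level: reduced_value}}
    """
    table = defaultdict(dict)
    for (service, level), values in series.items():
        table[service][level] = range_of(values)
    return dict(table)


# Hypothetical samples of sum(hadoop_log_total) by (service, level)
series = {
    ("NodeManager", "Info"): [60.0, 65.0, 72.0],
    ("NodeManager", "Warn"): [1.0, 2.0, 2.0],
    ("NodeManager", "Error"): [0.0, 0.0, 0.0],
}
print(build_table(series))
# {'NodeManager': {'Info': 12.0, 'Warn': 1.0, 'Error': 0.0}}
```

For a counter that never resets, max minus min equals last minus first, which is why Range gives the intended count in the normal case.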
This generally works, but I ran into two problems:
- Does the “Reduce” transformation with the “Range” calculation handle counter resets?
- When the selected time range covers the startup phase of a process, the first N samples of each log-level series are null until the counter reports its first log entries (see screenshot). In this case, Grafana’s Range calculation ignores the null values, so the computed range is wrong. Is there a way to tell Grafana to treat nulls as 0?
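The effect described in the second point can be reproduced with a small Python sketch on a hypothetical startup series. Skipping the nulls makes the first non-null sample the minimum, so the log entries counted before that sample are lost. Replacing the nulls with 0 before reducing restores them. This only models the arithmetic and does not show any particular Grafana setting.

```python
def fill_nulls(values, fill=0.0):
    """Replace missing (None) samples with a fixed value."""
    return [fill if v is None else v for v in values]


def range_of(values):
    """Max minus min over non-null samples, skipping nulls."""
    present = [v for v in values if v is not None]
    return max(present) - min(present) if present else None


# Hypothetical Info series during process startup: nulls until the first
# scrape reports a count, and the first reported value is already 3.
startup = [None, None, None, 3.0, 7.0, 10.0]

print(range_of(startup))              # 7.0  (nulls skipped, first 3 entries lost)
print(range_of(fill_nulls(startup)))  # 10.0 (nulls treated as 0)
```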
Best regards and thank you in advance
Udo