Disappointing First Experience with Grafana - can't do simple math?

I apologize for the clickbait-y title, but it does describe my problem.

I’m new to Grafana (but not to BI/DataViz tools) and I have to confess that my first experience with this tool has been disappointing.

My first task in a new job is to create a dashboard to show failures in a data ingestion pipeline. I have an AWS CloudWatch metric that indicates total number of messages and another metric that shows successfully ingested metrics. This should be easy-- I want to display

(total - success) / total * 100

This gives me the percentage of messages that failed ingestion.

Either I haven’t learned Grafana well enough, or Grafana makes it really hard to do this. I can use an Expression to do the calculation in one pass, but Expressions are still a beta feature… and Grafana doesn’t give me the ability to name this expression in the Panel, which is a major issue.

Alternatively, I can use a series of transforms, in which case I can rename the displayed values in the panel to display the value I want. This is a chore since I need to chain three expressions: one to get the delta between total and success, one to divide that by total, and a third to multiply by 100. Not an ideal solution, but it works.

Except… both approaches are giving me the wrong answer!

Check out the screenshot:

Here, you can see values for ‘Total’ and ‘Success’ in the top row. In the second row, the Transformation calculation shows the delta between the two is 34, which represents the number of failures.

On my calculator, 34 divided by 13,328,944 is 2.55E-6. According to Grafana, this expression evaluates to 0.000131. This incorrect result happens both in the Expression and the Transformation.

I clearly can’t publish a dashboard that doesn’t do math correctly. Am I doing something wrong here, or is Grafana broken?

Here’s the panel JSON:

{
  "id": 2,
  "gridPos": {
    "h": 9,
    "w": 12,
    "x": 0,
    "y": 0
  },
  "type": "stat",
  "title": "Panel Title",
  "transformations": [
    {
      "id": "calculateField",
      "options": {
        "mode": "binary",
        "reduce": {
          "reducer": "sum"
        },
        "alias": "FailedCount",
        "binary": {
          "left": "Total",
          "operator": "-",
          "reducer": "sum",
          "right": "Success"
        }
      }
    },
    {
      "id": "calculateField",
      "options": {
        "mode": "binary",
        "reduce": {
          "reducer": "sum"
        },
        "alias": "FailedRatio",
        "binary": {
          "left": "FailedCount",
          "operator": "/",
          "reducer": "sum",
          "right": "Total"
        }
      }
    },
    {
      "id": "calculateField",
      "options": {
        "alias": "FailedPercent",
        "binary": {
          "left": "FailedRatio",
          "operator": "*",
          "reducer": "sum",
          "right": "100"
        },
        "mode": "binary",
        "reduce": {
          "reducer": "sum"
        }
      }
    }
  ],
  "pluginVersion": "8.3.2",
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          {
            "color": "green",
            "value": null
          },
          {
            "color": "red",
            "value": 80
          }
        ]
      },
      "mappings": [],
      "color": {
        "mode": "thresholds"
      }
    },
    "overrides": []
  },
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": [
        "sum"
      ],
      "fields": ""
    },
    "orientation": "auto",
    "textMode": "auto",
    "colorMode": "value",
    "graphMode": "area",
    "justifyMode": "auto"
  },
  "targets": [
    {
      "queryMode": "Metrics",
      "namespace": "AWS/IoT",
      "metricName": "TopicMatch",
      "expression": "",
      "dimensions": {
        "RuleName": "<redacted>"
      },
      "region": "default",
      "id": "",
      "alias": "Total",
      "statistic": "SampleCount",
      "period": "",
      "metricQueryType": 0,
      "metricEditorMode": 0,
      "sqlExpression": "",
      "datasource": {
        "type": "cloudwatch",
        "uid": "<redacted>"
      },
      "matchExact": false,
      "refId": "A"
    },
    {
      "queryMode": "Metrics",
      "namespace": "AWS/IoT",
      "metricName": "Success",
      "expression": "",
      "dimensions": {
        "RuleName": "<redacted>"
      },
      "region": "default",
      "id": "",
      "alias": "Success",
      "statistic": "SampleCount",
      "period": "",
      "metricQueryType": 0,
      "metricEditorMode": 0,
      "sqlExpression": "",
      "datasource": {
        "type": "cloudwatch",
        "uid": "<redacted>"
      },
      "hide": false,
      "matchExact": false,
      "refId": "B"
    },
    {
      "refId": "C",
      "type": "math",
      "datasource": {
        "type": "__expr__",
        "uid": "__expr__"
      },
      "hide": false,
      "expression": "($A - $B) / $A * 100"
    }
  ],
  "datasource": null
}

Any guidance would be appreciated

EDIT: This is on Grafana 8.3.2.

I would be very disappointed with implementation, which calculates data in my browser. Quite good for small datasets, but then any manager will select last 5 year data with 1 sec aggregation and he will be complaining that his browser/machine is unresponsive.

So it is better to calculate it in the datasource directly. Exactly for this use case Cloudwatch has metric math, which is supported also by Grafana CloudWatch datasource. Create queries for total, sucess with correct IDs, which will be used in aditional query to calculated desired (total - success) / total * 100` result. Final timeseries will be calculated by CloudWatch and not by Grafana (expressions or transformations). Grafana just visualizes the final result. Easy-peasy :+1:

I would say you have different mindset from BI world, but Grafana is not BI tool first. It is timeseries visualization tool, which is running in your browser (backend is not doing any heavy computation usually).

1 Like

Thank you for your reply!

  1. I was not aware that Grafana does calculations client-side. I agree what I’m doing is probably not best practice. Even so, I would expect calculations to be slow, perhaps, but not mathematically incorrect. That’s just inexcusable even if what I’m doing is not best practice.

  2. Thank you for the link to AWS CloudWatch Metric Math. I’ll have to dig into that. I wish there were more examples of the syntax available.

Example dashboard with metric math grafana-aws-cloudwatch-dashboards/aws-cloudwatch-usage-metrics at master · monitoringartist/grafana-aws-cloudwatch-dashboards · GitHub Use it as an example to see how is correct syntax and then it will be piece of cake.

I’m afraid I still don’t see the syntax examples I need. The forumula I want is:

(Total - Success) / Total * 100

I can get Total easily enough:

select sum(TopicMatch) from “AWS/IoT” where RuleName = ‘myRule’

However, I haven’t yet figured out how to build a query that performs mathematical operations on 2 or more CloudWatch Metrics. All of the following fail:

  • select sum(TopicMatch) - sum(Success) from “AWS/IoT” WHERE RuleName = ‘myRule’
  • select sum(TopicMatch - Success) from “AWS/IoT” WHERE RuleName = ‘myRule’
  • select sum(TopicMatch) from “AWS/IoT” WHERE RuleName = ‘myRule’ - sum(Success) from “AWS/IoT” WHERE RuleName = 'myRule

Actually, maybe I’m still doing things wrong. The query I wrote above that works still gives me a time series of data and I want a scalar value.

Sigh…

A panel with one query works:
select sum(TopicMatch) from “AWS/IoT” where RuleName = ‘myRule’

If I change the query, this works:
select sum(Success) from “AWS/IoT” where RuleName = ‘myRule’

When I put both queries on the same panel, I get no data back.The tooltip in the magenta triangle says, "metric request error: "ValidationError: Maximum number of queries (1) exceeded\n\tstatus code: 400, request id: "

I’m finding it very frustrating to work with Grafana. Things that, to me, should be simple and should work are not simple and don’t work.

Is there any way to take two numbers from CloudWatch Metrics and calculate the percentage of one from another in Grafana?

SLO calculation for ALB:

I believe that’s exactly what I did in my first post in this thread. Look in my .json at “refId”: “C”.

It’s giving me mathematically incorrect results.

No, your original C query is ($A - $B) / $A * 100 - that’s incorrect syntax. $A means in this case Grafana dashboard variable with name A, which is very likely not defined, so it isn’t interpolated. CloudWatch math will get ($A - $B) / $A * 100 for evaluation and $A means nothing for CloudWatch, because there there is no query with ID $A (only A, but $A != A). Please pay attention to details. I gave you dashboard example, I gave you printscreen and unfortunately you are still not able to see those little details :unamused:.

Jangaraj, thank you for your feedback.

The reason I was referencing my prior queries as $A and $B is because that’s what was recommended to me in this thread.

I will try these solutions and report back.

Dying to hear how this turned out.

Any Grafana rookie will have a problem with CloudWatch metric math. It is an advance topic.

Donnie, it turned out with me looking for a solution other than Grafana.

It’s also an incredibly poorly documented topic. It’s not hard to imagine why a ‘rookie’ would have problems with a topic that is ‘advanced’ and has virtually no documentation, is it?

Hello guys,
I also want to create Dashboards inside panel and do do the similar tasks where,
I get the data from Azure monitor as data source and I need help to create the graph on basis of
Failed requests/Total requests * 100 And then, I need to send alert on basis of this when the Failed requests raised to 1% of total requests

I tried using Transformation however, the task is also to send an alert on basis of the given query. It says that by using Transformation could not send the alerts

Can anybody help here?
Thanks

I am running into the same issue that @nevotx experienced. I am trying to do a basic calculation to find good counts percentage (e.g. good/(good+bad)), but I’m getting incorrect numbers. Created two queries to get two values, good and bad. Then performed math on the third “query”. When displayed, the calculation is incorrect even though good and bad are both showing correct values. I then moved to use transform instead, but same result. At this point, I am in somewhat aligned with the complaint from @nevotx.

I have this exact same issue trying to calculate SLO from an elastic search response. Total Failed / Total Requests in the transform tab gives a wrong result