-
Grafana version: 9.5.6 and operating system: Alpine Linux v3.17 (docker container)
-
What are you trying to achieve?
Display AWS Opensearch 2xx response rate.
-
How are you trying to achieve it?
HTTP response codes for the OpenSearch cluster are available as counts (Monitoring OpenSearch cluster metrics with Amazon CloudWatch - Amazon OpenSearch Service), and we’d like to express them as rates on a dashboard.
-
What happened?
After setting it up in Grafana, the metric graphed on the dashboard matches the one in AWS CloudWatch when the time range is 2, 7, or 14 days, but it differs when the range is 24 hrs. At 24 hrs, the value returned by the PERIOD function is 5 times smaller, yet the graph is still correct on the CloudWatch dashboard. The time interval is 5 mins for 24 hrs and 2 days, and 15 mins for 7 and 14 days.
-
What did you expect to happen?
I expected it to be correct and to match what is displayed on the CloudWatch dashboard. If that is not possible, I’d like to learn why.
-
Can you copy/paste the configuration(s) that you are having problems with?
The expression and the graph in Cloudwatch:
The dashboard in Grafana:
The code for the Grafana dashboard:
.addTargets([
grafana.cloudwatch.target(
datasource=cwDatasource,
region=region,
namespace='AWS/ES',
metric='2xx',
dimensions={ DomainName: '$domain', ClientId: clientid },
statistic='Sum',
id='id2',
alias='2xx'
),
grafana.cloudwatch.target(
region=region,
namespace='AWS/ES',
metric='2xx',
expression='id2/(PERIOD(id2))',
alias='2xx/period(2xx)'
),
])
-
Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
No
-
Did you follow any online instructions? If so, what is the URL?
Thanks!!!
See timestamps: 0:15 vs 0:17. I guess the two are using different aggregation intervals. Blind guess: CW 5min, Grafana 2min.
You may try to play with Min interval on the Grafana panel level or Interval on the Grafana query level. But is it worth it?
Based on my observation, whenever the time range is less than 48 hrs, the PERIOD will return 60 seconds vs. 300 seconds, which explains the 5 times difference. The graph doesn’t reflect the PERIOD change: it still shows 5 minutes between any 2 data points. If my observation is right, should I create a new issue in their repo?
If it is true, then it is a CloudWatch issue, not a Grafana issue, because PERIOD is a CloudWatch metric math function, not a Grafana function. You can prove it by graphing the PERIOD result.
Use the network console to compare the queries issued in the AWS Console with those issued in Grafana. I would still bet on the aggregation period problem.
Is there a way to log the Period value Grafana sends to CloudWatch? Thx.
Don’t log. Graph it, e.g. TIME_SERIES(PERIOD(m1))
And play with the variables, e.g. dashboard time range and Period, to prove that the CloudWatch PERIOD function is wrong:
whenever the time range is less than 48 hrs, the PERIOD will return 60 seconds vs. 300 seconds
Thx! I’ve made a .mov file to show it; unfortunately, the forum doesn’t allow uploading the file type. Please see
and
They are in 24 hrs and in sequence, and they show that the time interval on the graph is 5 mins while the value returned by the TIME_SERIES(PERIOD(...)) function (with the alias PERIOD listed at the bottom) is 60 seconds. It also shows that the value of 5xx is 1.73. Then, if you look at
you will see that both the time interval on the graph and the PERIOD value are 5 mins/300 s, and 5xx is 0.347, which is 5x smaller.
I’m leaving the PERIOD alias there because it helps if people know what the actual period is.
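The 5x gap can be checked with simple arithmetic. This is only a sketch: the bucket sum of 104 is an assumption back-calculated from the 0.347 figure above (0.347 × 300 ≈ 104), not a value read from CloudWatch, and it assumes the data points really are 5-minute sums while PERIOD reports 60.

```python
# Assumed raw Sum for one 5-minute bucket (hypothetical value,
# back-calculated from the reported 0.347/s: 0.347 * 300 is about 104).
bucket_sum = 104

ACTUAL_BUCKET_SECONDS = 300    # spacing between data points on the graph
REPORTED_PERIOD_SECONDS = 60   # what TIME_SERIES(PERIOD(...)) showed at 24 hrs


def per_second_rate(count: float, period_seconds: int) -> float:
    """Convert a per-bucket count into a per-second rate."""
    return count / period_seconds


# Dividing by the real bucket width gives the expected rate.
correct_rate = per_second_rate(bucket_sum, ACTUAL_BUCKET_SECONDS)
# Dividing the same sum by the smaller reported period inflates the rate.
inflated_rate = per_second_rate(bucket_sum, REPORTED_PERIOD_SECONDS)

print(f"{correct_rate:.3f}")   # 0.347
print(f"{inflated_rate:.3f}")  # 1.733
```

Both numbers line up with the 0.347 and 1.73 seen in the two views, which is consistent with the same 5-minute sum being divided by 60 in one case and by 300 in the other.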
You may try to set Period (this is not CloudWatch PERIOD) on each Grafana CloudWatch query level. Unfortunately it is only a min period (the max is still “magic”).
As you see, it is complicated, because each frontend has its own “magic” to calculate aggregation periods under the hood. I would say it is not worth trying to replicate exactly the same view. You can also hardcode it, e.g. 1m, but note that 1m aggregation is available only for the last 15 days of CloudWatch metrics.
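For comparing outside of any frontend, one option is to issue the query yourself with the period pinned. The sketch below only builds a request in the shape the CloudWatch GetMetricData API expects; the function name, the domain and client ID values, and the labels are placeholders I made up, and the 300-second period is an assumption you would change to whatever you want to test.

```python
from datetime import datetime, timedelta, timezone


def build_rate_query(domain: str, client_id: str, period_seconds: int = 300) -> dict:
    """Build GetMetricData arguments with an explicitly pinned Period."""
    end = datetime.now(timezone.utc)
    return {
        "StartTime": end - timedelta(hours=24),
        "EndTime": end,
        "MetricDataQueries": [
            {
                # Raw 2xx count per bucket; hidden, only used by the expressions.
                "Id": "id2",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/ES",
                        "MetricName": "2xx",
                        "Dimensions": [
                            {"Name": "DomainName", "Value": domain},
                            {"Name": "ClientId", "Value": client_id},
                        ],
                    },
                    "Period": period_seconds,
                    "Stat": "Sum",
                },
                "ReturnData": False,
            },
            # Same rate expression as in the dashboard code.
            {"Id": "rate", "Expression": "id2/PERIOD(id2)",
             "Label": "2xx per second", "ReturnData": True},
            # Graph the period itself so it can be compared with the bucket spacing.
            {"Id": "period", "Expression": "TIME_SERIES(PERIOD(id2))",
             "Label": "PERIOD", "ReturnData": True},
        ],
    }


# Placeholder values; replace with your own domain name and client ID.
query = build_rate_query("my-domain", "123456789012")
# To send it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").get_metric_data(**query)
```

If the returned PERIOD series matches the pinned value here but not in Grafana, that would point at the period Grafana chooses rather than at the PERIOD function.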
I realise this is an old discussion, but I recently spent many hours trying to get useful visualisations out of CloudWatch Metrics myself.
For many metrics, CWM will return the count within the queried period. Since your period will depend on the selected time range, you suddenly get different values just by zooming in or out, which is confusing and useless.
To normalise the data, I used the “Builder” to select my metrics and then inspected the query. I then copied the query, replaced the period (e.g. “60”) with the $__period_auto period macro, and divided the metric by the macro too. It seems that leaving the period out of the query (but still dividing by it) gives me the same result and a slightly simpler query.
So this: REMOVE_EMPTY(SEARCH('{"AWS/Lambda","FunctionName"} MetricName="Invocations"', 'Sum', 60))
Becomes this: REMOVE_EMPTY(SEARCH('{"AWS/Lambda","FunctionName"} MetricName="Invocations"', 'Sum')) / $__period_auto
This gives me values of “count” per second, regardless of selected time range, and I can use the appropriate unit on the visualisation panel.
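Why dividing by the period helps can be seen with a small simulation. This is a sketch with made-up data (a steady 2 events per second for 10 minutes), not CloudWatch output, and the helper names are hypothetical: the per-bucket sums change with the period, while sum divided by period stays the same.

```python
from collections import Counter


def bucket_sums(event_times, period_seconds):
    """Count events per bucket, keyed by bucket start time in seconds."""
    counts = Counter(int(t // period_seconds) * period_seconds for t in event_times)
    return dict(sorted(counts.items()))


def per_second(sums, period_seconds):
    """Normalise per-bucket counts to events per second."""
    return {start: total / period_seconds for start, total in sums.items()}


# Hypothetical steady traffic: 2 events per second for 600 seconds.
events = [second + offset for second in range(600) for offset in (0.0, 0.5)]

for period in (60, 300):
    sums = bucket_sums(events, period)
    rates = per_second(sums, period)
    # Raw sums depend on the period (120 vs 600); the rates do not (2.0 both times).
    print(period, set(sums.values()), set(rates.values()))
```

The raw counts jump by 5x between the two periods, which is the zoom-dependent behaviour described above, while the normalised values stay at 2.0 per second.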