Drilling into AWS CloudWatch ALB response time data

Hi there,

My goal is to identify slow requests and reduce their response time. Right now I’m viewing the 5-minute sum of TargetResponseTime in CloudWatch, though it doesn’t quite look the same in Grafana. Why is that?

What is Grafana’s version of CloudWatch’s period?

Here’s a video that hopefully better demonstrates my confusion:

Any tips on how I can best drill into this data, please?

Many thanks!

You can’t view individual requests in CloudWatch, only aggregated metrics, so Grafana with the CloudWatch data source is not the right tool for your goal.

You should enable access logging and then analyze the access logs:

  • parse the access logs from S3 into Elasticsearch, and then you can query/visualize individual requests in Grafana - look for ready-made AWS Lambda code that parses access logs into ES
  • use the AWS native approach - Athena
  • use other log analytics tools, there are plenty of them

AWS Athena is a good and quick solution for a one-time job. If you also need other dashboards (histogram per response code, backend, user agent, …) and want to analyze them over the long term, then you need some storage (Elasticsearch, …) plus a visualization tool (we have to say Grafana here, of course :) ).


Thank you @jangaraj!

I did look into Athena, though a low-tech approach suited me better for finding the slow request paths in the logs: zcat *.log.gz | awk '{print $7, $13, $14, $15}' | sort
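For anyone who prefers a script over awk, here is a rough Python sketch of the same idea. It assumes the standard ALB access log field order, where the 7th space-separated field is target_processing_time and the quoted request field holds the method, URL and protocol. The SAMPLE lines are made up for illustration and truncated after the user agent, and the helper names (parse_line, slowest, read_gz) are hypothetical.

```python
import csv
import gzip
from urllib.parse import urlsplit

# Made-up sample lines in the ALB access log layout (truncated after the user agent).
SAMPLE = [
    'http 2018-05-18T10:00:00.000000Z app/my-lb/abc 192.0.2.10:40000 10.0.0.5:80 '
    '0.000 1.250 0.000 200 200 120 512 "GET http://example.com:80/slow?x=1 HTTP/1.1" "curl/7.46.0"',
    'http 2018-05-18T10:00:01.000000Z app/my-lb/abc 192.0.2.11:40001 10.0.0.5:80 '
    '0.000 0.020 0.000 200 200 120 512 "GET http://example.com:80/fast HTTP/1.1" "curl/7.46.0"',
    'http 2018-05-18T10:00:02.000000Z app/my-lb/abc 192.0.2.12:40002 10.0.0.6:80 '
    '0.000 0.400 0.000 200 200 300 128 "POST http://example.com:80/api/users HTTP/1.1" "curl/7.46.0"',
]


def parse_line(line):
    """Return (target_processing_time, method, path) for one log line."""
    # csv handles the double-quoted fields that contain spaces.
    fields = next(csv.reader([line], delimiter=" ", quotechar='"'))
    method, url, _protocol = fields[12].split(" ", 2)
    return float(fields[6]), method, urlsplit(url).path


def slowest(lines, top=10):
    """Return the `top` entries with the highest target processing time."""
    rows = [parse_line(line) for line in lines if line.strip()]
    return sorted(rows, key=lambda row: row[0], reverse=True)[:top]


def read_gz(paths):
    """Yield lines from gzip-compressed log files, like zcat."""
    for path in paths:
        with gzip.open(path, "rt") as handle:
            yield from handle


if __name__ == "__main__":
    for seconds, method, path in slowest(SAMPLE, top=2):
        print(f"{seconds:.3f} {method} {path}")
```

To run it against real logs, something like slowest(read_gz(glob.glob("*.log.gz"))) should work, with the caveat that the sort here is numeric, whereas a plain sort in the shell compares the lines as text.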

Nonetheless, I am still a little confused about how to specify the period in Grafana.

So I replace auto in Grafana’s Min period input with 60 for a 1-minute period?

Yes, just bear in mind the retention period:

Data points with a period of 60 seconds (1 minute) are available for 15 days

Sorry, I guess my real question is what is auto in Grafana parlance? One hour? One minute?

It considers the selected time span, the namespace and the AWS retention policy. It doesn’t make sense to use a 60-second period when you are displaying the last month. So the implementation is smart and tries to display the finest metric granularity available.
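To make the idea concrete, here is a small Python sketch of picking the finest period that CloudWatch still retains for a given time span. This is not Grafana's actual code, just an illustration based on the documented CloudWatch retention tiers (60-second data for 15 days, 300-second data for 63 days, 3600-second data for 455 days). The function name finest_period is hypothetical, and sub-minute high-resolution metrics are ignored.

```python
from datetime import timedelta

# CloudWatch retention tiers: (maximum age of a data point, period in seconds).
RETENTION = [
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),
]


def finest_period(time_span):
    """Return the smallest period still retained for the whole time span."""
    for max_age, period in RETENTION:
        if time_span <= max_age:
            return period
    raise ValueError("time span exceeds CloudWatch retention (455 days)")


if __name__ == "__main__":
    print(finest_period(timedelta(hours=6)))   # last 6 hours
    print(finest_period(timedelta(days=30)))   # last 30 days
```

A real implementation would probably also take the panel width into account, so that it does not request far more data points than can be drawn.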

Why are the plot lines disjoint? Please seek to 1:19 of https://s.natalian.org/2018-05-18/autofill.mp4

The earlier part of the video is off topic and relates to https://github.com/grafana/grafana/issues/11984

Thanks again!

Because you have sparse data values. If your request count metric values are:

00:00 1
01:00 100

Does that mean you had ~50 requests at 00:30?
No, the metric value at 00:30 is NA (null, nil, …). But almost all monitoring graphs display it that way (the CloudWatch console as well): they just connect the two values, which is OK for many use cases. You can enable/disable this behavior in Grafana (Display -> Null value: connected).
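The difference can be sketched in a few lines of Python, using the two data points from the example above with time expressed in minutes. The function names are hypothetical. The point is only that the "connected" line is an interpolation drawn by the graph, while the stored series has no value at 00:30.

```python
# The two stored data points: minute offset -> request count.
SERIES = {0: 1, 60: 100}


def stored_value(series, minute):
    """What the metric actually contains: None where no data point exists."""
    return series.get(minute)


def connected_value(t0, v0, t1, v1, t):
    """What a 'connected' line suggests: linear interpolation between two points."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


if __name__ == "__main__":
    print(stored_value(SERIES, 30))               # None, there is no data at 00:30
    print(connected_value(0, 1, 60, 100, 30))     # 50.5, only implied by the drawn line
```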

I prefer a bar graph for sparse values, for example for Lambda function stats:
