My goal is to identify slow requests and reduce their response times. Right now I’m viewing the 5-minute sum of TargetResponseTime in CloudWatch, though it doesn’t quite look the same in Grafana. Why is that?
You can’t view individual requests in CloudWatch, only aggregated metrics => Grafana on top of CloudWatch is not the right tool for your goal.
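For context, a hedged sketch (Python + boto3) of roughly what that 5-minute CloudWatch view corresponds to at the API level: each datapoint is an aggregate over a whole period, so individual requests are not recoverable from it. The LoadBalancer dimension value below is a placeholder.

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)

# One aggregated datapoint per 5-minute period -- never individual requests.
resp = cw.get_metric_statistics(
    Namespace="AWS/ApplicationELB",
    MetricName="TargetResponseTime",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/my-alb/1234567890abcdef"}],  # placeholder
    StartTime=end - timedelta(hours=3),
    EndTime=end,
    Period=300,               # the 5-minute aggregation from the question
    Statistics=["Sum"],
)
for p in sorted(resp["Datapoints"], key=lambda d: d["Timestamp"]):
    print(p["Timestamp"], p["Sum"])
```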
You should enable access logging and then analyze the access logs:

- Parse the access logs from S3 into Elasticsearch, and then you can query/visualize individual requests in Grafana. You can find ready-made AWS Lambda code that parses access logs into ES; a minimal sketch follows this list.
- Use another log analytics tool; there are plenty of them.
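Here is a minimal sketch of such a Lambda, assuming the standard ALB access-log field layout; it is not the ready-made code referenced above. The `ES_ENDPOINT` value and the `alb-logs` index name are assumptions, and a real deployment would also need auth/request signing, error handling, and batching.

```python
import gzip
import json
import shlex
import urllib.request

import boto3

ES_ENDPOINT = "https://my-es-domain.example.com"  # assumption: your ES/OpenSearch URL

s3 = boto3.client("s3")

def handler(event, context):
    """Triggered by an S3 put: gunzip an ALB access log, bulk-index into ES."""
    bulk_lines = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        for line in gzip.decompress(body).decode("utf-8").splitlines():
            fields = shlex.split(line)  # honors the quoted "request" field
            doc = {
                # field positions assume the documented ALB access-log format
                "timestamp": fields[1],
                "target_processing_time": float(fields[6]),  # -1 if no target answered
                "elb_status_code": fields[8],
                "request": fields[12],  # e.g. 'GET https://host:443/path HTTP/1.1'
            }
            bulk_lines.append(json.dumps({"index": {"_index": "alb-logs"}}))
            bulk_lines.append(json.dumps(doc))
    payload = ("\n".join(bulk_lines) + "\n").encode("utf-8")
    req = urllib.request.Request(
        ES_ENDPOINT + "/_bulk",
        data=payload,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    urllib.request.urlopen(req)
```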
AWS Athena is a good and quick solution for a one-time job. If you also need other dashboards (histograms per response code, backend, user agent, …) and you want to analyze them over the long term, then you need some storage (Elasticsearch, …) plus a visualization tool (we have to say Grafana here, of course).
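For the Athena route, a hedged sketch of a one-time "find the slow paths" query, assuming an `alb_logs` table has already been created over the S3 logs (AWS documents a DDL for this); the database name and results bucket are placeholders:

```python
import boto3

athena = boto3.client("athena")

QUERY = """
SELECT request_url,
       approx_percentile(target_processing_time, 0.99) AS p99,
       count(*) AS hits
FROM alb_logs
WHERE target_processing_time >= 0   -- -1 means the target never answered
GROUP BY request_url
ORDER BY p99 DESC
LIMIT 20
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "default"},          # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder bucket
)
print(response["QueryExecutionId"])  # poll get_query_execution() for completion
```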
I did look into Athena, though I think a low-tech version, `zcat *.log.gz | awk '{print $7, $13, $14, $15}' | sort -n`, suited me better for finding the slow request paths in the logs.
Nonetheless, I’m still a little confused about how to specify the period in Grafana.
=> It considers the selected timespan, the namespace, and the AWS retention policy. It doesn’t make sense to use a 60-second period when you are displaying the last month. So the implementation is smart: it tries to display the finest metric granularity available.
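As an illustration (a sketch under stated assumptions, not Grafana’s actual source), auto-picking a period could work like this: choose the smallest standard CloudWatch period whose data has not yet expired under the documented retention tiers, and which keeps the datapoint count under the GetMetricStatistics cap of 1,440 per request.

```python
from datetime import datetime, timedelta, timezone

# (period_seconds, retention) pairs per CloudWatch's documented retention tiers
PERIODS = [
    (60, timedelta(days=15)),
    (300, timedelta(days=63)),
    (3600, timedelta(days=455)),
]

def choose_period(start: datetime, end: datetime) -> int:
    now = datetime.now(timezone.utc)
    span = (end - start).total_seconds()
    for period, retention in PERIODS:
        if now - start > retention:
            continue  # data at this granularity has already expired
        if span / period <= 1440:  # GetMetricStatistics datapoint cap
            return period
    return 86400  # fall back to daily datapoints for very wide ranges

end = datetime.now(timezone.utc)
print(choose_period(end - timedelta(days=30), end))  # -> 3600 for "last month"
```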
Because you have sparse data values. If your request-count metric values are:
00:00 1
01:00 100
Does that mean you had ~50 requests at 00:30?
No, the metric value at 00:30 is NA (null, nil, …). But almost all monitoring graphs render it that way (the CloudWatch console as well): they just connect the two values, and that is fine for many use cases. You can enable/disable this behavior in Grafana (Display → Null value: connected).
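A tiny Python illustration of the point: connecting the two samples with a line implies a value at 00:30 that the datastore never recorded.

```python
points = {"00:00": 1, "01:00": 100}

# what a connected line chart implies halfway between the two samples
implied_0030 = (points["00:00"] + points["01:00"]) / 2
print(implied_0030)                # 50.5 -- looks like ~50 requests

# what the datastore actually has at 00:30
actual_0030 = points.get("00:30")  # None (NA/null/nil)
print(actual_0030)
```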
I prefer a bar graph for sparse values, for example for Lambda function stats.