I’m building a fairly complex data visualization of the performance of our game servers and, during development, our clients as well.
For context, our servers and clients have code that keeps a ring buffer of profiler data while running. If performance gets “bad”, we dump that buffer to disk. The resulting file can be opened in a profiler to see frame timings and a flame graph. I’d like to upload these files, or at least some of the data in them, to a data source that we can then visualize in Grafana.
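To make the dump-on-bad-performance idea concrete, here is a minimal Python sketch. It is not our actual engine code: the `FrameSample` and `ProfilerRingBuffer` names, the single-frame threshold used to define “bad”, and the JSON dump format are all illustrative assumptions (the real dump is in whatever format the profiler reads).

```python
import json
from collections import deque
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class FrameSample:
    """One hypothetical profiler sample: frame index, thread and frame duration."""
    frame: int
    thread: str
    duration_ms: float


class ProfilerRingBuffer:
    """Keeps only the most recent `capacity` samples; the oldest are dropped first."""

    def __init__(self, capacity: int, bad_frame_ms: float):
        # deque(maxlen=...) silently discards the oldest entry once it is full
        self.samples = deque(maxlen=capacity)
        # Assumed definition of "bad": a single frame slower than this threshold
        self.bad_frame_ms = bad_frame_ms

    def record(self, sample: FrameSample) -> bool:
        """Append a sample and report whether it crossed the "bad" threshold."""
        self.samples.append(sample)
        return sample.duration_ms > self.bad_frame_ms

    def snapshot(self) -> List[dict]:
        """Copy the buffered history, oldest first."""
        return [asdict(s) for s in self.samples]

    def dump(self, path: str) -> int:
        """Write the snapshot to disk as JSON and return how many samples were written."""
        data = self.snapshot()
        with open(path, "w") as f:
            json.dump(data, f)
        return len(data)
```

For example, with a capacity of 3 and a 33.3 ms threshold, recording frames of 16.0, 17.1, 15.9 and 48.2 ms flags only the last one, and a dump at that point contains frames 1 to 3.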
Pyroscope looks awesome, but it might be more detailed than I need. I don’t necessarily need full profiler traces in Grafana (though that would be cool). What would be really valuable is simply seeing “bad performance” events, tagged with information about their source, so I can correlate them with Loki logs and other Prometheus metrics. Ideally, each event would also link to a download of the source trace file.
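As an illustration of the kind of event I mean, here is a hypothetical Python record. None of these field names come from Grafana, Loki or Pyroscope; the labels (host, build, role) are placeholders for whatever identifies the source, and `trace_file` stands in for however the dumped file ends up being referenced.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SlowFrameEvent:
    """Hypothetical summary of one "bad performance" dump."""
    timestamp_s: float     # when the dump was triggered (Unix seconds)
    worst_frame_ms: float  # slowest frame in the dumped window
    trace_file: str        # placeholder reference to the uploaded trace file
    # Source-identifying tags, e.g. host/build/role; names are illustrative
    labels: Dict[str, str] = field(default_factory=dict)

    def to_json_line(self) -> str:
        """Serialize the event as a single JSON line with a stable key order."""
        return json.dumps(
            {
                "ts": self.timestamp_s,
                "worst_frame_ms": self.worst_frame_ms,
                "trace_file": self.trace_file,
                "labels": self.labels,
            },
            sort_keys=True,
        )
```

The point of the `labels` map is that the same label values (say, `host` and `build`) could also be attached to the Prometheus metrics and Loki log streams, which is what would make lining them up in Grafana straightforward.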
Right now I submit frame-time data to Prometheus and visualize it as a line graph. I aggregate the min/max/avg frame times of various threads into one-second slices and submit that data to Prometheus roughly every 60 seconds. This works, but I’d like more detail, in particular the ability to download the trace file linked to a “high” point on the graph.
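The per-second aggregation step amounts to something like the following Python sketch, assuming each frame arrives as a `(timestamp_s, thread, duration_ms)` tuple. The function and field names are illustrative, and the actual submission to Prometheus is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class SliceStats:
    """Min/max/avg frame time for one thread within one one-second slice."""
    min_ms: float
    max_ms: float
    avg_ms: float
    count: int


def aggregate_per_second(
    frames: Iterable[Tuple[float, str, float]],
) -> Dict[Tuple[int, str], SliceStats]:
    """Group (timestamp_s, thread, duration_ms) samples into 1-second slices per thread."""
    buckets: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for ts, thread, duration_ms in frames:
        # Truncate the (positive) Unix timestamp to the whole second it falls in
        buckets[(int(ts), thread)].append(duration_ms)

    stats = {}
    for key, durations in buckets.items():
        stats[key] = SliceStats(
            min_ms=min(durations),
            max_ms=max(durations),
            avg_ms=sum(durations) / len(durations),
            count=len(durations),
        )
    return stats
```

Each `(second, thread)` key yields one min/max/avg point, and a 60-second batch of these is what gets submitted. Averaging over a whole second hides individual spikes, which is why the max is kept alongside the average.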
I’ve considered uploading the files to S3 and linking to them from the graph. The challenge is that only users authenticated in Grafana should be able to download the files, so they can’t simply be public on S3.
Does anyone have recommendations for useful visualizations here, or for how to organize frame-time performance data when monitoring our applications?