Solution advice for visualizing performance data

I’m building a fairly complex data visualization of the performance of our game servers and, in development, our clients as well.

For context, we have code in our servers and client which keeps a ring-buffer of profiler data while running. If the performance gets “bad” we dump that buffer to disk. This file is something you can open in a profiler and see frame timings and a flame graph. I’d like to upload these files, or at least some of the data in these files, to a data source we can then visualize in Grafana.

Pyroscope looks awesome, but might be a bit more detailed than I’m really looking for. I don’t necessarily need full profiler traces in Grafana (though that would be cool). But just being able to see “bad performance” events with tags that identify some information about the source so I can correlate with Loki logs and other Prometheus metrics would be really awesome. Ideally we’d be able to link to a download of the source trace file from these events as well.

What I’m doing right now is submitting frame-time data to Prometheus and visualizing that in a line graph. I aggregate min/max/avg frame times of various threads into one-second slices, and submit that data to Prometheus every 60 seconds or so. This is OK, but I’d like more detail, particularly the ability to download a trace file that is linked to a “high” point on the graph.
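To make the aggregation step concrete, here is a minimal sketch of the idea in Python. It is not our actual code: the function name, the sample tuple layout, and the thread names are all hypothetical, and it only shows how frame times could be bucketed into one-second slices per thread.

```python
from collections import defaultdict


def aggregate_frame_times(samples):
    """Group frame times into one-second slices per thread.

    `samples` is an iterable of (timestamp_seconds, thread_name, frame_ms)
    tuples. This layout is an assumption made for illustration only.
    """
    buckets = defaultdict(list)
    for timestamp, thread, frame_ms in samples:
        # Truncate the timestamp to a whole second to pick the slice.
        buckets[(int(timestamp), thread)].append(frame_ms)

    # Reduce each slice to the min/max/avg values that get submitted.
    return {
        key: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for key, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    demo = [
        (100.1, "sim", 16.0),
        (100.6, "sim", 20.0),
        (101.2, "sim", 50.0),
    ]
    for (second, thread), stats in aggregate_frame_times(demo).items():
        print(second, thread, stats)
```

Keeping the per-slice count alongside min/max/avg makes it possible to recompute a correct average later if slices are merged.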

I considered uploading the files to S3 and linking to them from the graph. The challenge is that only Grafana-authenticated users should be able to download the files, so they can’t just be public on S3.

Does anyone have recommendations for valuable visualizations here, or for how to organize frame-time performance data when monitoring our applications?

Your problem seems to be about CPU or memory performance.
If you have a script, for example one built around ps awx, you can use it to detect “bad performance”.
If you have such a script, you can use this link to send its results to Prometheus

Just modify the script and you will have the results in Prometheus.

Regards,
Fadjar

Sorry if I was unclear in my original post. I’m not interested in monitoring the performance of the system the application runs on. Instead, we have internal metrics our application generates such as timings for game simulation frames across multiple threads. I want to store that information and link the “bad” frames to captured profiling traces in the visualization.

For example, a line graph showing frame timings and linking to a Pyroscope page for spikes that shows what was happening in a flame graph on that spike. Or a link to a download of a specialized profiling trace file on a spike.
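A rough sketch of one way this could work, assuming Grafana annotations are used to mark the spikes. The payload fields (time, timeEnd, tags, text) follow Grafana's annotations HTTP API as I understand it; the threshold, the tag names, the slice layout, and the trace URL scheme are all hypothetical. The code only builds the payloads and does not send anything.

```python
import json


def spike_annotations(slices, threshold_ms, trace_url_for):
    """Build one annotation payload per slice whose max frame time is a spike.

    `slices` maps (epoch_second, thread_name) to a dict with a "max" key.
    `trace_url_for(second, thread)` returns a download link for the trace.
    Both shapes are assumptions made for this illustration.
    """
    annotations = []
    for (second, thread), stats in sorted(slices.items()):
        if stats["max"] < threshold_ms:
            continue  # Not a "bad performance" slice.
        url = trace_url_for(second, thread)
        annotations.append({
            # Annotation times are expressed in epoch milliseconds.
            "time": second * 1000,
            "timeEnd": (second + 1) * 1000,
            "tags": ["bad-performance", f"thread:{thread}"],
            "text": f'Max frame {stats["max"]:.1f} ms on {thread}. '
                    f'<a href="{url}">Download trace</a>',
        })
    return annotations


if __name__ == "__main__":
    demo_slices = {
        (100, "sim"): {"max": 20.0},
        (101, "sim"): {"max": 50.0},
    }
    # Placeholder URL scheme, not a real service.
    make_url = lambda s, t: f"https://traces.example.invalid/{t}/{s}.trace"
    for payload in spike_annotations(demo_slices, 33.0, make_url):
        print(json.dumps(payload))
```

Each payload could then be posted to Grafana so the spike shows up as a marker on the frame-time graph, with the tags available for filtering and for correlating with Loki logs. This does not solve the authenticated download problem by itself; the link target would still need its own access control.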