Hi, I am working with a very large dataset of around 20 million data points when I filter for the last 7 or 30 days. The data points are timestamped and collected every second, and it’s hard to visualize all of them. What are your thoughts on this? Perform some random sampling? FYI, the queries are in SQL.
Hi!
In general, random sampling sounds like a good starting point. I would be thoughtful about how you sample so that the data stays representative. You could also use aggregations.
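For example, here is a minimal sketch of what sampling could look like, assuming PostgreSQL and a hypothetical table metrics with ts and value columns (adjust to your schema):

-- Roughly 1% block-level sample, then filtered to the last 7 days (fast but approximate)
SELECT ts, value
FROM metrics TABLESAMPLE SYSTEM (1)
WHERE ts >= now() - interval '7 days';

-- Alternative: per-row sampling, slower but more uniformly spread across the rows
SELECT ts, value
FROM metrics
WHERE ts >= now() - interval '7 days'
  AND random() < 0.01;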
Aggregate, e.g. per time period, depending on the use case. My 24-hour dataset has 200k datapoints for CPU usage. I don’t care about each individual datapoint, so I visualise min/avg/max per 5 minutes:
That’s only 864 datapoints per 24 hours (288 five-minute buckets × 3 series), no matter how many datapoints the raw data has, thanks to aggregation. Then I can zoom in on any interesting peaks/drops/…
SQL example with avg aggregation (just an example, so it may not work for your data model):
SELECT
  -- Grafana macro: groups rows into $agg-sized time buckets; the 0 fills empty buckets
  $__timeGroupAlias(timewithtz, $agg, 0),
  avg(value) AS "avg"
FROM $table
WHERE
  -- Grafana macro: restricts rows to the dashboard's selected time range
  $__timeFilter(timewithtz)
GROUP BY 1
ORDER BY 1
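And a min/avg/max variant of the same query, as described above (same Grafana macros, same caveat that it may need adjusting for your data model):

SELECT
  $__timeGroupAlias(timewithtz, $agg, 0),
  min(value) AS "min",
  avg(value) AS "avg",
  max(value) AS "max"
FROM $table
WHERE
  $__timeFilter(timewithtz)
GROUP BY 1
ORDER BY 1

With $agg set to a 5-minute interval, this returns the 288 buckets × 3 series mentioned above.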
Thanks all. I resolved it by optimizing my query to avoid fetching redundant data, but I appreciate the suggestions.