I am using Grafana Loki to store logs from multiple microservices. Loki has recently accumulated a large amount of log data (approximately 40 GB) in a short period of time, and I would like to find out which application is producing so much of it. What is the best way to calculate the amount of storage used per log stream?
I am using three labels: service, stack and replica.
First, I checked the metrics, but found no information about this. Second, I tried logcli, but found no option for this use case. I also tried analyzing the chunk directory, which might lead to some results.
I know that all data is stored in chunks, but when I list them I only get a list of base64 strings:
ZmFrZS80MThiYjBhMDIyYjc3NDNiOjE4MzU0NDg0NTMxOjE4MzU0YjY3NzEzOmM5ZWE4ODc2
ZmFrZS80MThiYjBhMDIyYjc3NDNiOjE4MzU0YjY5ZWExOjE4MzU1MjRkMzMwOmNkNWUwOTM5
ZmFrZS80MThiYjBhMDIyYjc3NDNiOjE4MzU1MjRmNGE0OjE4MzU1OTMzMWIxOjFiNTNjNmYw
ZmFrZS80MThiYjBhMDIyYjc3NDNiOjE4MzU1OTM1OTM5OjE4MzU2MDE2ZjljOmY1OGE2ZWIw
ZmFrZS80MThiYjBhMDIyYjc3NDNiOjE4MzU2MDE5NzJjOjE4MzU2NmZkMjNiOjIzZmIyMDEw
When I decode them with base64:
fake/418bb0a022b7743b:18354484531:18354b67713:c9ea8876
fake/418bb0a022b7743b:18354b69ea1:1835524d330:cd5e0939
fake/418bb0a022b7743b:1835524f4a4:183559331b1:1b53c6f0
fake/418bb0a022b7743b:18355935939:18356016f9c:f58a6eb0
fake/418bb0a022b7743b:1835601972c:183566fd23b:23fb2010
From the output I can determine only one thing: fake/ is the default tenant ID. I do not know what the other numbers mean.
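The decoding step can be reproduced with a minimal Python sketch. The helper name decode_chunk_name is hypothetical, and splitting on "/" and ":" only separates the fields; it assumes nothing about what each field means.

```python
import base64


def decode_chunk_name(name: str) -> dict:
    """Decode one base64 chunk file name and split it into raw fields.

    Hypothetical helper: it assumes the name has the shape
    "<tenant>/<field>:<field>:..." seen in the listing above.
    """
    decoded = base64.b64decode(name).decode("utf-8")
    # Everything before the first "/" is the tenant ID ("fake" by default).
    tenant, _, rest = decoded.partition("/")
    # The remaining colon-separated fields are left uninterpreted.
    return {"tenant": tenant, "fields": rest.split(":")}


if __name__ == "__main__":
    names = [
        "ZmFrZS80MThiYjBhMDIyYjc3NDNiOjE4MzU0NDg0NTMxOjE4MzU0YjY3NzEzOmM5ZWE4ODc2",
        "ZmFrZS80MThiYjBhMDIyYjc3NDNiOjE4MzU0YjY5ZWExOjE4MzU1MjRkMzMwOmNkNWUwOTM5",
    ]
    for name in names:
        print(decode_chunk_name(name))
```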
I am using Promtail to collect logs from Docker Swarm services with docker_sd_configs.
I am also considering chunks-inspect, but that is quite a brute-force approach that I would rather avoid, because its computation time and energy cost grow with the amount of logs.
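A cheaper alternative to opening every chunk might be to sum the on-disk file sizes grouped by the decoded name prefix. The sketch below is only that, under stated assumptions: chunks live as plain files in one directory, their names are base64 as shown above, and the field right after the tenant ID identifies the stream (unverified; it is merely identical across the five chunks listed). The function name bytes_per_prefix and the example path are hypothetical.

```python
import base64
import binascii
import os
from collections import defaultdict


def bytes_per_prefix(chunk_dir: str) -> dict:
    """Sum file sizes per "<tenant>/<first field>" prefix of the decoded name.

    Assumption: the first field after the tenant ID identifies the stream.
    Only file metadata is read; chunk contents are never opened.
    """
    totals = defaultdict(int)
    for entry in os.scandir(chunk_dir):
        if not entry.is_file():
            continue
        try:
            decoded = base64.b64decode(entry.name, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # skip files whose names are not valid base64 text
        prefix = decoded.split(":", 1)[0]
        totals[prefix] += entry.stat().st_size
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever the chunk files are stored.
    sizes = bytes_per_prefix("/loki/chunks")
    for prefix, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{prefix}\t{size} bytes")
```

This only yields totals per prefix; mapping a prefix back to the service, stack and replica labels would still need a separate lookup.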
What is the best way to analyze how much data per stream is stored?