Since updating Grafana to version 11.x (currently 11.1.0) and Loki to 3.x (currently 3.1.0), I have had some strange issues when using dashboard variables. While creating a new dashboard, adding variables, and using them in a query, the Loki process starts using far more resources than usual and may even crash and restart. The Grafana process needs only slightly more resources at the same time. I did not pay much attention to this until it recently left me with a corrupt Loki database. Unfortunately, I have not yet found a reliable way to reproduce this behavior, but I will update this post as soon as I know more. It only happens during initial dashboard creation; once the dashboard is up and running, I have never experienced any further trouble. So I suspect the cause is a variable whose default value is not yet set at that point and which therefore resolves to a null value in the query, but that is just an assumption.
Or is there a sanitizing option for Grafana when querying Loki that I am not aware of? Are there any known changes in Grafana 11.x that could affect input sanitizing, so that rolling back to 10.x would be the better choice (at least for a production environment)?
I tried to extract the relevant information from the Loki entries in syslog (by the way, why does Loki not have its own log location, such as /var/log/loki.log?).
Below is the Loki log (slightly edited to hide user and domain) that was generated while editing a dashboard variable in the Grafana UI. After the first query error, Loki did not accept any further queries, and CPU and memory consumption climbed to the point where the oom-reaper started to kill the Loki process.
A word of warning at this point about the option flush_on_shutdown: true. I had always set this option to false, and Loki crashes never caused any trouble with database consistency. When this crash happened, however, the option was set to true, and it was the first and only time I ended up with a corrupt table. Most likely, somewhere under OOM pressure the Loki process started to shut down and tried to write out all logs held in memory, and was then finally killed while still flushing to disk. If this is true, I prefer to keep the WAL enabled so that I can safely kill the Loki process.
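For anyone who wants to try the same approach, the WAL-based setup described above might look roughly like the fragment below in the ingester section of the Loki configuration. This is only a sketch: the option names are the ingester WAL settings as I understand them from the Loki documentation, while the directory and the memory ceiling are assumed placeholder values, not taken from the setup in this post.

```yaml
ingester:
  wal:
    # Keep the write-ahead log on, so in-memory data can be replayed after a kill or crash.
    enabled: true
    # Assumed example path; use whatever directory fits the installation.
    dir: /var/lib/loki/wal
    # Do not try to flush all in-memory chunks during shutdown; rely on WAL replay instead.
    flush_on_shutdown: false
    # Assumed example value; limits memory used while replaying the WAL at startup.
    replay_memory_ceiling: 2GB
```

With a fragment like this, a killed Loki process should recover unflushed data from the WAL on the next start rather than depending on a flush that may be interrupted by the OOM kill.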