6.1.4: Error messages with alert handling / error="Could not find datasource database is locked"

I have started to get tons of error messages related to alert handling:

> t=2019-04-25T06:22:12+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=119 name="Disk Space Alert () " error="Could not find datasource database is locked" changing state to=alerting
> t=2019-04-25T06:22:12+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=118 name="Disk Space Alert () " error="Could not find datasource database is locked" changing state to=alerting
> t=2019-04-25T06:22:12+0000 lvl=info msg="New state change" logger=alerting.resultHandler alertId=118 newState=alerting prev state=ok
> t=2019-04-25T06:22:12+0000 lvl=info msg="New state change" logger=alerting.resultHandler alertId=119 newState=alerting prev state=ok
> t=2019-04-25T06:22:12+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=120 name="Disk Space Alert () " error="Could not find datasource database is locked" changing state to=alerting
> t=2019-04-25T06:22:12+0000 lvl=info msg="New state change" logger=alerting.resultHandler alertId=120 newState=alerting prev state=ok
> t=2019-04-25T06:22:13+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=49 name="Disk Space Alert () " error="Could not find datasource database is locked" changing state to=alerting
> t=2019-04-25T06:22:13+0000 lvl=info msg="New state change" logger=alerting.resultHandler alertId=49 newState=alerting prev state=ok
> t=2019-04-25T06:22:13+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=47 name="Disk Space Alert () " error="Could not find datasource database is locked" changing state to=alerting
> t=2019-04-25T06:22:13+0000 lvl=info msg="New state change" logger=alerting.resultHandler alertId=47 newState=alerting prev state=ok
> t=2019-04-25T06:22:13+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=25 name="insitecluster-nodes Alert" error="Could not find datasource database is locked" changing state to=alerting
> t=2019-04-25T06:22:13+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=26 name="patientexplorer-nodes Alert" error="Could not find datasource database is locked" changing state to=alerting
> t=2019-04-25T06:22:13+0000 lvl=info msg="New state change" logger=alerting.resultHandler alertId=26 newState=alerting prev state=ok
> t=2019-04-25T06:22:13+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=24 name="multilabdl-nodes Alert" error="Could not find datasource database is locked" changing state to=alerting
> t=2019-04-25T06:22:13+0000 lvl=info msg="New state change" logger=alerting.resultHandler alertId=24 newState=alerting prev state=ok
> t=2019-04-25T06:22:13+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=48 name="Disk Space Alert () " error="Could not find datasource database is locked" changing state to=alerting

I am not absolutely sure, but I think the problem started when I switched to the Grafana 6.x series.

Any ideas?

Looking at your logs, it looks like you are running out of disk space:

> t=2019-04-25T06:22:13+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=48 name="Disk Space Alert () " error="Could not find datasource database is locked" changing state to=alerting

If you have enough disk space, then it looks like it could be related to your file system. (See here for an example of file system problems with SQLite.)

Yes, the systems I am monitoring are running out of disk space, but not the Grafana host itself. It is Grafana that keeps producing these "Could not find datasource database is locked" error messages. In some cases the messages are also visible in the Alert List panel's recent state changes list, but they are not visible in the current state list at all.

What file system are you running on? SQLite is a file-based database, so it does not work on all types of file systems (see my previous reply for an example).

If you have a lot of traffic and are doing a lot of writes to the database, then you may have hit SQLite's limits and it could be time to switch to MySQL or Postgres. But this is unlikely unless you have a very large number of alerts or users.
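If it does come to that, the switch is just the `[database]` section of grafana.ini. A minimal sketch, assuming a local MySQL instance (host, database name, and credentials are placeholders):

```ini
[database]
# Replace the default sqlite3 backend with MySQL.
type = mysql
host = 127.0.0.1:3306
name = grafana
user = grafana
# Placeholder password; substitute your own credentials.
password = changeme
```

Grafana needs a restart after the change, and note that it will not migrate existing data out of the old SQLite file automatically.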

Grafana runs on xfs and the database is located on ext4.
How do you define a lot of traffic?
Only two users and around 200 alerts, so to me that doesn't sound like too much.

No, that is not a lot of traffic, and xfs and ext4 are standard file systems. Looking at the error messages, I'm not sure they are SQLite errors. Which datasource is returning the errors?

Can you try turning on debug logging for your datasource and for alerting? See Configure Alerting | Grafana documentation.
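Something like this in grafana.ini should do it; the logger names are guessed from the lines you posted (plus sqlstore for the database layer), so adjust them to whatever appears in your own logs:

```ini
[log]
# Keep the global level at info and only raise the interesting loggers.
level = info
# Space-separated logger:level pairs.
filters = alerting.evalContext:debug alerting.resultHandler:debug sqlstore:debug
```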

In this issue, the problem was that the datasource id had been changed.

A recent commit added a workaround which should make Grafana a little more tolerant of database contention: https://github.com/grafana/grafana/commit/5884e235fcf8cdbb4c42a94bdafe19881832bc54

I noticed this on a server where Prometheus and Grafana were contending heavily for the same storage partition (it has definitely gotten worse in the 6.x series), so making sure the Grafana database is on a dedicated partition should help.

I have a Grafana 8.3.3 setup using the default SQLite config.

I run about 150 alert rules.

I’m getting a lot of those errors in the alerts panel:

could not find datasource: database is locked

Looking at the logs, I also found a lot of

msg="failed to fetch alert rule" err="database is locked"

even

msg="failed to save alert state" err="database is locked"

The system is definitely not busy. It runs in a VM on a single ext4 partition with 35 GB of free space and 7 GB of RAM (65% free), two cores, and it is almost idle right now.

The alert rules are scheduled with a daily interval but I suspect they are all triggered at the same time, so that could be 150 threads trying to access the DB at the same time.

Could that be the cause?

Does this mean I have already outgrown SQLite and should move to another DB?
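In the meantime I am going to try enabling write-ahead logging for the sqlite3 backend, which should reduce lock contention between concurrent readers and the single writer. A sketch of the change in my grafana.ini (untested on my side so far):

```ini
[database]
# sqlite3 only: with WAL enabled, readers no longer block on the writer.
wal = true
```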

(Edit: just found Grafana Logs "database is locked" · Issue #16638 · grafana/grafana · GitHub. I’ll follow there.)