Service Crashes and Recovers

We are getting alerts from our Zabbix monitoring environment that the Grafana service is crashing on a server we manage. The server runs custom software from a provider, and one of its components is Grafana. We have reached out to the provider, but they don’t seem to be knowledgeable on the Grafana side. Does anyone here see anything in the logs around the time of the crash that I can use to point the provider in the right direction?

Crash occurred between 13:20 and 13:30 on 5/6. Let me know if there is any other information you would find helpful.

t=2024-05-06T00:03:03-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=36 newState=pending prev state=ok
t=2024-05-06T00:04:43-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T00:04:43-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T00:11:53-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=36 newState=ok prev state=pending
t=2024-05-06T01:04:43-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T01:04:43-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T02:04:43-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T02:04:43-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T03:04:43-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T03:04:43-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T04:04:43-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T04:04:43-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T05:04:43-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T05:04:43-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T06:04:44-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T06:04:44-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T07:04:44-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T07:04:44-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T07:04:44-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=0
t=2024-05-06T08:04:44-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T08:04:44-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T09:04:44-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T09:04:44-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T10:04:44-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T10:04:44-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T11:04:44-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T11:04:44-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T11:04:44-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=0
t=2024-05-06T12:04:44-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T12:04:44-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T13:04:44-0400 lvl=info msg="Validated license token" logger=licensing appURL=http : // localhost:4080/Grafana/ source=disk status=NotFound
t=2024-05-06T13:04:44-0400 lvl=warn msg="failed to load or validate token" logger=licensing err="license token file not found: C:\\Program Files\\GrafanaLabs\\grafana\\data\\license.jwt"
t=2024-05-06T13:28:19-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=2 name="CPU Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:19-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=2 newState=pending prev state=ok
t=2024-05-06T13:28:21-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=12 name="Standalone Cpu(s) Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:21-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=12 newState=pending prev state=ok
t=2024-05-06T13:28:21-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=8 name="Xcurserver Cpu Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:21-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=8 newState=pending prev state=ok
t=2024-05-06T13:28:21-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=0
t=2024-05-06T13:28:22-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=34 name="ctree LFCS Files Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:22-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=34 newState=pending prev state=ok
t=2024-05-06T13:28:24-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=10 name="Tomcat Cpu Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:24-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=10 newState=pending prev state=ok
t=2024-05-06T13:28:27-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=27 name="Standalone Memory(s) Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:27-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=27 newState=pending prev state=ok
t=2024-05-06T13:28:30-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=24 name="Ctreesql Cpu Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:30-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=24 newState=pending prev state=ok
t=2024-05-06T13:28:30-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=14 name="Interfaces Cpu Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:30-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=21 name="Memory Usage Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:30-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=21 newState=pending prev state=ok
t=2024-05-06T13:28:30-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=35 name="Ctree Dirs Space Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:30-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=35 newState=pending prev state=ok
t=2024-05-06T13:28:30-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=0
t=2024-05-06T13:28:33-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=20 name="Wait IO CPU Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:33-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=20 newState=pending prev state=ok
t=2024-05-06T13:28:38-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=23 name="Ctree Licenses Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:38-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=23 newState=pending prev state=ok
t=2024-05-06T13:28:41-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=25 name="Xcurserver Memory Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:41-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=25 newState=pending prev state=ok
t=2024-05-06T13:28:44-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=36 name="Disk Queue Length Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:44-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=36 newState=pending prev state=ok
t=2024-05-06T13:28:46-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=26 name="Tomcat Memory Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:46-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=26 newState=pending prev state=ok
t=2024-05-06T13:28:49-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=28 name="Interfaces Memory Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:28:49-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=28 newState=pending prev state=ok
t=2024-05-06T13:28:50-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=29 name="Response Time Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:19-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=2 name="CPU Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:20-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=26 name="Tomcat Memory Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:21-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=8 name="Xcurserver Cpu Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:24-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=10 name="Tomcat Cpu Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:24-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=12 name="Standalone Cpu(s) Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:24-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=21 name="Memory Usage Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:26-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=24 name="Ctreesql Cpu Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:28-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=23 name="Ctree Licenses Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:30-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=14 name="Interfaces Cpu Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:30-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=28 name="Interfaces Memory Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:33-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=20 name="Wait IO CPU Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:34-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=29 name="Response Time Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:29:37-0400 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=36 name="Disk Queue Length Alert" error="tsdb.HandleRequest() error rpc error: code = Unknown desc = Post http : // localhost:4071/GrafanaRrdDataSource//query: dial tcp 127.0.0.1:4071: connectex: No connection could be made because the target machine actively refused it." changing state to=alerting
t=2024-05-06T13:30:20-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=2 newState=ok prev state=pending
t=2024-05-06T13:30:20-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=8 newState=ok prev state=pending
t=2024-05-06T13:30:20-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=25 newState=ok prev state=pending
t=2024-05-06T13:30:20-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=24 newState=ok prev state=pending
t=2024-05-06T13:30:20-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=0
t=2024-05-06T13:30:20-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=0
t=2024-05-06T13:30:20-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=1
t=2024-05-06T13:30:20-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=27 newState=ok prev state=pending
t=2024-05-06T13:30:20-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=2
t=2024-05-06T13:30:20-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=0
t=2024-05-06T13:30:20-0400 lvl=info msg="Database locked, sleeping then retrying" logger=sqlstore error="database is locked" retry=1
t=2024-05-06T13:30:20-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=10 newState=ok prev state=pending
t=2024-05-06T13:30:23-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=12 newState=ok prev state=pending
t=2024-05-06T13:30:24-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=26 newState=ok prev state=pending
t=2024-05-06T13:30:26-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=36 newState=ok prev state=pending
t=2024-05-06T13:30:29-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=20 newState=ok prev state=pending
t=2024-05-06T13:30:32-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=21 newState=ok prev state=pending
t=2024-05-06T13:30:39-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=23 newState=ok prev state=pending
t=2024-05-06T13:30:48-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=28 newState=ok prev state=pending
t=2024-05-06T13:32:14-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=34 newState=ok prev state=pending
t=2024-05-06T13:32:38-0400 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=35 newState=ok prev state=pending

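There is no Grafana crash in these logs. The alert rules fail because the RRD data source service (GrafanaRrdDataSource on localhost:4071) refused connections from about 13:28 until 13:30, after which the rules return to ok. That service is not part of Grafana, so this is not a Grafana issue.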
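If it helps when talking to the provider, here is a rough Python sketch that counts the "actively refused" errors per target address and minute, so the outage window of the data source stands out. The function name `summarize_refused` and the regular expression are my own and only assume the log format shown in the post; the sample lines are shortened copies of lines from the log above.

```python
import re
from collections import Counter

# Assumed format: "t=<timestamp> ... dial tcp <ip>:<port>: connectex: ..."
LINE_RE = re.compile(r"t=(?P<ts>\S+) .*dial tcp (?P<target>[\d.]+:\d+): connectex")


def summarize_refused(lines):
    """Count connection-refused errors per (target address, minute)."""
    counts = Counter()
    for line in lines:
        if "actively refused" not in line:
            continue  # skip license, state-change and database-locked lines
        match = LINE_RE.search(line)
        if match:
            minute = match.group("ts")[:16]  # keep YYYY-MM-DDTHH:MM
            counts[(match.group("target"), minute)] += 1
    return counts


# Shortened copies of lines from the posted log (fields trimmed for brevity).
sample = [
    't=2024-05-06T13:04:44-0400 lvl=warn msg="failed to load or validate token" logger=licensing',
    't=2024-05-06T13:28:19-0400 lvl=eror msg="Alert Rule Result Error" ruleId=2 '
    'error="dial tcp 127.0.0.1:4071: connectex: No connection could be made '
    'because the target machine actively refused it."',
    't=2024-05-06T13:28:21-0400 lvl=eror msg="Alert Rule Result Error" ruleId=12 '
    'error="dial tcp 127.0.0.1:4071: connectex: No connection could be made '
    'because the target machine actively refused it."',
    't=2024-05-06T13:29:19-0400 lvl=eror msg="Alert Rule Result Error" ruleId=2 '
    'error="dial tcp 127.0.0.1:4071: connectex: No connection could be made '
    'because the target machine actively refused it."',
    't=2024-05-06T13:30:20-0400 lvl=info msg="New state change" ruleId=2 newState=ok',
]

summary = summarize_refused(sample)
for (target, minute), count in sorted(summary.items()):
    print(f"{minute}  {target}  refused x{count}")
```

To run it against the real log, pass the lines of the Grafana log file instead of `sample`. Every refused connection should point at port 4071, which is the provider's data source service rather than Grafana itself.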