I have installed grafana-oss (Grafana v10.0.0) on OpenShift as a container. The configuration is as follows:
- grafana.ini configured as ConfigMap
- datasources configured as ConfigMap
- grafana-data configured as PV (NFS)
- grafana-logs configured as container volume.
I have also configured alert rules, contact points, and notification policies, which were working as expected. After a few days, the contact points and notification policies went missing. This happens almost every week. In between, I am also getting database lock errors. I don't know what the issue is.
Thanks
Welcome @prempalsingh
Could you safely share the contents of your container configuration file? Specifically, how are you persisting the database?
In addition, what database are you using for Grafana? Could you call this API endpoint and share the details under
"database":{
And lastly, please increase the verbosity of the Grafana server logs to debug and note any errors.
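For anyone following along, here is a minimal sketch of how the "database" block could be pulled out of the settings response. It assumes the endpoint meant above is Grafana's admin settings endpoint (`/api/admin/settings`) and that it is called with basic auth by an admin user. The base URL, credentials, and function names are placeholders of mine, not anything from this thread.

```python
import base64
import json
import urllib.request


def database_section(settings: dict) -> dict:
    """Return the "database" block of a settings payload, or {} if it is absent."""
    return dict(settings.get("database", {}))


def fetch_settings(base_url: str, user: str, password: str) -> dict:
    """Fetch the settings JSON from a Grafana instance.

    Assumption: the settings are served at /api/admin/settings and the
    caller is a Grafana admin authenticating with basic auth.
    """
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/admin/settings",
        headers={"Authorization": "Basic " + credentials},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Something like `database_section(fetch_settings("http://localhost:3000", "admin", "..."))` would then give the part worth sharing. Check that the password field is empty or masked before pasting it anywhere public.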
Please find the details below:
"database":{"ca_cert_path":"","cache_mode":"private","client_cert_path":"","client_key_path":"","conn_max_lifetime":"14400","connection_string":"","host":"127.0.0.1:3306","instrument_queries":"false","isolation_level":"","locking_attempt_timeout_sec":"0","log_queries":"false","max_idle_conn":"2","max_open_conn":"0","name":"grafana","password":"","path":"grafana.db","query_retries":"0","server_cert_name":"","skip_migrations":"","ssl_mode":"disable","transaction_retries":"5","type":"sqlite3","url":"","user":"root","wal":"false"}
Pod logs:
logger=provisioning.plugins t=2023-06-21T07:57:42.249177637Z level=error msg="Failed to read plugin provisioning files from directory" path=/etc/grafana/provisioning/plugins error="open /etc/grafana/provisioning/plugins: no such file or directory"
logger=provisioning.notifiers t=2023-06-21T07:57:42.249196595Z level=debug msg="Looking for alert notification provisioning files" path=/etc/grafana/provisioning/notifiers
logger=provisioning.notifiers t=2023-06-21T07:57:42.249204956Z level=error msg="Can't read alert notification provisioning files from directory" path=/etc/grafana/provisioning/notifiers error="open /etc/grafana/provisioning/notifiers: no such file or directory"
logger=provisioning.alerting t=2023-06-21T07:57:42.249226017Z level=debug msg="looking for alerting provisioning files" path=/etc/grafana/provisioning/alerting
logger=provisioning.alerting t=2023-06-21T07:57:42.249234024Z level=error msg="can't read alerting provisioning files from directory" path=/etc/grafana/provisioning/alerting error="open /etc/grafana/provisioning/alerting: no such file or directory"
logger=ticker t=2023-06-21T07:57:58.639170784Z level=info msg=starting first_tick=2023-06-21T07:58:00Z
logger=sqlstore.transactions t=2023-06-21T07:58:03.64193404Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=0 code="database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.641962438Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=0 code="database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.655208043Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.655256495Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.655390971Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.655421845Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.65542464Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.655445905Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.655529958Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.655550174Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.6564804Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.656502482Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:04.660228054Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=infra.lockservice t=2023-06-21T07:58:04.660293648Z level=error msg="Failed to release the lock" error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=infra.lockservice t=2023-06-21T07:58:04.66030356Z level=debug msg="LockExecuteAndRelease finished" actionName="secret migration task " duration=22.410032466s
logger=server t=2023-06-21T07:58:04.660312848Z level=debug msg="Stopped background service" service=*migrations.SecretMigrationProviderImpl reason=null
logger=sqlstore.transactions t=2023-06-21T07:58:09.31421766Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:10.43366506Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context userId=0 orgId=1 uname= t=2023-06-21T07:58:10.433733437Z level=warn msg="Failed to tag anonymous session" error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=provisioning.dashboard t=2023-06-21T07:57:42.250873703Z level=error msg="can't read dashboard provisioning files from directory" path=/etc/grafana/provisioning/dashboards error="open /etc/grafana/provisioning/dashboards: no such file or directory"
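To see at a glance which components are hitting the lock, the log lines above can be tallied by logger and level. This is only a rough sketch: the regular expression is a simplified key=value parser of my own, not an official Grafana log parser, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches key=value pairs where the value is either double-quoted or a bare token.
_PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line: str) -> dict:
    """Split one log line into a dict of its key=value fields."""
    # Normalise typographic quotes that forum software may have introduced.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    return {key: value.strip('"') for key, value in _PAIR.findall(line)}


def count_locked(lines) -> Counter:
    """Count lines whose error field mentions a locked database, per (logger, level)."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if "database is locked" in fields.get("error", ""):
            counts[(fields.get("logger", "?"), fields.get("level", "?"))] += 1
    return counts
```

Feeding it the pod log line by line, for example `count_locked(open("grafana.log"))` with a hypothetical file name, gives a quick picture of how widespread the lock errors are.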
The following is the PersistentVolume definition:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: grafana-oss-pv
  uid: bcb2b16c-1371-4ee8-af4c-2c8750c1166f
  resourceVersion: '782820495'
  creationTimestamp: '2023-06-06T12:23:06Z'
  labels:
    app: grafana
  annotations:
    pv.kubernetes.io/bound-by-controller: 'yes'
  finalizers:
    - kubernetes.io/pv-protection
  managedFields:
    - manager: Mozilla
      operation: Update
      apiVersion: v1
      time: '2023-06-06T12:23:06Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            .: {}
            'f:app': {}
        'f:spec':
          'f:accessModes': {}
          'f:capacity':
            .: {}
            'f:storage': {}
          'f:nfs':
            .: {}
            'f:path': {}
            'f:server': {}
          'f:persistentVolumeReclaimPolicy': {}
          'f:storageClassName': {}
          'f:volumeMode': {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2023-06-06T12:23:36Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:pv.kubernetes.io/bound-by-controller': {}
        'f:spec':
          'f:claimRef':
            .: {}
            'f:apiVersion': {}
            'f:kind': {}
            'f:name': {}
            'f:namespace': {}
            'f:resourceVersion': {}
            'f:uid': {}
        'f:status':
          'f:phase': {}
spec:
  capacity:
    storage: 50Gi
  nfs:
    server:
    path: /nfs-storage/grafana-oss-pv
  accessModes:
    - ReadWriteOnce
  claimRef:
    kind: PersistentVolumeClaim
    namespace: monitoring
    name: grafana-oss-pvc
    uid: 4fcda229-6638-43e0-827e-57fb91a077dd
    apiVersion: v1
    resourceVersion: '782820475'
  persistentVolumeReclaimPolicy: Retain
  storageClassName: thin
  volumeMode: Filesystem
status:
  phase: Bound
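One combination that may be worth checking here: the settings above report a sqlite3 database, and this volume is NFS-backed. SQLite's own documentation cautions that file locking on network filesystems such as NFS can be unreliable, so this is a plausible (not confirmed) contributor to the "database is locked" errors. The sketch below only flags that combination. It assumes that grafana.db actually lives on this volume, and the function name is hypothetical.

```python
def sqlite_on_nfs(database: dict, pv_spec: dict) -> bool:
    """Return True when the settings use SQLite and the volume spec is NFS-backed.

    Assumption: the SQLite file (grafana.db) is stored on the volume described
    by pv_spec. This function cannot verify that by itself.
    """
    return database.get("type") == "sqlite3" and "nfs" in pv_spec


# Values copied from the settings output and the PersistentVolume spec in this thread.
database = {"type": "sqlite3", "path": "grafana.db", "wal": "false"}
pv_spec = {"nfs": {"path": "/nfs-storage/grafana-oss-pv"}, "accessModes": ["ReadWriteOnce"]}
needs_attention = sqlite_on_nfs(database, pv_spec)
```

If the check comes back true, moving the database to block storage or to an external MySQL/PostgreSQL instance are the usual things to evaluate. Whether that also explains the missing contact points is not established by anything in this thread.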
Hi, I am experiencing the same issue (Grafana v9.4.7).
Often, between restarts, the ‘notification policies’ and the ‘contact points’ are lost.
Here is one interesting thing that my friend noticed: these are the only two elements we have seen so far that have no internal ‘id’. Every other element has an ‘id’ field.
Could this have something to do with the loss of these elements after restarts?
For reference, here are my contact points (exported via API):
[
  {
    "uid": "gAleuU3Vk",
    "name": "My test e-mail",
    "type": "email",
    "settings": {
      "addresses": "redacted\n",
      "singleEmail": false
    },
    "disableResolveMessage": false
  },
  {
    "uid": "g59BXUqVk",
    "name": "WebHook_1 Forwarder",
    "type": "webhook",
    "settings": {
      "url": "http://127.0.0.1:5001"
    },
    "disableResolveMessage": false
  },
  {
    "uid": "fdrfu83Vk",
    "name": "email receiver",
    "type": "email",
    "settings": {
      "addresses": "example@email.com"
    },
    "disableResolveMessage": false
  }
]
And here is my notification policy, also exported via API:
{
  "receiver": "grafana-default-email",
  "group_by": [
    "grafana_folder",
    "alertname"
  ],
  "routes": [
    {
      "receiver": "WebHook_1 Forwarder",
      "group_by": [
        "..."
      ],
      "object_matchers": [
        [
          "notification",
          "=",
          "production"
        ]
      ]
    }
  ]
}
Notice how neither the contact points (CP) nor the notification policy (NP) has an "id" field? The notification policy is also missing the "uid" field.
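The observation can be checked mechanically against the exports. Here is a small sketch (the helper name is mine) that reports which of the two identifier fields are present at the top level of each exported object:

```python
def identifier_fields(obj: dict) -> list:
    """Return which of "id" and "uid" appear as top-level keys of an exported object."""
    return [key for key in ("id", "uid") if key in obj]


# Trimmed copies of the exports shown above.
contact_point = {"uid": "g59BXUqVk", "name": "WebHook_1 Forwarder", "type": "webhook"}
notification_policy = {"receiver": "grafana-default-email", "group_by": ["grafana_folder", "alertname"]}

contact_point_ids = identifier_fields(contact_point)
policy_ids = identifier_fields(notification_policy)
```

For these exports, the contact point carries only "uid" and the policy carries neither. That confirms the observation, though by itself it says nothing about whether the missing fields are related to the data loss.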