Missing Contact point and Notification policies

prempalsingh · June 21, 2023, 9:52am

I have installed grafana-oss (Grafana v10.0.0) in OpenShift as a container. The following are configured:

grafana.ini configured as ConfigMap
datasources configured as ConfigMap
grafana-data configured as PV (NFS)
grafana-logs configured as container volume.

I have also configured alert rules, contact points, and notification policies, which were working as expected. After a few days, the contact points and notification policies went missing. This happens to me almost every week. In between, I am also getting database lock errors. I don’t know what the issue is.

Thanks

antonio · June 21, 2023, 12:47pm

Welcome @prempalsingh

Could you safely share the content of your container configuration file? Specifically, how you are persisting the database.

In addition, what database are you using for Grafana? Could you call this API endpoint and share the details under

"database":{

And, lastly, please increase the verbosity of the Grafana server logs to debug and note any errors.
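
A minimal sketch of that request in Python, assuming the endpoint meant here is Grafana's admin settings API (GET /api/admin/settings, which requires a Grafana admin account); the original link is not preserved in this thread. The function names, URL, and credentials are illustrative only.

```python
import base64
import json
import urllib.request


def extract_database_section(settings: dict) -> dict:
    """Return only the "database" section of an admin-settings response."""
    return settings.get("database", {})


def fetch_settings(base_url: str, user: str, password: str) -> dict:
    """GET <base_url>/api/admin/settings using HTTP basic auth."""
    request = urllib.request.Request(f"{base_url}/api/admin/settings")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, extract_database_section(fetch_settings("http://localhost:3000", "admin", "admin")) would return the block requested above, with the URL and credentials replaced by your own.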

prempalsingh · June 21, 2023, 2:08pm

Please find the details below.
"database":{"ca_cert_path":"","cache_mode":"private","client_cert_path":"","client_key_path":"","conn_max_lifetime":"14400","connection_string":"","host":"127.0.0.1:3306","instrument_queries":"false","isolation_level":"","locking_attempt_timeout_sec":"0","log_queries":"false","max_idle_conn":"2","max_open_conn":"0","name":"grafana","password":"","path":"grafana.db","query_retries":"0","server_cert_name":"","skip_migrations":"","ssl_mode":"disable","transaction_retries":"5","type":"sqlite3","url":"","user":"root","wal":"false"}

Pod logs:

logger=provisioning.plugins t=2023-06-21T07:57:42.249177637Z level=error msg="Failed to read plugin provisioning files from directory" path=/etc/grafana/provisioning/plugins error="open /etc/grafana/provisioning/plugins: no such file or directory"
logger=provisioning.notifiers t=2023-06-21T07:57:42.249196595Z level=debug msg="Looking for alert notification provisioning files" path=/etc/grafana/provisioning/notifiers
logger=provisioning.notifiers t=2023-06-21T07:57:42.249204956Z level=error msg="Can't read alert notification provisioning files from directory" path=/etc/grafana/provisioning/notifiers error="open /etc/grafana/provisioning/notifiers: no such file or directory"
logger=provisioning.alerting t=2023-06-21T07:57:42.249226017Z level=debug msg="looking for alerting provisioning files" path=/etc/grafana/provisioning/alerting
logger=provisioning.alerting t=2023-06-21T07:57:42.249234024Z level=error msg="can't read alerting provisioning files from directory" path=/etc/grafana/provisioning/alerting error="open /etc/grafana/provisioning/alerting: no such file or directory"
logger=ticker t=2023-06-21T07:57:58.639170784Z level=info msg=starting first_tick=2023-06-21T07:58:00Z
logger=sqlstore.transactions t=2023-06-21T07:58:03.64193404Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=0 code="database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.641962438Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=0 code="database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.655208043Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.655256495Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.655390971Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.655421845Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.65542464Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.655445905Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.655529958Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.655550174Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:03.6564804Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context t=2023-06-21T07:58:03.656502482Z level=error msg="Failed to get user with id" userId=1 error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:04.660228054Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=infra.lockservice t=2023-06-21T07:58:04.660293648Z level=error msg="Failed to release the lock" error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=infra.lockservice t=2023-06-21T07:58:04.66030356Z level=debug msg="LockExecuteAndRelease finished" actionName="secret migration task " duration=22.410032466s
logger=server t=2023-06-21T07:58:04.660312848Z level=debug msg="Stopped background service" service=*migrations.SecretMigrationProviderImpl reason=null
logger=sqlstore.transactions t=2023-06-21T07:58:09.31421766Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=sqlstore.transactions t=2023-06-21T07:58:10.43366506Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
logger=context userId=0 orgId=1 uname= t=2023-06-21T07:58:10.433733437Z level=warn msg="Failed to tag anonymous session" error="[sqlstore.max-retries-reached] retry 1: database is locked"
logger=provisioning.dashboard t=2023-06-21T07:57:42.250873703Z level=error msg="can't read dashboard provisioning files from directory" path=/etc/grafana/provisioning/dashboards error="open /etc/grafana/provisioning/dashboards: no such file or directory"
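
To see which Grafana components are hitting the lock, a log excerpt like the one above can be tallied by its logger= field. This is only a rough sketch: the function name is made up for illustration, and it assumes each log entry sits on one line with a logger= key, as in the excerpt.

```python
import re
from collections import Counter

# Matches the logger name in lines such as "logger=sqlstore.transactions t=...".
LOGGER_RE = re.compile(r"logger=(\S+)")


def count_locked_by_logger(lines):
    """Count lines mentioning 'database is locked', grouped by logger name."""
    counts = Counter()
    for line in lines:
        if "database is locked" not in line:
            continue
        match = LOGGER_RE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Feeding it the pod log line by line (for example, the lines of a saved log file) gives one count per logger, which makes it easier to tell whether the lock errors come only from sqlstore.transactions retries or also surface in other services.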

prempalsingh · June 21, 2023, 2:12pm

The following is the PersistentVolume manifest:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: grafana-oss-pv
  uid: bcb2b16c-1371-4ee8-af4c-2c8750c1166f
  resourceVersion: '782820495'
  creationTimestamp: '2023-06-06T12:23:06Z'
  labels:
    app: grafana
  annotations:
    pv.kubernetes.io/bound-by-controller: 'yes'
  finalizers:
    - kubernetes.io/pv-protection
  managedFields:
    - manager: Mozilla
      operation: Update
      apiVersion: v1
      time: '2023-06-06T12:23:06Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            .: {}
            'f:app': {}
        'f:spec':
          'f:accessModes': {}
          'f:capacity':
            .: {}
            'f:storage': {}
          'f:nfs':
            .: {}
            'f:path': {}
            'f:server': {}
          'f:persistentVolumeReclaimPolicy': {}
          'f:storageClassName': {}
          'f:volumeMode': {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2023-06-06T12:23:36Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:pv.kubernetes.io/bound-by-controller': {}
        'f:spec':
          'f:claimRef':
            .: {}
            'f:apiVersion': {}
            'f:kind': {}
            'f:name': {}
            'f:namespace': {}
            'f:resourceVersion': {}
            'f:uid': {}
        'f:status':
          'f:phase': {}
spec:
  capacity:
    storage: 50Gi
  nfs:
    server:
    path: /nfs-storage/grafana-oss-pv
  accessModes:
    - ReadWriteOnce
  claimRef:
    kind: PersistentVolumeClaim
    namespace: monitoring
    name: grafana-oss-pvc
    uid: 4fcda229-6638-43e0-827e-57fb91a077dd
    apiVersion: v1
    resourceVersion: '782820475'
  persistentVolumeReclaimPolicy: Retain
  storageClassName: thin
  volumeMode: Filesystem
status:
  phase: Bound
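
One combination that may be worth ruling out from the two dumps above: the database settings report "type":"sqlite3", and the volume is NFS-backed. SQLite's own documentation warns that file locking is unreliable on many NFS implementations. The sketch below only flags that combination; it does not confirm it as the cause here, and it assumes grafana.db actually lives on this volume. The function name is hypothetical.

```python
def sqlite_on_nfs(database: dict, pv_spec: dict) -> bool:
    """Return True when Grafana uses SQLite and the volume spec is NFS-backed.

    `database` is the "database" section of the Grafana settings and
    `pv_spec` is the `spec` mapping of the PersistentVolume manifest.
    """
    uses_sqlite = database.get("type") == "sqlite3"
    nfs_backed = "nfs" in pv_spec
    return uses_sqlite and nfs_backed
```

With the values posted in this thread ("type":"sqlite3" and a spec containing an nfs block) the check returns True.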

milan0x4d · August 4, 2023, 3:20pm

Hi, I am experiencing the same issue (Grafana v9.4.7).

Often, between restarts, the ‘notification policies’ and the ‘contact points’ are lost.

Here is one interesting thing that my friend noticed: these two elements are the only ones we’ve seen so far that have no internal ‘id’. Every other element has an ‘id’ field.

Could this have something to do with the loss of these elements after restarts?

For reference, here are my contact points (exported via the API):

milan0x4d · August 4, 2023, 3:21pm

[
  {
    "uid": "gAleuU3Vk",
    "name": "My test e-mail",
    "type": "email",
    "settings": {
      "addresses": "redacted\n",
      "singleEmail": false
    },
    "disableResolveMessage": false
  },
  {
    "uid": "g59BXUqVk",
    "name": "WebHook_1 Forwarder",
    "type": "webhook",
    "settings": {
      "url": "http://127.0.0.1:5001"
    },
    "disableResolveMessage": false
  },
  {
    "uid": "fdrfu83Vk",
    "name": "email receiver",
    "type": "email",
    "settings": {
      "addresses": "example@email.com"
    },
    "disableResolveMessage": false
  }
]

milan0x4d · August 4, 2023, 3:22pm

And here is my notification policy, also exported via API:

{
  "receiver": "grafana-default-email",
  "group_by": [
    "grafana_folder",
    "alertname"
  ],
  "routes": [
    {
      "receiver": "WebHook_1 Forwarder",
      "group_by": [
        "…"
      ],
      "object_matchers": [
        [
          "notification",
          "=",
          "production"
        ]
      ]
    }
  ]
}

Notice how neither the contact points nor the notification policy has an “id” field? The notification policy also lacks a “uid” field.
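
The observation can be checked mechanically against the two exports above. The helper below is a small sketch with a made-up name; it only reports which of the top-level "id" and "uid" keys an exported object carries and says nothing about why the objects disappear.

```python
import json


def identifier_fields(exported: dict) -> list:
    """Return which of the top-level keys "id" and "uid" the object has."""
    return [key for key in ("id", "uid") if key in exported]


# Trimmed copies of the exports posted above.
contact_point = json.loads(
    '{"uid": "gAleuU3Vk", "name": "My test e-mail", "type": "email"}'
)
notification_policy = json.loads(
    '{"receiver": "grafana-default-email", "group_by": ["grafana_folder", "alertname"]}'
)

print(identifier_fields(contact_point))        # the contact point has only "uid"
print(identifier_fields(notification_policy))  # the policy has neither key
```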
