Problems retrieving Loki logs in Grafana on Openshift

Hi

Not sure if this is the right forum, but given it relates to the user guides for the Loki operator, I will give it a try.

I have been trying to set up the Loki datasource in Grafana on Openshift (version 4.12) by following the examples found here: Connect Grafana to an in-cluster LokiStack - Loki Operator.
(there is a dead link in the documentation, but the yaml file can be found here: loki/addon_grafana_gateway_ocp_oauth.yaml at main · grafana/loki · GitHub)

This works very well for the version of Grafana given in the yaml file (image: docker.io/grafana/grafana:8.5.6), but I have not been able to make it work with any 9.x version of Grafana, so something seems to break between the two major versions.

With version 9.5.3 of Grafana, I am able to retrieve the labels when I explore the Loki datasource, but no data is shown.

I have looked through the release notes for Grafana and enabled more logging in Grafana without being able to figure out what goes wrong. That said, I'm a beginner with both Loki and Grafana, so I might have missed something.

Has anybody managed to get it working with a 9.x version of Grafana?

If so any pointers that could lead to a solution would be appreciated.

That sounds rather strange. If you are able to get labels then the data source is working. Do you have some screenshots perhaps?

Hi tonyswumac

Thanks for the reply. Here are some screenshots. The first two are from Grafana 9.5.3, where the labels are retrieved but no data is found. The last one is from Grafana 8.5.6 (the image from the example), and here the same query returns data.



Hi Again

I noticed that if I press the "live" button I get data (I don't know why I haven't tried that before). It therefore seems to be a problem related to the time range when I do a search.

What’s the version of Loki?

If you suspect the time range is the problem, have you tried specifying a different time range to see if you get data? When you press "live", what's the timestamp on the latest log? Do new logs continue to come in?

Thanks for the reply.

I tried changing the time range without any luck (including a time range a day in the past, to rule out a time offset as the cause). I still don't get data this way.

The entries retrieved in live mode look fine. At 07:42 local time I got the following entry:

2023-06-13 07:42:29 {"@timestamp":"2023-06-13T05:42:28.624755492Z","file":"/var/log/pods/infrastructure-tooling_cluster-configurator-bff-7f75d47cd5-pvmd5_1e38df5a-85c9-44fc-ad4d-aeb931e49b50/cluster-configurator-bff/0.log", …

As you can see, my local time is offset 2 hours from UTC/Zulu time. The entries keep coming as they should.

What’s the version of Loki you are running?

We updated to the latest version of Red Hat's Loki Operator yesterday (v5.7.2), which didn't make a difference.

As far as we can tell from the images, this should result in Loki version 2.8.x being installed on the cluster. Since we are not sure, we have asked Red Hat, and I will get back when we have an answer.

A couple of things I can think of to try in the meantime:

  1. Try doing an API call directly to the Loki endpoint as a sanity check.
  2. Create a new data source for the same Loki endpoint in Grafana 9 as a sanity check.
  3. Are you using any proxy in front of Loki? If so, I'd check the logs to see if anything is being routed incorrectly.

Can you share a screenshot of what your data source looks like?

Hi again

Direct API call:

We tried doing direct API calls, and they seem to work. The only differences between our API calls and the way Grafana makes them are a) that we go through an Openshift route, and b) that I authenticate using a token, while Grafana is currently configured to authenticate with a CA cert (see screenshot below). Given that we get the labels, and that things work in Grafana 8.5.6, this is what I would have expected. For example, we tried:

https://logging-loki-openshift-logging.apps.c03x.paas.corp.jyskebank.net/api/logs/v1/application/loki/api/v1/query_range?query={+log_type%3D"application"+}+|+json&start=1686811511525000000&end=1686815111525000000&limit=100&direction=backward

New datasource

I create the datasource using datasource provisioning, as done in the example in the Loki Operator documentation (see the link in the post above). My current datasource definition looks like this:

    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
      access: proxy
      basicAuth: false
      withCredentials: false
      isDefault: true
      jsonData:
        timeInterval: 5s
        tlsSkipVerify: true
        httpHeaderName1: "Authorization"
      secureJsonData:
        httpHeaderValue1: "Bearer ${PROMETHEUS_ACCESS_TOKEN}"
      editable: false
    - name: Loki - Application
      isDefault: false
      type: loki
      access: proxy
      url: https://${GATEWAY_ADDRESS}/api/logs/v1/application/
      jsonData:
        tlsAuthWithCACert: true
      secureJsonData:
        tlsCACert: ${GATEWAY_SERVICE_CA}

The environment variable is set in the deployment. Here I made a discovery: at some point while trying to get things to work, I added an extra certificate to this environment variable. I did this because I got the following error while retrieving data (but not when retrieving labels):

Get "https://oauth-openshift.apps.c03x.paas.corp.jyskebank.net/oauth/authorize?approval_prompt=force&client_id=system%3Aserviceaccount%3Aopenshift-logging%3Alogging-loki-gateway&redirect_uri=https%3A%2F%2Flogging-loki-openshift-logging.apps.c03x.paas.corp.jyskebank.net%2Fopenshift%2Fapplication%2Fcallback%3Froute%3D%2Floki%2Fapi%2Fv1%2Fquery_range&response_type=code&scope=user%3Ainfo+user%3Acheck-access+user%3Alist-projects&state=I+love+Observatorium": tls: failed to verify certificate: x509: certificate signed by unknown authority

It seems that an OAuth flow is triggered when retrieving data. The strange thing is that the CA for the certificate that fails is present among the container's CAs. As I understand it, Go should use these certificates.

In the 8.5.6 container it hasn't been necessary to add the extra certificate. Could my problems be related to this?

To be honest, I have had trouble finding good, detailed information about the different authentication methods for datasources, so any links would be appreciated.
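
What I have pieced together so far is that the relevant provisioning fields seem to be roughly the following (just a sketch of the fields based on the Grafana data source provisioning docs, with placeholder values, so please correct me if I got something wrong):

    - name: example             # placeholder
      type: loki
      access: proxy
      url: https://example/     # placeholder
      basicAuth: false              # basic auth (basicAuthUser + secureJsonData.basicAuthPassword)
      jsonData:
        tlsSkipVerify: false        # skip verification of the server certificate
        tlsAuthWithCACert: true     # verify the server against secureJsonData.tlsCACert
        tlsAuth: false              # present a client certificate (tlsClientCert/tlsClientKey)
        httpHeaderName1: "Authorization"   # send a fixed header, e.g. a bearer token
      secureJsonData:
        tlsCACert: <PEM>
        tlsClientCert: <PEM>
        tlsClientKey: <PEM>
        httpHeaderValue1: "Bearer <token>"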

Proxy
We are accessing Loki through a service internally on the Openshift cluster, so there shouldn't be any proxy. But given the error above, I suspect that Red Hat might be using Observatorium.

Screenshot of the datasource:

Do you know if there is any way to configure Grafana to log all the requests that the datasources make? I have already enabled datasource logging and plugin logging, and changed the log level to trace.

You can check in your browser's developer tools (for Chrome, under the Network section); sometimes that gives you some information.

I haven’t used Loki on openshift before, but if you were able to authenticate to Loki using a token, can you try to configure the data source using a token as well?

I was more interested in logging the requests between the Grafana backend and the Loki backend.

I tried using a token as you suggested (configured like the Prometheus datasource in my previous post). This works and I get data. My problem therefore seems to be related to authentication or authorization when retrieving data from the backend (it's strange that I can still get the labels).
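
For reference, the token-based test datasource looked roughly like this (a sketch; LOKI_ACCESS_TOKEN is just a placeholder environment variable that I set the same way as PROMETHEUS_ACCESS_TOKEN):

    - name: Loki - Application (token test)
      type: loki
      access: proxy
      url: https://${GATEWAY_ADDRESS}/api/logs/v1/application/
      jsonData:
        tlsAuthWithCACert: true
        httpHeaderName1: "Authorization"
      secureJsonData:
        tlsCACert: ${GATEWAY_SERVICE_CA}
        httpHeaderValue1: "Bearer ${LOKI_ACCESS_TOKEN}"   # placeholder: a fixed service account token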

Unfortunately, I can't configure the datasource with a fixed token, given that I need the multi-tenancy that should be provided by the other method. But at least we now know what the problem relates to, which in itself is progress.

Yeah, might want to ask someone with more experience on Openshift. Could also be a question for the Grafana forum since it’s less likely to be directly related to Loki.

Hi tonyswumac

Thanks for trying to help - it’s appreciated.

I have gotten a lead through another channel and am working on it now. The trick seems to be to configure Grafana to use the Openshift OAuth server (without using the oauth-proxy), and to set the oauthPassThru property in the datasource definition.
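
In provisioning terms, as far as I understand it so far, that means something like this on the Loki datasource (a sketch, not yet verified end to end):

    - name: Loki - Application
      type: loki
      access: proxy
      url: https://${GATEWAY_ADDRESS}/api/logs/v1/application/
      jsonData:
        tlsAuthWithCACert: true
        oauthPassThru: true      # forward the logged-in user's OAuth token to the Loki gateway
      secureJsonData:
        tlsCACert: ${GATEWAY_SERVICE_CA}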


Hi erikjb,

did you make some progress here with oauth as you hinted in your last update?
I myself am struggling with similar issues on Openshift 4.10, connecting an external Grafana to Openshift logging / Loki.

Thanks for your reply,
Thomas

Hi everyone,

So I am dealing with the same issue and managed to get it to work with a workaround; I am in the process of looking for a better approach. I am new to Loki, so I am trying to get it working at least somehow first and deal with security later.

So the main issues that I have encountered are:

  • Openshift Loki has authentication enabled (client cert required)
  • logs are split into 3 “organizations”: application, audit, infrastructure

I managed to get it to work by stealing the client certificate that is presumably used by the Openshift Console to retrieve logs:

Datasource:
URL: https://logging-loki-query-frontend-http.openshift-logging:3100
TLS Client Auth: enabled
Skip TLS Verify: enabled
TLS/SSL auth details: copied from openshift-logging/logging-loki-gateway-client-http secret
Custom HTTP headers: X-Scope-OrgID: application (or infrastructure/audit)
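
For anyone who provisions datasources, the same settings would look roughly like this (a sketch; the ${...} values are placeholders for the cert and key copied from that secret):

    apiVersion: 1
    datasources:
    - name: Loki (Application, direct)   # placeholder name
      type: loki
      access: proxy
      url: https://logging-loki-query-frontend-http.openshift-logging:3100
      jsonData:
        tlsAuth: true
        tlsSkipVerify: true
        httpHeaderName1: "X-Scope-OrgID"
      secureJsonData:
        tlsClientCert: ${LOKI_CLIENT_CERT}   # placeholder, copied from logging-loki-gateway-client-http
        tlsClientKey: ${LOKI_CLIENT_KEY}     # placeholder
        httpHeaderValue1: "application"      # or infrastructure / audit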

This is far from ideal but might help some of you kickstart a better solution. I want to avoid having to use TLS client auth, I can't really do this with a GitOps approach, and I am worried these keys get rotated by the operator.

Openshift: 4.13
Grafana: 9.1.6 (community)
Loki Operator: 5.7.6 (redhat)

Hi davtex,
Thanks for updating with your findings, and great that it works for you in some way.
That's one way of getting it to work with Grafana > v8.5, but I think that way you lose the application/namespace authorization feature, as you are skipping the gateway pod and connecting directly to the query-frontend pod.
Meaning a user can see all application logs; there's no restriction on which namespaces they are permitted to access via RBAC, right?
If that works for you, great, but I really need to restrict app log access based on RBAC permissions.
At least now I know for sure that the rather simple and easy setup using Openshift's oauth-proxy and Grafana's auth-proxy authentication feature stopped working in Grafana versions >= 9.0, and I am still investigating a Grafana OAuth configuration using Keycloak.

Openshift: 4.12
Loki Operator: 5.7.6 (redhat)
Grafana Operator: 4.10.1 using different Grafana images: 8.5.27, 9.5.6, 10.0.2

Hi thikade

Sorry for the late answer, but I had to make sure that I could publish the following (I got it from Red Hat - a big thanks to them for providing it and letting me publish it). It helped me a lot, and based on this I got it working:

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    serviceaccounts.openshift.io/oauth-redirectreference.grafana: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"grafana"}}'
  name: grafana
  namespace: openshift-monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: logging-application-logs-reader
rules:
- apiGroups:
  - loki.grafana.com
  resourceNames:
  - logs
  resources:
  - application
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: logging-grafana-alertmanager-access
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
  - monitoring.coreos.com
  resourceNames:
  - non-existant
  resources:
  - alertmanagers
  verbs:
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app.kubernetes.io/part-of: openshift-monitoring
  name: logging-grafana-users-alertmanager-access
  namespace: openshift-monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: monitoring-alertmanager-edit
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated:oauth
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: logging-grafana-alertmanager-access
  namespace: openshift-monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: logging-grafana-alertmanager-access
subjects:
- kind: ServiceAccount
  name: grafana
  namespace: openshift-monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: logging-grafana-auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: grafana
  namespace: openshift-monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: logging-grafana-metrics-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-monitoring-view
subjects:
- kind: ServiceAccount
  name: grafana
  namespace: openshift-monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: logging-users-application-logs-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: logging-application-logs-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
---
apiVersion: v1
data:
  config.ini: |
    [analytics]
    check_for_updates = false
    reporting_enabled = false
    [auth]
    disable_login_form = true
    [auth.basic]
    enabled = false
    [auth.generic_oauth]
    name = OpenShift
    icon = signin
    enabled = true
    client_id = system:serviceaccount:openshift-monitoring:grafana
    client_secret = ${OAUTH_CLIENT_SECRET}
    scopes = user:info user:check-access user:list-projects role:logging-grafana-alertmanager-access:openshift-monitoring
    empty_scopes = false
    auth_url = https://oauth-openshift.apps.${CLUSTER_ROUTES_BASE}/oauth/authorize
    token_url = https://oauth-openshift.apps.${CLUSTER_ROUTES_BASE}/oauth/token
    api_url = https://kubernetes.default.svc/apis/user.openshift.io/v1/users/~
    email_attribute_path = metadata.name
    allow_sign_up = true
    allow_assign_grafana_admin = true
    role_attribute_path = contains(groups[*], 'system:cluster-admins') && 'GrafanaAdmin' || contains(groups[*], 'cluster-admin') && 'GrafanaAdmin'  || contains(groups[*], 'dedicated-admin') && 'GrafanaAdmin' || 'Viewer'
    tls_client_cert = /etc/tls/private/tls.crt
    tls_client_key = /etc/tls/private/tls.key
    tls_client_ca = /run/secrets/kubernetes.io/serviceaccount/ca.crt
    use_pkce = true
    [paths]
    data = /var/lib/grafana
    logs = /var/lib/grafana/logs
    plugins = /var/lib/grafana/plugins
    provisioning = /etc/grafana/provisioning
    [security]
    admin_user = system:does-not-exist
    cookie_secure = true
    [server]
    protocol = https
    cert_file = /etc/tls/private/tls.crt
    cert_key = /etc/tls/private/tls.key
    root_url = https://grafana-openshift-monitoring.apps.${CLUSTER_ROUTES_BASE}/
    [users]
    viewers_can_edit = true
    default_theme = light
    [log]
    mode = console
    level = info
    [dataproxy]
    logging = true
kind: ConfigMap
metadata:
  name: grafana-config-455kdg4tgt
  namespace: openshift-monitoring
---
apiVersion: v1
data:
  providers.yaml: |
    apiVersion: 1

    providers:
    - name: 'openshift-logging-dashboards'
      orgId: 1
      folder: 'OpenShift Logging'
      folderUid: '990e03fc-b278-4b16-8fd6-34d381c22338'
      type: file
      disableDeletion: false
      updateIntervalSeconds: 10
      allowUiUpdates: false
      options:
        path: /var/lib/grafana/dashboards
        foldersFromFilesStructure: false
kind: ConfigMap
metadata:
  name: grafana-dashboards-f8c5mkfkhd
  namespace: openshift-monitoring
---
apiVersion: v1
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - access: proxy
        editable: true
        jsonData:
          tlsAuthWithCACert: true
          timeInterval: 5s
          oauthPassThru: true
          manageAlerts: true
          alertmanagerUid: 8e7816ff-6815-4a38-95f4-370485165c5e
        secureJsonData:
          tlsCACert: ${GATEWAY_SERVICE_CA}
        name: Prometheus
        uid: 73a57e8b-7679-4a18-915c-292f143448c7
        type: prometheus
        url: https://${CLUSTER_MONITORING_THANOS_QUERIER_OAUTH_ADDRESS}
      - name: Loki (Application)
        uid: 4b4e7fa0-9846-4a8a-9ab3-f09b21e777c8
        isDefault: true
        type: loki
        access: proxy
        url: https://${GATEWAY_ADDRESS}/api/logs/v1/application/
        jsonData:
          tlsAuthWithCACert: true
          oauthPassThru: true
          manageAlerts: true
          alertmanagerUid: 8e7816ff-6815-4a38-95f4-370485165c5e
        secureJsonData:
          tlsCACert: ${GATEWAY_SERVICE_CA}
      - name: Loki (Infrastructure)
        uid: 306ba00d-0435-4ee5-99a2-681f81b3e338
        type: loki
        access: proxy
        url: https://${GATEWAY_ADDRESS}/api/logs/v1/infrastructure/
        jsonData:
          tlsAuthWithCACert: true
          oauthPassThru: true
          manageAlerts: true
          alertmanagerUid: 8e7816ff-6815-4a38-95f4-370485165c5e
        secureJsonData:
          tlsCACert: ${GATEWAY_SERVICE_CA}
      - name: Loki (Audit)
        uid: b1688386-b1df-4492-88ba-a9ceb75f295a
        type: loki
        access: proxy
        url: https://${GATEWAY_ADDRESS}/api/logs/v1/audit/
        jsonData:
          tlsAuthWithCACert: true
          oauthPassThru: true
          manageAlerts: true
          alertmanagerUid: 8e7816ff-6815-4a38-95f4-370485165c5e
        secureJsonData:
          tlsCACert: ${GATEWAY_SERVICE_CA}
      - name: Alertmanager
        type: alertmanager
        url: https://${CLUSTER_MONITORING_ALERTMANAGER_ADDRESS}
        access: proxy
        uid: 8e7816ff-6815-4a38-95f4-370485165c5e
        jsonData:
          # Valid options for implementation include mimir, cortex and prometheus
          implementation: prometheus
          tlsAuthWithCACert: true
          oauthPassThru: true
          handleGrafanaManagedAlerts: true
        secureJsonData:
          tlsCACert: ${GATEWAY_SERVICE_CA}
kind: ConfigMap
metadata:
  name: grafana-datasources-8tfkb28kfd
  namespace: openshift-monitoring
---
apiVersion: v1
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: grafana
  name: grafana-token
  namespace: openshift-monitoring
type: kubernetes.io/service-account-token
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: grafana-tls
  labels:
    app: grafana
  name: grafana
  namespace: openshift-monitoring
spec:
  ports:
  - name: http-grafana
    port: 3000
    protocol: TCP
    targetPort: http-grafana
  selector:
    app: grafana
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: openshift-monitoring
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - args:
        - -config=/etc/grafana/config.ini
        env:
        - name: OAUTH_CLIENT_SECRET
          valueFrom:
            secretKeyRef:
              key: token
              name: grafana-token
        - name: CLUSTER_ROUTES_BASE
          value: ptsiraki-log230614.devcluster.openshift.com
        - name: GATEWAY_SERVICE_CA
          valueFrom:
            configMapKeyRef:
              key: service-ca.crt
              name: openshift-service-ca.crt
        - name: GATEWAY_ADDRESS
          value: lokistack-dev-gateway-http.openshift-logging.svc:8080
        - name: CLUSTER_MONITORING_THANOS_QUERIER_OAUTH_ADDRESS
          value: thanos-querier.openshift-monitoring.svc.cluster.local:9091/
        - name: CLUSTER_MONITORING_ALERTMANAGER_ADDRESS
          value: alertmanager-main.openshift-monitoring.svc:9094
        image: docker.io/grafana/grafana:9.5.2
        imagePullPolicy: IfNotPresent
        name: grafana
        ports:
        - containerPort: 3000
          name: http-grafana
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /robots.txt
            port: 3000
            scheme: HTTPS
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        resources:
          limits:
            cpu: 1000m
            memory: 256Mi
          requests:
            cpu: 250m
            memory: 256Mi
        volumeMounts:
        - mountPath: /etc/grafana
          name: grafana-config
        - mountPath: /etc/tls/private
          name: secret-grafana-tls
        - mountPath: /var/lib/grafana/dashboards
          name: grafana-dashboards-configs
        - mountPath: /var/lib/grafana
          name: grafana
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
        - mountPath: /etc/grafana/provisioning/dashboards
          name: grafana-dashboards
      serviceAccountName: grafana
      volumes:
      - configMap:
          name: grafana-config-455kdg4tgt
        name: grafana-config
      - name: secret-grafana-tls
        secret:
          defaultMode: 420
          secretName: grafana-tls
      - name: grafana-dashboards-configs
        projected:
          sources:
          - configMap:
              name: grafana-dashboard-lokistack-chunks
              optional: true
          - configMap:
              name: grafana-dashboard-lokistack-reads
              optional: true
          - configMap:
              name: grafana-dashboard-lokistack-retention
              optional: true
          - configMap:
              name: grafana-dashboard-lokistack-writes
              optional: true
      - configMap:
          name: grafana-datasources-8tfkb28kfd
        name: grafana-datasources
      - configMap:
          name: grafana-dashboards-f8c5mkfkhd
        name: grafana-dashboards
      - emptyDir: {}
        name: grafana
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: grafana
  namespace: openshift-monitoring
spec:
  port:
    targetPort: http-grafana
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: reencrypt
  to:
    kind: Service
    name: grafana
    weight: 100
  wildcardPolicy: None

I have structured my configuration a bit differently, but all the difficult stuff is in this example.

A single detail though. To make it work with Grafana 10 I had to set the environment variable:

GF_AUTH_OAUTH_ALLOW_INSECURE_EMAIL_LOOKUP

to true.

This was introduced when CVE-2023-3128 was fixed.
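
In the Deployment above that just means one more entry in the container's env list, along the lines of:

    - name: GF_AUTH_OAUTH_ALLOW_INSECURE_EMAIL_LOOKUP
      value: "true"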

Feel free to ask if you have further questions.

Hey erikjb,

Thanks a lot for coming back and sharing that information!
I haven’t had time to test this out yet, but was reviewing the manifests you shared. Seems like all the magic is now done through the Grafana oauth-client configuration!
One thing that got me confused at first was the grafana.ini field "role_attribute_path".
It seems that this is a JMESPath query expressing: IF the user is a member of the "cluster-admin" (or "dedicated-admin") group, THEN they become GrafanaAdmin, while all others get the "Viewer" role in Grafana. Pretty cool!

Thanks again,
Thomas

Hi erikjb,

I have now successfully implemented and tested it using Grafana 10.1 and have to say it works beautifully! Loki access & authorization based on the user account is working like a charm.
Prometheus/Thanos access did not work for me following your recipe, as I did not want to install everything in the openshift-monitoring namespace, so I switched that configuration back to having the Prometheus datasource use the Grafana SA token, which works equally well (and I am currently not interested in the Alertmanager parts you included).

I believe Prometheus & Alertmanager access was the reason you had to install in the openshift-monitoring namespace!?
That way you could enrich the oauth-token scope via the role:logging-grafana-alertmanager-access:openshift-monitoring config.
Is that correct?

Thank you!