All of the Loki write pods on the cluster are repeatedly throwing the error below. In Argo CD they all show as Progressing, and the pods are Unhealthy with the message: Readiness probe failed: HTTP probe failed with statuscode: 503.
level=error ts=2023-08-31T15:06:02.262372375Z caller=flush.go:144 org_id=1 msg="failed to flush" err="failed to flush chunks: store put chunk: NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors, num_chunks: 1, labels: {app=\"metrics-server\", container=\"metrics-server-vpa\", filename=\"/var/log/pods/kube-system_metrics-server-5955767688-jxx2w_461cedbf-3f06-4e90-8b83-ff1808994420/metrics-server-vpa/0.log\", job=\"kube-system/metrics-server\", namespace=\"kube-system\", node_name=\"aks-statsgrdev-27419674-vmss000003\", pod=\"metrics-server-5955767688-jxx2w\", stream=\"stderr\"}"
level=info ts=2023-08-31T15:06:02.262398775Z caller=flush.go:168 msg="flushing stream" user=1 fp=73c7a2b302997943 immediate=true num_chunks=19 labels="{app=\"loki\", component=\"backend\", container=\"loki\", filename=\"/var/log/pods/statsgrafana_loki-backend-2_acaf0b8f-ddee-4a9c-84bc-d1ddd8577f65/loki/0.log\", instance=\"statsgrafana-dev\", job=\"statsgrafana/loki\", namespace=\"statsgrafana\", node_name=\"aks-statsgrdev-27419674-vmss000005\", pod=\"loki-backend-2\", stream=\"stderr\"}"
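The error is coming from the AWS credential chain even though chunks should be going to Azure, so it's worth confirming which object store the write pods actually loaded. A rough way to check (the pod name, label selector, and ConfigMap name are assumptions based on the chart's defaults and may differ in this setup):

# Pod status and probe events for one of the write pods
kubectl -n statsgrafana get pods -l app.kubernetes.io/component=write
kubectl -n statsgrafana describe pod loki-write-0

# Hit the readiness endpoint the probe is failing on (Loki listens on 3100 by default)
kubectl -n statsgrafana port-forward loki-write-0 3100:3100 &
curl -s http://localhost:3100/ready

# Inspect the rendered Loki config for the storage settings
# (the chart normally puts it in a ConfigMap named "loki" under the key config.yaml)
kubectl -n statsgrafana get configmap loki -o jsonpath='{.data.config\.yaml}' \
  | grep -E -n "object_store|storage_config|azure|s3"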
Here’s the values file being passed to the Loki Helm chart:
image:
  pullPolicy: Always
  tags: "dev-latest"
schema_config:
  configs:
    - from: "2022-01-11"
      index:
        period: 24h
        prefix: index_
      object_store: azure
      schema: v12
      store: boltdb-shipper
azure:
  # Your Azure storage account name
  account_name: <account name>
  # For the account-key, see docs: https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal
  account_key: <account key>
  # See https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#containers
  container_name: <container name>
  use_managed_identity: false
  # Providing a user assigned ID will override use_managed_identity
  # user_assigned_id: <user-assigned-identity-id>
  request_timeout: 0
  # Configure this if you are using private azure cloud like azure stack hub and will use this endpoint suffix to compose container & blob storage URL. Ex: https://account_name.endpoint_suffix/container_name/blob_name
  endpoint_suffix: blob.core.windows.net
boltdb_shipper:
  active_index_directory: /data/loki/boltdb-shipper-active
  cache_location: /data/loki/boltdb-shipper-cache
  cache_ttl: 24h
  shared_store: azure
filesystem:
  directory: /data/loki/chunks
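For reference, my understanding is that the grafana/loki chart expects storage settings under its top-level loki: key rather than as raw Loki config keys. A rough sketch of the same settings in that shape (key names are from memory and should be checked against the chart's values.yaml for 5.8.9):

loki:
  schemaConfig:
    configs:
      - from: "2022-01-11"
        store: boltdb-shipper
        object_store: azure
        schema: v12
        index:
          prefix: index_
          period: 24h
  storage:
    type: azure
    azure:
      accountName: <account name>
      accountKey: <account key>
      useManagedIdentity: false
      requestTimeout: 0
    bucketNames:
      chunks: <container name>
      ruler: <container name>
      admin: <container name>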
The ApplicationSet YAML from Argo CD is here:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: statsgrafana-app
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - name: statsgrafana-dev
            cluster: statsgr-aks-dev
            valueFiles: values/dev.yaml
            grafanaVer: '6.58.4'
            lokiVer: '5.8.9'
            promtailVer: '6.11.6'
  template:
    metadata:
      name: '{{name}}'
    spec:
      project: <my argo project name> # The Argo project this will be defined under.
      # syncPolicy:
      #   automated: {}
      sources:
        - repoURL: "https://grafana.github.io/helm-charts"
          targetRevision: 6.58.4
          chart: grafana
          syncOptions:
            # Need server-side apply because the resource is too big to fit in the 262144-byte annotation size limit. May be fixed by a future version of the chart.
            - ServerSideApply=true
          helm:
            valueFiles:
              - $grafanavalues/helm-charts/grafana/{{valueFiles}}
        - repoURL: <my azure repo address>
          targetRevision: 'main'
          ref: grafanavalues
        - repoURL: "https://grafana.github.io/helm-charts"
          targetRevision: 5.8.9
          chart: loki
          helm:
            valueFiles:
              - $lokivalues/helm-charts/loki/{{valueFiles}}
        - repoURL: <my azure repo address>
          targetRevision: 'main'
          ref: lokivalues
        - repoURL: "https://grafana.github.io/helm-charts"
          targetRevision: 6.11.6
          chart: promtail
          helm:
            valueFiles:
              - $promtailvalues/helm-charts/promtail/{{valueFiles}}
        - repoURL: <my azure repo address>
          targetRevision: 'main'
          ref: promtailvalues
      destination:
        name: '{{cluster}}'
        # server: https://kubernetes.default.svc
        namespace: 'statsgrafana'
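To see what Loki configuration actually comes out of the chart with these values, the Loki source can be rendered locally the same way Argo CD does (run from the root of the values repo, so the path below mirrors the $lokivalues/helm-charts/loki/{{valueFiles}} reference; the release name is only for local rendering):

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Render the chart with the same values Argo passes and pull out the storage-related lines
helm template loki grafana/loki --version 5.8.9 \
  -f helm-charts/loki/values/dev.yaml \
  | grep -E -n -A 5 "object_store|storage_config|azure|s3"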