Distributed Loki - Troubleshooting Distributor and Ingester on Kubernetes

Hello all,

After getting everything working with a single-node deployment, with the help I got in "Can't get tracing with Grafana Agent to work", I’m now trying the fully distributed Loki stack, deployed with the grafana/loki-distributed Helm chart.

All my pods are now up and running, which is a first step :slight_smile:

$ kubectl get pods -n loki
NAME                                               READY   STATUS    RESTARTS   AGE
loki-distributed-compactor-9b98f7f6-kc9zn          1/1     Running   0          46h
loki-distributed-distributor-9c5c686b7-ms9p2       1/1     Running   0          23h
loki-distributed-gateway-6dd5459c95-2674d          1/1     Running   0          2d
loki-distributed-index-gateway-0                   1/1     Running   0          46h
loki-distributed-index-gateway-1                   1/1     Running   0          46h
loki-distributed-ingester-0                        1/1     Running   0          46h
loki-distributed-querier-74c47c4d95-bjgzd          1/1     Running   0          46h
loki-distributed-query-frontend-57789d8bcd-4mw65   1/1     Running   0          46h

I’m using grafana-agent to ship my logs. Based on its metrics, log entries do seem to be sent to the distributor; these counters keep increasing.

$ curl -s http://localhost:8080/metrics | grep promtail_sent_
# HELP promtail_sent_bytes_total Number of bytes sent.
# TYPE promtail_sent_bytes_total counter
promtail_sent_bytes_total{host="loki-distributed-distributor.loki:3100"} 27496
# HELP promtail_sent_entries_total Number of log entries sent to the ingester.
# TYPE promtail_sent_entries_total counter
promtail_sent_entries_total{host="loki-distributed-distributor.loki:3100"} 1285

For the Distributor and Ingester, what should I be looking for? I don’t see anything suspicious in the logs, and I have no idea which metrics I should be watching.
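
One way to narrow this down is to compare a few counters from each component's /metrics endpoint (the same port 3100 as in the config below). The sketch here only parses text that has already been fetched, for example with curl through a kubectl port-forward. The metric names are assumptions based on what Loki usually exposes and should be checked against the actual output of your version. The helper name sum_metrics is hypothetical.

```python
from collections import defaultdict

# Metric names are assumptions; confirm them in your own /metrics output.
WATCHED = (
    "loki_distributor_lines_received_total",
    "loki_distributor_bytes_received_total",
    "loki_ingester_memory_chunks",
    "loki_ingester_chunks_flushed_total",
)


def sum_metrics(text, names=WATCHED):
    """Sum sample values per watched metric name across all label sets.

    Assumes Prometheus text format without trailing timestamps,
    i.e. lines of the form `name{labels} value` or `name value`.
    """
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name not in names:
            continue
        try:
            totals[name] += float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Usage: curl -s http://localhost:3100/metrics | python check_metrics.py
    for name, total in sorted(sum_metrics(sys.stdin.read()).items()):
        print(f"{name} {total}")
```

If the distributor's received counters rise and the ingester holds chunks in memory, but the flushed counter stays missing or at zero, the writes are probably reaching the ingester without reaching the object store. Note that with chunk_idle_period set to 30m, a quiet stream is not flushed immediately, so an empty bucket shortly after startup is not by itself a sign of failure.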

In the AWS console, my S3 bucket looks empty. Maybe that is not the right way to verify the writes?

All suggestions welcome :smiley:

Here is the full ConfigMap, in case it shows something that is set wrong:

$ kubectl describe configmap loki-distributed -n loki
Name:         loki-distributed
Namespace:    loki
Labels:       app.kubernetes.io/instance=loki-distributed
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=loki-distributed
              app.kubernetes.io/version=2.3.0
              helm.sh/chart=loki-distributed-0.37.3
Annotations:  meta.helm.sh/release-name: loki-distributed
              meta.helm.sh/release-namespace: loki

Data
====
config.yaml:
----
auth_enabled: false

server:
  http_listen_port: 3100

distributor:
  ring:
    kvstore:
      store: memberlist

memberlist:
  join_members:
    - loki-distributed-memberlist

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  chunk_idle_period: 30m
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_retain_period: 1m
  max_transfer_retries: 0

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_cache_freshness_per_query: 10m

schema_config:
  configs:
    - from: 2021-11-05
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
    shared_store: s3
    cache_ttl: 168h
    index_gateway_client:
      server_address: dns:///loki-distributed-index-gateway:9095

  aws:
    bucketnames: my-s3-bucket-name
    region: eu-west-1
    access_key_id: ${LOKI_S3_ACCESS_KEY_ID}
    secret_access_key: ${LOKI_S3_SECRET_ACCESS_KEY}
    insecure: false
    sse_encryption: false
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
      insecure_skip_verify: false
    s3forcepathstyle: true

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

query_range:
  align_queries_with_step: true
  max_retries: 5
  split_queries_by_interval: 15m
  cache_results: true
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_items: 1024
        validity: 24h

frontend_worker:
  frontend_address: loki-distributed-query-frontend:9095

frontend:
  log_queries_longer_than: 5s
  compress_responses: true
  tail_proxy_url: http://loki-distributed-querier:3100

compactor:
  shared_store: s3

ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules
  ring:
    kvstore:
      store: memberlist
  rule_path: /tmp/loki/scratch
  alertmanager_url: http://alertmanager-main.system-monitoring:9093

Events:  <none>

Just following up. Looks like I got it working.

Solutions

  • make sure there is no / in the secret access key (it seems this may be this issue)
  • the final working aws.s3: config came from this thread.
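
For the first point, a quick check before deploying can save a round trip. This is a minimal sketch that only looks for the / character mentioned above. It assumes the secret is available in the LOKI_S3_SECRET_ACCESS_KEY environment variable, as referenced in the ConfigMap. The function name risky_chars is hypothetical.

```python
import os


def risky_chars(secret, risky="/"):
    """Return the sorted list of risky characters that occur in the secret."""
    return sorted(set(secret) & set(risky))


if __name__ == "__main__":
    # Assumption: the key is exported under the same name the ConfigMap uses.
    secret = os.environ.get("LOKI_S3_SECRET_ACCESS_KEY", "")
    found = risky_chars(secret)
    if not secret:
        print("LOKI_S3_SECRET_ACCESS_KEY is not set")
    elif found:
        # Never print the secret itself, only the offending characters.
        print("secret contains:", " ".join(found), "(consider rotating the key)")
    else:
        print("no / found in the secret")
```

If the check reports a slash, generating a new access key pair in AWS until the secret has none is the simplest workaround.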