Debugging Loki pod CrashLoopBackOff

Hi, I have a k8s cluster with loki-stack deployed. The Loki server worked well until its volume filled up (100%), at which point it started eating RAM. After enlarging the volume it showed permission problems, which I fixed with chown/chmod.

Now I can’t start it; it seems it does not pass the liveness probe (though honestly I can no longer see the warning I saw this morning). I just get:

# kubectl describe -n monitoring pod/loki-0
Name:         loki-0
Namespace:    monitoring
    Container ID:  containerd://b310f8f6edf97de394424ba21c905340e972013a1b3324b67854ce633c6a2efe
    Image:         grafana/loki:2.5.0
    Image ID:
    Ports:         3100/TCP, 9095/TCP, 7946/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 06 Oct 2022 13:47:45 +0000
      Finished:     Thu, 06 Oct 2022 13:48:55 +0000
    Ready:          False
    Restart Count:  73
    Liveness:       http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
  Type     Reason   Age                      From     Message
  ----     ------   ----                     ----     -------
  Warning  BackOff  4m32s (x816 over 4h12m)  kubelet  Back-off restarting failed container

and the logs report:

# kubectl logs -f -n monitoring loki-0
level=info ts=2022-10-06T13:53:59.136420745Z caller=main.go:106 msg="Starting Loki" version="(version=2.5.0, branch=HEAD, revision=2d9d0ee23)"
level=info ts=2022-10-06T13:53:59.136966393Z caller=server.go:260 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2022-10-06T13:53:59.137071848Z caller=modules.go:597 msg="RulerStorage is not configured in single binary mode and will not be started."
level=info ts=2022-10-06T13:53:59.137615082Z caller=memberlist_client.go:394 msg="Using memberlist cluster node name" name=loki-0-f8157810
level=info ts=2022-10-06T13:53:59.14276134Z caller=memberlist_client.go:513 msg="joined memberlist cluster" reached_nodes=1
level=warn ts=2022-10-06T13:53:59.144747503Z caller=experimental.go:20 msg="experimental feature in use" feature="In-memory (FIFO) cache"
level=info ts=2022-10-06T13:53:59.181752254Z caller=table_manager.go:239 msg="loading table index_19256"
level=info ts=2022-10-06T13:55:05.103199819Z caller=table.go:443 msg="cleaning up unwanted dbs from table index_19267"
level=info ts=2022-10-06T13:55:05.10380821Z caller=table.go:358 msg="uploading table index_19256"
level=info ts=2022-10-06T13:55:05.411105415Z caller=table.go:385 msg="finished uploading table index_19256"
level=info ts=2022-10-06T13:55:05.411164686Z caller=table.go:443 msg="cleaning up unwanted dbs from table index_19256"
level=info ts=2022-10-06T13:55:05.411244085Z caller=module_service.go:96 msg="module stopped" module=store
level=info ts=2022-10-06T13:55:05.412562643Z caller=modules.go:877 msg="server stopped"
level=info ts=2022-10-06T13:55:05.412613538Z caller=module_service.go:96 msg="module stopped" module=server
level=info ts=2022-10-06T13:55:05.412644661Z caller=loki.go:373 msg="Loki stopped"
level=error ts=2022-10-06T13:55:05.412703241Z caller=log.go:100 msg="error running loki" err="failed services\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:419\nmain.main\n\t/src/loki/cmd/loki/main.go:108\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"

At this point the server restarts…

When I had permission problems, the logs clearly stated that. What else could it be? How can I debug it?


Try setting the Loki log level to debug to see if you can get more clues.
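With a loki-stack Helm deployment, one way to do that is through a values override; this is just a sketch assuming the usual chart layout where the Loki config lives under `loki.config` (the exact key path may differ in your chart version, and the file name `values-debug.yaml` is mine):

```yaml
# values-debug.yaml -- hypothetical override file for the loki-stack chart
loki:
  config:
    server:
      log_level: debug   # default is "info"; debug should show which module fails on startup
```

Then apply it with something like `helm upgrade loki -n monitoring grafana/loki-stack -f values-debug.yaml` (adjust the release name to yours). If you're not using Helm, Loki also accepts `-log.level=debug` as a command-line flag, which you could add to the container args in the StatefulSet. While it's crashing, `kubectl logs --previous -n monitoring loki-0` lets you read the full log of the last terminated container rather than the restarting one.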