Hi, I installed the Grafana loki-stack Helm chart with Fluent Bit and Prometheus enabled, like this:
helm upgrade --install loki --namespace=loki grafana/loki-stack --set fluent-bit.enabled=true,promtail.enabled=false,grafana.enabled=true,prometheus.enabled=true,prometheus.alertmanager.persistentVolume.enabled=false,prometheus.server.persistentVolume.enabled=false
exactly as described here:
https://grafana.com/docs/loki/latest/installation/helm/
I’m running 2 Linux nodes and 2 Windows nodes (t5.large).
The pods on the Linux nodes look fine, but the pods on the Windows nodes apparently don’t start correctly.
I do have other Windows pods running on these nodes without these weird problems.
kubectl get all -n loki
NAME READY STATUS RESTARTS AGE
pod/loki-0 0/1 ContainerCreating 0 119m
pod/loki-fluent-bit-loki-4pvkt 0/1 ContainerCreating 0 119m
pod/loki-fluent-bit-loki-6xjcn 0/1 ContainerCreating 0 119m
pod/loki-fluent-bit-loki-7fjvx 1/1 Running 0 119m
pod/loki-fluent-bit-loki-tfst9 1/1 Running 0 119m
pod/loki-grafana-69f5954bc9-6dmlg 0/1 Init:0/1 0 119m
pod/loki-kube-state-metrics-6c7c68c46-kh8pm 0/1 ContainerCreating 0 119m
pod/loki-prometheus-alertmanager-86469c7fd8-rt9bl 0/2 ContainerCreating 0 119m
pod/loki-prometheus-node-exporter-mgxvz 0/1 ContainerCreating 0 119m
pod/loki-prometheus-node-exporter-qf6m9 0/1 ContainerCreating 0 119m
pod/loki-prometheus-node-exporter-vztkr 1/1 Running 0 119m
pod/loki-prometheus-node-exporter-xwctw 1/1 Running 0 119m
pod/loki-prometheus-pushgateway-f8d8f7945-26hsz 0/1 ContainerCreating 0 119m
pod/loki-prometheus-server-64f746787f-f6rwj 0/2 ContainerCreating 0 119m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/loki ClusterIP ***.**.***.*** <none> 3100/TCP 119m
service/loki-grafana ClusterIP ***.**.***.* <none> 80/TCP 119m
service/loki-headless ClusterIP None <none> 3100/TCP 119m
service/loki-kube-state-metrics ClusterIP ***.**.***.** <none> 8080/TCP 119m
service/loki-prometheus-alertmanager ClusterIP ***.**.***.** <none> 80/TCP 119m
service/loki-prometheus-node-exporter ClusterIP None <none> 9100/TCP 119m
service/loki-prometheus-pushgateway ClusterIP ***.**.**.*** <none> 9091/TCP 119m
service/loki-prometheus-server ClusterIP ***.**.***.*** <none> 80/TCP 119m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/loki-fluent-bit-loki 4 4 2 4 2 <none> 119m
daemonset.apps/loki-prometheus-node-exporter 4 4 2 4 2 <none> 119m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/loki-grafana 0/1 1 0 119m
deployment.apps/loki-kube-state-metrics 0/1 1 0 119m
deployment.apps/loki-prometheus-alertmanager 0/1 1 0 119m
deployment.apps/loki-prometheus-pushgateway 0/1 1 0 119m
deployment.apps/loki-prometheus-server 0/1 1 0 119m
NAME DESIRED CURRENT READY AGE
replicaset.apps/loki-grafana-69f5954bc9 1 1 0 119m
replicaset.apps/loki-kube-state-metrics-6c7c68c46 1 1 0 119m
replicaset.apps/loki-prometheus-alertmanager-86469c7fd8 1 1 0 119m
replicaset.apps/loki-prometheus-pushgateway-f8d8f7945 1 1 0 119m
replicaset.apps/loki-prometheus-server-64f746787f 1 1 0 119m
NAME READY AGE
statefulset.apps/loki 0/1 119m
pod events:
kubectl get events -n loki --field-selector involvedObject.name=loki-grafana-69f5954bc9-6dmlg
LAST SEEN TYPE REASON OBJECT MESSAGE
7m58s Normal SandboxChanged pod/loki-grafana-69f5954bc9-6dmlg Pod sandbox changed, it will be killed and re-created.
2m58s Warning FailedCreatePodSandBox pod/loki-grafana-69f5954bc9-6dmlg (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "loki-grafana-69f5954bc9-6dmlg": Error response from daemon: container 53b86be0efeeea049f30dbc4cea44af1589dea15cdf437f1400a87e7420fdae7 encountered an error during hcsshim::System::CreateProcess: failure in a Windows system call: The user name or password is incorrect. (0x52e) extra info: {"CommandLine":"cmd /S /C \"PAUSE \u003cCON\"","User":"472:472","WorkingDirectory":"C:\\","CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}
kubectl get events -n loki --field-selector involvedObject.name=loki-fluent-bit-loki-4pvkt
LAST SEEN TYPE REASON OBJECT MESSAGE
3m55s Normal SandboxChanged pod/loki-fluent-bit-loki-4pvkt Pod sandbox changed, it will be killed and re-created.
13m Warning FailedCreatePodSandBox pod/loki-fluent-bit-loki-4pvkt (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8700f308360c2198d27e17c8c956c7cb02f02aaeb87a61544f38c64d7fa0c8a5" network for pod "loki-fluent-bit-loki-4pvkt": networkPlugin cni failed to set up pod "loki-fluent-bit-loki-4pvkt_loki" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address
kubectl get events -n loki --field-selector involvedObject.name=loki-grafana-69f5954bc9-6dmlg
LAST SEEN TYPE REASON OBJECT MESSAGE
9m20s Normal SandboxChanged pod/loki-grafana-69f5954bc9-6dmlg Pod sandbox changed, it will be killed and re-created.
4m20s Warning FailedCreatePodSandBox pod/loki-grafana-69f5954bc9-6dmlg (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "loki-grafana-69f5954bc9-6dmlg": Error response from daemon: container 53b86be0efeeea049f30dbc4cea44af1589dea15cdf437f1400a87e7420fdae7 encountered an error during hcsshim::System::CreateProcess: failure in a Windows system call: The user name or password is incorrect. (0x52e) extra info: {"CommandLine":"cmd /S /C \"PAUSE \u003cCON\"","User":"472:472","WorkingDirectory":"C:\\","CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}
kubectl get events -n loki --field-selector involvedObject.name=loki-kube-state-metrics-6c7c68c46-kh8pm
LAST SEEN TYPE REASON OBJECT MESSAGE
4m47s Normal SandboxChanged pod/loki-kube-state-metrics-6c7c68c46-kh8pm Pod sandbox changed, it will be killed and re-created.
9m49s Warning FailedCreatePodSandBox pod/loki-kube-state-metrics-6c7c68c46-kh8pm (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "loki-kube-state-metrics-6c7c68c46-kh8pm": Error response from daemon: container 05c42449a3c51f3c2704984f6bccded64b01b434ec9bc9b3f431e5c9a58c708e encountered an error during hcsshim::System::CreateProcess: failure in a Windows system call: The user name or password is incorrect. (0x52e) extra info: {"CommandLine":"cmd /S /C \"PAUSE \u003cCON\"","User":"65534:65534","WorkingDirectory":"C:\\","CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}
kubectl get events -n loki --field-selector involvedObject.name=loki-prometheus-alertmanager-86469c7fd8-rt9bl
LAST SEEN TYPE REASON OBJECT MESSAGE
5m15s Normal SandboxChanged pod/loki-prometheus-alertmanager-86469c7fd8-rt9bl Pod sandbox changed, it will be killed and re-created.
14s Warning FailedCreatePodSandBox pod/loki-prometheus-alertmanager-86469c7fd8-rt9bl (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "loki-prometheus-alertmanager-86469c7fd8-rt9bl": Error response from daemon: container 5083cb0b49515b42734b165cc6f91101f409f8163e2abb3c5d9c6e95e76a3833 encountered an error during hcsshim::System::CreateProcess: failure in a Windows system call: The user name or password is incorrect. (0x52e) extra info: {"CommandLine":"cmd /S /C \"PAUSE \u003cCON\"","User":"65534:65534","WorkingDirectory":"C:\\","CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}
kubectl get events -n loki --field-selector involvedObject.name=loki-prometheus-node-exporter-mgxvz
LAST SEEN TYPE REASON OBJECT MESSAGE
5m42s Warning FailedCreatePodSandBox pod/loki-prometheus-node-exporter-mgxvz Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "loki-prometheus-node-exporter-mgxvz": Error response from daemon: network host not found
43s Normal SandboxChanged pod/loki-prometheus-node-exporter-mgxvz Pod sandbox changed, it will be killed and re-created.
kubectl get events -n loki --field-selector involvedObject.name=loki-prometheus-pushgateway-f8d8f7945-26hsz
LAST SEEN TYPE REASON OBJECT MESSAGE
66s Normal SandboxChanged pod/loki-prometheus-pushgateway-f8d8f7945-26hsz Pod sandbox changed, it will be killed and re-created.
6m8s Warning FailedCreatePodSandBox pod/loki-prometheus-pushgateway-f8d8f7945-26hsz (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "loki-prometheus-pushgateway-f8d8f7945-26hsz": Error response from daemon: container b9122521e6e5febb5f00f589b9ba158e76a0d8f5d4b10191846afadb138824ae encountered an error during hcsshim::System::CreateProcess: failure in a Windows system call: The user name or password is incorrect. (0x52e) extra info: {"CommandLine":"cmd /S /C \"PAUSE \u003cCON\"","User":"65534","WorkingDirectory":"C:\\","CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}
kubectl get events -n loki --field-selector involvedObject.name=loki-prometheus-server-64f746787f-f6rwj
LAST SEEN TYPE REASON OBJECT MESSAGE
86s Normal SandboxChanged pod/loki-prometheus-server-64f746787f-f6rwj Pod sandbox changed, it will be killed and re-created.
6m28s Warning FailedCreatePodSandBox pod/loki-prometheus-server-64f746787f-f6rwj (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "loki-prometheus-server-64f746787f-f6rwj": Error response from daemon: container c7078a4d99e14680e9a791e08953bf812a706c6e38d8b51fc592510db1461490 encountered an error during hcsshim::System::CreateProcess: failure in a Windows system call: The user name or password is incorrect. (0x52e) extra info: {"CommandLine":"cmd /S /C \"PAUSE \u003cCON\"","User":"65534:65534","WorkingDirectory":"C:\\","CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}
Is this setup simply not supported by the Helm chart, so that I need to set everything up manually?
Or can I get this working properly, and I’m only missing some configuration in my cluster?
I would appreciate any help or pointers, since I’m very new to Windows in Kubernetes.
edit: formatting