Promtail stops tailing file after sending only a few entries in k8s

Hi folks, I'm trying to debug a strange problem where my promtail instance reliably stops tailing a Kubernetes pod's log file beyond a certain point.

I have most logs flowing from promtail into Loki, having installed both with the Helm charts using the default config values. Logs from kube-system pods and from other long-running pods in my cluster work fine: they're all tailed into Loki and I'm able to query them via Grafana. But for some pods, promtail stops tailing after a few entries and never starts again. I can confirm that other pods' logs continue flowing and that neither promtail nor Loki is restarting. When I look in the promtail logs, I see it stop logging for the file in question:

level=info ts=2020-11-21T19:00:21.142362789Z caller=tailer.go:204 component=tailer msg="stopped tailing file" path=/var/log/pods/gadget-app-sandbox-production_poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl_136bd69d-9356-4371-a85f-a07ee92c62c1/js-sandbox/0.log

That pod is still running and still writing log entries to that file. If I run kubectl logs, I get a lot more than what promtail found.
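To quantify the gap I've been comparing line counts, roughly like this (the pod/namespace/container names come from the log path below; the label selector on the Loki side is my guess at what the chart's default scrape config produces, so treat it as approximate):

# count what the kubelet has on disk for the container
kubectl logs -n gadget-app-sandbox-production poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl -c js-sandbox | wc -l

# vs. what made it into Loki, e.g. via Grafana Explore or logcli
{namespace="gadget-app-sandbox-production", pod="poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl"}

The kubectl side is way longer, consistent with promtail having stopped partway through.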

Here are all the promtail log entries concerning the file in question:

level=info ts=2020-11-21T18:51:06.244785469Z caller=tailer.go:122 component=tailer msg="tail routine: started" path=/var/log/pods/gadget-app-sandbox-production_poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl_136bd69d-9356-4371-a85f-a07ee92c62c1/js-sandbox/0.log
ts=2020-11-21T18:51:06.245186214Z caller=log.go:124 component=tailer level=info msg="Seeked /var/log/pods/gadget-app-sandbox-production_poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl_136bd69d-9356-4371-a85f-a07ee92c62c1/js-sandbox/0.log - &{Offset:252 Whence:0}"
level=info ts=2020-11-21T19:00:21.142041929Z caller=tailer.go:95 component=tailer msg="position timer: exited" path=/var/log/pods/gadget-app-sandbox-production_poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl_136bd69d-9356-4371-a85f-a07ee92c62c1/js-sandbox/0.log
level=info ts=2020-11-21T19:00:21.142123717Z caller=tailer.go:138 component=tailer msg="tail routine: tail channel closed, stopping tailer" path=/var/log/pods/gadget-app-sandbox-production_poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl_136bd69d-9356-4371-a85f-a07ee92c62c1/js-sandbox/0.log
level=info ts=2020-11-21T19:00:21.142326841Z caller=tailer.go:131 component=tailer msg="tail routine: exited" path=/var/log/pods/gadget-app-sandbox-production_poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl_136bd69d-9356-4371-a85f-a07ee92c62c1/js-sandbox/0.log
level=info ts=2020-11-21T19:00:21.142362789Z caller=tailer.go:204 component=tailer msg="stopped tailing file" path=/var/log/pods/gadget-app-sandbox-production_poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl_136bd69d-9356-4371-a85f-a07ee92c62c1/js-sandbox/0.log

It seems like that position timer is maybe expecting something to come along and start tailing the same file again, and that never happens? I'm not sure.
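If I understand right, the position timer is what periodically records each file's read offset into promtail's positions file, so one thing I plan to check is what offset promtail last recorded for this file versus how big the file actually is on disk. Something like the following, where the positions path assumes the chart defaults and <promtail-namespace>/<promtail-pod> are placeholders:

# offset promtail last recorded for the file
kubectl exec -n <promtail-namespace> <promtail-pod> -- cat /run/promtail/positions.yaml

# actual size of the log file (the chart mounts /var/log/pods from the host)
kubectl exec -n <promtail-namespace> <promtail-pod> -- ls -l /var/log/pods/gadget-app-sandbox-production_poolmgr-js-sandbox-gadget-app-sandbox-production-78370217-2sxcl_136bd69d-9356-4371-a85f-a07ee92c62c1/js-sandbox/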

Notably, the pods in question are short-lived (no less than 2 minutes, but usually around 5), and inside the pod there are child processes opening and re-opening stdout and stderr, which may be what's causing this. kubectl logs picks up the logs from the child processes fine, so the logs are definitely on disk; it's just promtail that isn't picking them up.
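To rule out the file simply being dropped from discovery when these pods churn, I also want to port-forward to promtail and look at its targets page, which lists the files it's actively tailing (the port depends on the http_listen_port the chart configures, so 3101 here is just what I expect):

kubectl port-forward -n <promtail-namespace> <promtail-pod> 3101:3101
# then open http://localhost:3101/targets and check whether the 0.log path above is still listed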

Thanks for any help you can give me!

This is peculiar. What version of Promtail are you running?

2.0.0! I opened an issue on GitHub here: https://github.com/grafana/loki/issues/2979