Background: we have a monitoring service that uses Loki's tail API to stream logs from Loki over a WebSocket. If the connection closes, the service recovers and retries establishing the connection.
Sporadically, this reconnect does not work, sometimes for several hours. Judging from the monitoring service's logs, the connection appears to have been closed remotely (by Loki), given the close status code 1006 (Abnormal Closure).
Has anyone faced the same issue? Any recommendations are appreciated. Relevant log lines:
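One detail that may matter when reading 1006: under RFC 6455 (§7.4.1), 1006 is a reserved code that an endpoint must never send in a Close frame. A client library reports it locally when the connection ended without a Close frame being received. So 1006 on its own shows that the closing handshake did not happen, not necessarily that Loki initiated the close; a proxy, load balancer, or network drop in between could produce the same code. The Go sketch below maps the RFC 6455 close codes and flags which ones can actually appear on the wire. `closeCodes` and `explainClose` are hypothetical names, not part of any library.

```go
package main

import "fmt"

// closeCode describes a WebSocket close status code from RFC 6455 §7.4.1.
type closeCode struct {
	Name string
	// OnWire is false for codes an endpoint must never send in a Close frame;
	// they exist only so a local API can report what happened.
	OnWire bool
}

var closeCodes = map[int]closeCode{
	1000: {"Normal Closure", true},
	1001: {"Going Away", true},
	1002: {"Protocol Error", true},
	1003: {"Unsupported Data", true},
	1005: {"No Status Received", false},
	1006: {"Abnormal Closure", false},
	1007: {"Invalid Frame Payload Data", true},
	1008: {"Policy Violation", true},
	1009: {"Message Too Big", true},
	1010: {"Mandatory Extension", true},
	1011: {"Internal Error", true},
	1015: {"TLS Handshake", false},
}

// explainClose returns a short human-readable explanation of a close code.
func explainClose(code int) string {
	c, ok := closeCodes[code]
	if !ok {
		return fmt.Sprintf("%d: not defined in RFC 6455", code)
	}
	if !c.OnWire {
		return fmt.Sprintf("%d %s: set locally, never received from the peer", code, c.Name)
	}
	return fmt.Sprintf("%d %s: carried in a Close frame", code, c.Name)
}

func main() {
	fmt.Println(explainClose(1006))
}
```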
2025-01-14 16:12:27.405 level=info ts=2025-01-14T15:12:27.335865752Z caller=http.go:275 org_id=fake msg="starting to tail logs" tenant=fake selectors="{cluster="XXXXXXXX"} |= `| json contextMap, endOfBatch, level, loggerFqcn, loggerName, message, thread, threadId, threadPriority, thrown, timeMillis | line_format`"
2025-01-14 16:12:27.405 level=error ts=2025-01-14T15:12:27.342099072Z caller=http.go:290 org_id=fake msg="Error connecting to ingesters for tailing" err="websocket: invalid control frame"
2025-01-14 16:12:27.405 level=info ts=2025-01-14T15:12:27.342268329Z caller=http.go:278 org_id=fake msg="ended tailing logs" tenant=fake selectors="{cluster="XXXXXXXX"} |= `| json contextMap, endOfBatch, level, loggerFqcn, loggerName, message, thread, threadId, threadPriority, thrown, timeMillis | line_format`"