Hi!
Using the loki-distributed helm chart 0.78.0 and the promtail chart 6.15.3.
Every time I take Loki down and bring it back up, Promtail cannot reconnect.
I have to restart Promtail, and then it works fine again.
Output from Promtail:
First I get a 404 while Loki is down, then when it comes back up I get a 500.
`level=warn ts=2024-01-12T08:52:52.776069987Z caller=client.go:419 component=client host=loki.test.com msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): empty ring"`
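For reference, Promtail pushes to that host through a single client entry; a minimal sketch of the relevant section (the URL path is assumed to be the standard push endpoint, and no tenant is set, matching `tenant=` in the log):

```yaml
# Sketch of the promtail client section behind the warning above.
# Host taken from the log; /loki/api/v1/push is assumed to be the push path.
clients:
  - url: http://loki.test.com/loki/api/v1/push
    # no tenant_id set, which matches the empty "tenant=" field in the log
```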
Similar problems:
One GitHub issue (opened and closed 23 Aug 2021) reports the same error:
`promtail_1 | level=warn ts=2021-08-23T07:22:20.680240681Z caller=client.go:344… component=client host=10.10.12.189:18080 msg="error sending batch, will retry" status=500 error="server returned HTTP status 500 Internal Server Error (500): empty ring"`
Why do I receive this error?
After restarting Promtail, the connection resumes.
Another GitHub issue (opened 7 Nov 2022) describes a similar setup:
Hello!
I faced a problem when installing a Loki cluster.
The cluster consists of 5 nodes:
1) Nginx proxy
2) Loki reader x2
3) Loki writer x2
All nodes use the same configuration file, differing only in the `target` item:
```
auth_enabled: false
target: "read"
memberlist:
node_name: "loki-reader2.example.com"
join_members:
- "loki-reader1.example.com"
- "loki-reader2.example.com"
- "loki-writer1.example.com"
- "loki-writer2.example.com"
dead_node_reclaim_time: "30s"
gossip_to_dead_nodes_time: "15s"
left_ingesters_timeout: "30s"
bind_addr:
- "0.0.0.0"
bind_port: 7946
server:
grpc_listen_port: 9095
http_listen_port: 3100
http_server_read_timeout: "1h"
http_server_write_timeout: "5m"
http_server_idle_timeout: "5m"
grpc_server_max_recv_msg_size: 41943040
grpc_server_max_send_msg_size: 41943040
log_level: debug
common:
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: memberlist
querier:
max_concurrent: 10
query_ingesters_within: 5h
query_timeout: 1h
engine:
timeout: 5m
schema_config:
configs:
- from: 2022-10-01
store: boltdb-shipper
object_store: aws
schema: v11
index:
prefix: index_
period: 24h
compactor:
retention_enabled: true
retention_delete_delay: "1h"
delete_request_cancel_period: "1h"
working_directory: /loki/compactor
shared_store: s3
retention_delete_worker_count: 10
compactor_ring:
kvstore:
store: memberlist
storage_config:
boltdb_shipper:
active_index_directory: /loki/index
cache_location: /loki/index-cache
shared_store: s3
aws:
bucketnames: loki
endpoint: https://s3.example.com
access_key_id: -----------------
secret_access_key: ----------------------
insecure: true
http_config:
idle_conn_timeout: 5m
response_header_timeout: 0s
insecure_skip_verify: true
s3forcepathstyle: true
limits_config:
retention_period: "4320h"
per_stream_rate_limit: "500MB"
per_stream_rate_limit_burst: "1000MB"
max_concurrent_tail_requests: 10000
ingestion_rate_mb: 100
ingestion_burst_size_mb: 100
max_entries_limit_per_query: 100000
ingester:
lifecycler:
join_after: "60s"
observe_period: "5s"
ring:
replication_factor: 1
kvstore:
store: memberlist
final_sleep: "0s"
max_chunk_age: "240h"
```
All health checks pass and the service starts.
According to the logs, Loki was able to locate all the members of the ring.
But when I try to connect the cluster to Grafana, I get a 500 error:
`Loki: Internal Server Error. 500. empty ring`
In the service logs I see the following:
```
ts=2022-11-07T13:12:13.926196155Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.927323183Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=35.563µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.927996495Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=35.563µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.931994093Z caller=grpc_logging.go:46 method=/frontendv2pb.FrontendForQuerier/QueryResult duration=105.52µs msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.932669943Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.934128211Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.934462835Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=13.737µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.935224122Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=13.737µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.936167945Z caller=grpc_logging.go:46 duration=42.182µs method=/frontendv2pb.FrontendForQuerier/QueryResult msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.936833055Z caller=retry.go:73 org_id=fake msg="error processing request" try=1 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.938052577Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.938284912Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=45.012µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.938793954Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=45.012µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.939859299Z caller=grpc_logging.go:46 duration=44.651µs method=/frontendv2pb.FrontendForQuerier/QueryResult msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.940325981Z caller=retry.go:73 org_id=fake msg="error processing request" try=2 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.941421855Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.941910948Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=7.473µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.942403329Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=7.473µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.943278467Z caller=grpc_logging.go:46 method=/frontendv2pb.FrontendForQuerier/QueryResult duration=28.428µs msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.943797077Z caller=retry.go:73 org_id=fake msg="error processing request" try=3 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.944847003Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.945103343Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=8.999µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.945650128Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=8.999µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.946855527Z caller=grpc_logging.go:46 duration=50.382µs method=/frontendv2pb.FrontendForQuerier/QueryResult msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.947119279Z caller=retry.go:73 org_id=fake msg="error processing request" try=4 err="rpc error: code = Code(500) desc = empty ring\n"
level=error ts=2022-11-07T13:12:13.947827961Z caller=stats.go:57 component=frontend msg="failed to record query metrics" err="expected one of the *LokiRequest, *LokiInstantRequest, *LokiSeriesRequest, *LokiLabelNamesRequest, got "
```
At the same time, a single-server Loki installation works correctly: logs are written and read.
Please tell me where to dig; I cannot find the problem.
I also tried connecting the distributor to Loki, but logging does not happen there either. A 404 error is thrown without any explanation:
```
level=debug ts=2022-11-07T11:35:11.566374631Z caller=logging.go:76 traceID=7dd739ced5088cbe orgID=fake msg="POST /loki/api/v1/push (404) 87.265µs"
level=debug ts=2022-11-07T11:35:11.653902837Z caller=logging.go:76 traceID=766c347a31371e33 orgID=fake msg="POST /loki/api/v1/push (404) 76.161µs"
level=debug ts=2022-11-07T11:35:11.906294651Z caller=logging.go:76 traceID=58fe0b96a10b3d6b orgID=fake msg="POST /loki/api/v1/push (404) 177.982µs"
level=debug ts=2022-11-07T11:35:11.922494924Z caller=logging.go:76 traceID=0f26b052f9416264 orgID=fake msg="POST /loki/api/v1/push (404) 88.356µs"
```
At the same time, the S3 storage is available; another Loki instance, located on a separate server, writes to it in parallel.
I have also tried adding this to the Promtail config (see the sketch right after this snippet for where it would nest in the helm values):
config:
  clients:
    backoff_config:
      min_period: 500ms
      max_period: 20m
      max_retries: 100
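Since `clients` is a list in the Promtail config, the backoff settings presumably need to sit inside a client entry. A minimal sketch of how that might look in the promtail helm values (field layout assumed from the chart's `config.clients` value; URL as in the log output above):

```yaml
# Sketch: per-client backoff in the promtail helm values.
# These settings control how long promtail keeps retrying a failed
# batch before it is dropped.
config:
  clients:
    - url: http://loki.test.com/loki/api/v1/push
      backoff_config:
        min_period: 500ms   # initial wait between retries
        max_period: 20m     # cap for the exponential backoff
        max_retries: 100    # attempts before the batch is dropped
```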
I also sometimes get this error in Promtail:
`HTTP status 404 page not found`
Then, after Loki comes up:
`msg="error sending batch, will retry" status=502 tenant= error="server returned HTTP status 502 Bad Gateway"`
Promtail is pushing to the same DNS name the whole time, but somehow it cannot reconnect. What is different about Loki when it goes down and comes back up again ("reinstalled")?
In the end it seems to work as it should, even though I get those errors: I can still ingest logs.