Hello!
I ran into a problem while setting up a Loki cluster.
The cluster consists of 5 nodes:
- Nginx proxy
- Loki reader x2
- Loki writer x2
All nodes have the same configuration file installed, differing only in the target value (this is the file from loki-reader2):
auth_enabled: false
target: "read"

memberlist:
  node_name: "loki-reader2.example.com"
  join_members:
    - "loki-reader1.example.com"
    - "loki-reader2.example.com"
    - "loki-writer1.example.com"
    - "loki-writer2.example.com"
  dead_node_reclaim_time: "30s"
  gossip_to_dead_nodes_time: "15s"
  left_ingesters_timeout: "30s"
  bind_addr:
    - "0.0.0.0"
  bind_port: 7946

server:
  grpc_listen_port: 9095
  http_listen_port: 3100
  http_server_read_timeout: "1h"
  http_server_write_timeout: "5m"
  http_server_idle_timeout: "5m"
  grpc_server_max_recv_msg_size: 41943040
  grpc_server_max_send_msg_size: 41943040
  log_level: debug

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist

querier:
  max_concurrent: 10
  query_ingesters_within: 5h
  query_timeout: 1h
  engine:
    timeout: 5m

schema_config:
  configs:
    - from: 2022-10-01
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h

compactor:
  retention_enabled: true
  retention_delete_delay: "1h"
  delete_request_cancel_period: "1h"
  working_directory: /loki/compactor
  shared_store: s3
  retention_delete_worker_count: 10
  compactor_ring:
    kvstore:
      store: memberlist

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index-cache
    shared_store: s3
  aws:
    bucketnames: loki
    endpoint: https://s3.example.com
    access_key_id: -----------------
    secret_access_key: ----------------------
    insecure: true
    http_config:
      idle_conn_timeout: 5m
      response_header_timeout: 0s
      insecure_skip_verify: true
    s3forcepathstyle: true

limits_config:
  retention_period: "4320h"
  per_stream_rate_limit: "500MB"
  per_stream_rate_limit_burst: "1000MB"
  max_concurrent_tail_requests: 10000
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 100
  max_entries_limit_per_query: 100000

ingester:
  lifecycler:
    join_after: "60s"
    observe_period: "5s"
    ring:
      replication_factor: 1
      kvstore:
        store: memberlist
    final_sleep: "0s"
  max_chunk_age: "240h"
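For reference, the writer nodes use the same file; only target (and the per-host node_name) changes, e.g. on loki-writer1:

target: "write"

memberlist:
  node_name: "loki-writer1.example.com"
  # the rest of the file is identical to the reader config above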
All health checks pass and the service starts.
According to the logs, Loki is able to locate all the members of the ring.
But when I try to connect the cluster to Grafana, I get a 500 error:
Loki: Internal Server Error. 500. empty ring
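The Grafana data source is pointed at the cluster through the Nginx proxy; a sketch of the provisioning, where the proxy host and port are placeholders rather than the real values:

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    # placeholder address of the nginx proxy in front of the read nodes
    url: http://nginx-proxy.example.com:3100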
In the service logs I see the following:
ts=2022-11-07T13:12:13.926196155Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.927323183Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=35.563µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.927996495Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=35.563µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.931994093Z caller=grpc_logging.go:46 method=/frontendv2pb.FrontendForQuerier/QueryResult duration=105.52µs msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.932669943Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.934128211Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.934462835Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=13.737µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.935224122Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=13.737µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.936167945Z caller=grpc_logging.go:46 duration=42.182µs method=/frontendv2pb.FrontendForQuerier/QueryResult msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.936833055Z caller=retry.go:73 org_id=fake msg="error processing request" try=1 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.938052577Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.938284912Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=45.012µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.938793954Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=45.012µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.939859299Z caller=grpc_logging.go:46 duration=44.651µs method=/frontendv2pb.FrontendForQuerier/QueryResult msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.940325981Z caller=retry.go:73 org_id=fake msg="error processing request" try=2 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.941421855Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.941910948Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=7.473µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.942403329Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=7.473µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.943278467Z caller=grpc_logging.go:46 method=/frontendv2pb.FrontendForQuerier/QueryResult duration=28.428µs msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.943797077Z caller=retry.go:73 org_id=fake msg="error processing request" try=3 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.944847003Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.945103343Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=8.999µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.945650128Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=8.999µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.946855527Z caller=grpc_logging.go:46 duration=50.382µs method=/frontendv2pb.FrontendForQuerier/QueryResult msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.947119279Z caller=retry.go:73 org_id=fake msg="error processing request" try=4 err="rpc error: code = Code(500) desc = empty ring\n"
level=error ts=2022-11-07T13:12:13.947827961Z caller=stats.go:57 component=frontend msg="failed to record query metrics" err="expected one of the *LokiRequest, *LokiInstantRequest, *LokiSeriesRequest, *LokiLabelNamesRequest, got "
At the same time, a Loki installation on a single server works correctly: logs are written and read.
Please tell me where to dig; I can't find the problem myself.
Also, I tried sending logs to the Loki distributor (write path), but ingestion does not happen either; the push requests return a 404 with no explanation:
level=debug ts=2022-11-07T11:35:11.566374631Z caller=logging.go:76 traceID=7dd739ced5088cbe orgID=fake msg="POST /loki/api/v1/push (404) 87.265µs"
level=debug ts=2022-11-07T11:35:11.653902837Z caller=logging.go:76 traceID=766c347a31371e33 orgID=fake msg="POST /loki/api/v1/push (404) 76.161µs"
level=debug ts=2022-11-07T11:35:11.906294651Z caller=logging.go:76 traceID=58fe0b96a10b3d6b orgID=fake msg="POST /loki/api/v1/push (404) 177.982µs"
level=debug ts=2022-11-07T11:35:11.922494924Z caller=logging.go:76 traceID=0f26b052f9416264 orgID=fake msg="POST /loki/api/v1/push (404) 88.356µs"
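On the agent side, the push target looks roughly like this (a sketch assuming a Promtail-style client sending through the Nginx proxy; the agent and the proxy address are assumptions, not the exact setup):

clients:
  # placeholder proxy address; the path is the standard Loki push endpoint
  - url: http://nginx-proxy.example.com:3100/loki/api/v1/push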
At the same time, the S3 storage is available: another Loki instance, located on a separate server, writes to it in parallel.