Loki suddenly exits: SIGSEGV: segmentation violation

A node in Loki’s memberlist cluster suddenly exited, with no abnormalities in memory or CPU usage, while the other three nodes kept running normally without exiting.
The error log from the exiting node is as follows:

level=error ts=2025-03-20T05:58:54.772558216Z caller=scheduler_processor.go:111 component=querier msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=<IP>:3110
level=error ts=2025-03-20T05:58:54.772963141Z caller=scheduler_processor.go:175 component=querier org_id=fake traceID=7001c75f6ae01fc2 msg="error notifying scheduler about finished query" err=EOF addr=<IP>:3110
SIGSEGV: segmentation violation
PC=0x419577 m=14 sigcode=1 addr=0x20

goroutine 0 gp=0xc01974c8c0 m=14 mp=0xc0225ab008 [idle]:
runtime.(*mspan).typePointersOfUnchecked(0xc06379a000?, 0xc02b847230?)
        /usr/local/go/src/runtime/mbitmap_allocheaders.go:202 +0x37 fp=0xc026d4de80 sp=0xc026d4de60 pc=0x419577
runtime.scanobject(0xc000095c68?, 0xc000095c68)
        /usr/local/go/src/runtime/mgcmark.go:1446 +0xb5 fp=0xc026d4df10 sp=0xc026d4de80 pc=0x425455
runtime.gcDrain(0xc000095c68, 0x3)
        /usr/local/go/src/runtime/mgcmark.go:1242 +0x1f4 fp=0xc026d4df78 sp=0xc026d4df10 pc=0x424db4
runtime.gcDrainMarkWorkerDedicated(...)
        /usr/local/go/src/runtime/mgcmark.go:1124
runtime.gcBgMarkWorker.func2()
        /usr/local/go/src/runtime/mgc.go:1387 +0xa5 fp=0xc026d4dfc8 sp=0xc026d4df78 pc=0x421425
runtime.systemstack(0x479126)
        /usr/local/go/src/runtime/asm_amd64.s:509 +0x4a fp=0xc026d4dfd8 sp=0xc026d4dfc8 pc=0x4772ea

goroutine 35 gp=0xc0001e6380 m=14 mp=0xc0225ab008 [GC worker (active)]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:474 +0x8 fp=0xc0000ad750 sp=0xc0000ad740 pc=0x477288
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1370 +0x1f2 fp=0xc0000ad7e0 sp=0xc0000ad750 pc=0x4210b2
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000ad7e8 sp=0xc0000ad7e0 pc=0x479121
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1234 +0x1c

My Loki version is 3.2.0.

How can I solve this problem?

Is that one container still not running?

An occasional segmentation fault isn’t particularly concerning, in my opinion. If you only have this problem with one container, you might want to check your Loki metrics and make sure all nodes are load balanced properly.
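
For example, if Prometheus is already scraping your Loki nodes, something like the commands below compares how much query traffic each instance handles. This is only a rough sketch: the Prometheus address is a placeholder, and it assumes the default Loki request-duration histogram is being collected.

# query rate handled by each Loki instance over the last 5 minutes
curl -sG 'http://<prometheus>:9090/api/v1/query' \
  --data-urlencode 'query=sum by (instance) (rate(loki_request_duration_seconds_count[5m]))'

If one instance consistently receives much more traffic than the others, I would look at the load balancer before looking at Loki itself.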

Thank you, it worked fine after restarting. I will continue to observe it.

The same malfunction has occurred again:


ts=2025-07-07T09:19:31.065568767Z caller=spanlogger.go:111 user=fake caller=log.go:168 level=error msg="failed downloading chunks" err="context canceled"
level=error ts=2025-07-07T09:19:31.107343031Z caller=scheduler_processor.go:175 component=querier org_id=fake traceID=0a6ccfcc21ed5239 msg="error notifying scheduler about finished query" err=EOF addr=172.16.108.110:3110
level=error ts=2025-07-07T09:19:31.155062897Z caller=scheduler_processor.go:175 component=querier org_id=fake traceID=0a6ccfcc21ed5239 msg="error notifying scheduler about finished query" err=EOF addr=172.16.108.110:3110
level=error ts=2025-07-07T09:19:31.360057535Z caller=errors.go:26 org_id=fake traceID=3c0d137a30bf31d0 message="closing iterator" error="context canceled"
level=error ts=2025-07-07T09:19:31.593494056Z caller=errors.go:26 org_id=fake traceID=3c0d137a30bf31d0 message="closing iterator" error="context canceled"
level=error ts=2025-07-07T09:19:32.307800525Z caller=errors.go:26 org_id=fake traceID=0a6ccfcc21ed5239 message="closing iterator" error="context canceled"
level=error ts=2025-07-07T09:19:33.089224635Z caller=errors.go:26 org_id=fake traceID=3c0d137a30bf31d0 message="closing iterator" error="context canceled"
2025-07-07 17:19:43.256083 I | http: TLS handshake error from 172.26.198.52:37776: EOF
2025-07-07 17:19:58.255736 I | http: TLS handshake error from 172.26.198.52:38478: EOF
SIGSEGV: segmentation violation
PC=0x419577 m=3 sigcode=1 addr=0x20

goroutine 0 gp=0xc000007180 m=3 mp=0xc0000b5008 [idle]:
runtime.(*mspan).typePointersOfUnchecked(0xc016ea8000?, 0xc038dc6240?)
        /usr/local/go/src/runtime/mbitmap_allocheaders.go:202 +0x37 fp=0xc0000c9e80 sp=0xc0000c9e60 pc=0x419577
runtime.scanobject(0xc00008f268?, 0xc00008f268)
        /usr/local/go/src/runtime/mgcmark.go:1446 +0xb5 fp=0xc0000c9f10 sp=0xc0000c9e80 pc=0x425455
runtime.gcDrain(0xc00008f268, 0x2)
        /usr/local/go/src/runtime/mgcmark.go:1242 +0x1f4 fp=0xc0000c9f78 sp=0xc0000c9f10 pc=0x424db4
runtime.gcDrainMarkWorkerDedicated(...)
        /usr/local/go/src/runtime/mgcmark.go:1124
runtime.gcBgMarkWorker.func2()
        /usr/local/go/src/runtime/mgc.go:1402 +0x155 fp=0xc0000c9fc8 sp=0xc0000c9f78 pc=0x4214d5
runtime.systemstack(0x0)
        /usr/local/go/src/runtime/asm_amd64.s:509 +0x4a fp=0xc0000c9fd8 sp=0xc0000c9fc8 pc=0x4772ea

goroutine 33 gp=0xc000682000 m=3 mp=0xc0000b5008 [GC worker (active)]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:474 +0x8 fp=0xc000688750 sp=0xc000688740 pc=0x477288
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1370 +0x1f2 fp=0xc0006887e0 sp=0xc000688750 pc=0x4210b2
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0006887e8 sp=0xc0006887e0 pc=0x479121
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 1 gp=0xc0000061c0 m=nil [select, 64917 minutes]:
runtime.gopark(0xc003938210?, 0x2?, 0xc0?, 0x61?, 0xc00393820c?)
        /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0053460b8 sp=0xc005346098 pc=0x440dce
runtime.selectgo(0xc005346210, 0xc003938208, 0x28c3ace?, 0x0, 0x42e7fa0?, 0x1)
        /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc0053461d8 sp=0xc0053460b8 pc=0x452c05
github.com/grafana/dskit/services.(*Manager).AwaitStopped(0xc0030d0120, {0x42e7fa0, 0x6355480})
        /src/loki/vendor/github.com/grafana/dskit/services/manager.go:153 +0x67 fp=0xc005346240 sp=0xc0053461d8 pc=0xad50a7
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc000e76008, {0x0?, {0x4?, 0x2?, 0x62ee420?}})
        /src/loki/pkg/loki/loki.go:592 +0xea5 fp=0xc0053463f8 sp=0xc005346240 pc=0x2d31985
main.main()
        /src/loki/cmd/loki/main.go:129 +0x1333 fp=0xc00534df50 sp=0xc0053463f8 pc=0x2d5e653
runtime.main()
        /usr/local/go/src/runtime/proc.go:271 +0x29d fp=0xc00534dfe0 sp=0xc00534df50 pc=0x44097d
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00534dfe8 sp=0xc00534dfe0 pc=0x479121

Grepping the crash dump for ‘goroutine’ returns a large number of entries, for example:

goroutine 0 gp=0xc000007180 m=3 mp=0xc0000b5008 [idle]:
goroutine 33 gp=0xc000682000 m=3 mp=0xc0000b5008 [GC worker (active)]:
created by runtime.gcBgMarkStartWorkers in goroutine 1
goroutine 1 gp=0xc0000061c0 m=nil [select, 64917 minutes]:
goroutine 2 gp=0xc000006700 m=nil [force gc (idle), 64917 minutes]:
created by runtime.init.6 in goroutine 1
goroutine 3 gp=0xc000006c40 m=nil [GC sweep wait]:
created by runtime.gcenable in goroutine 1
goroutine 4 gp=0xc000006e00 m=nil [sleep]:
created by runtime.gcenable in goroutine 1
goroutine 17 gp=0xc000102380 m=nil [finalizer wait]:
created by runtime.createfing in goroutine 1
goroutine 18 gp=0xc000102540 m=nil [GC worker (idle)]:
created by runtime.gcBgMarkStartWorkers in goroutine 1
goroutine 5 gp=0xc000007340 m=nil [GC worker (idle)]:
created by runtime.gcBgMarkStartWorkers in goroutine 1
goroutine 19 gp=0xc000102700 m=nil [GC worker (idle)]:
created by runtime.gcBgMarkStartWorkers in goroutine 1
goroutine 6 gp=0xc000007500 m=nil [GC worker (idle)]:
created by runtime.gcBgMarkStartWorkers in goroutine 1
goroutine 49 gp=0xc000702000 m=nil [GC worker (idle)]:
created by runtime.gcBgMarkStartWorkers in goroutine 1
goroutine 50 gp=0xc0007021c0 m=nil [GC worker (idle)]:
created by runtime.gcBgMarkStartWorkers in goroutine 1
goroutine 34 gp=0xc0006821c0 m=nil [GC worker (idle)]:
created by runtime.gcBgMarkStartWorkers in goroutine 1
goroutine 7 gp=0xc000802380 m=nil [select, 64917 minutes]:
created by github.com/baidubce/bce-sdk-go/util/log.NewLogger in goroutine 1
goroutine 20 gp=0xc000c01500 m=nil [select]:
created by go.opencensus.io/stats/view.init.0 in goroutine 1
goroutine 138 gp=0xc000c01180 m=nil [chan receive]:
created by github.com/grafana/loki/v3/pkg/util/log.newPrometheusLogger.WithFlushPeriod.func2 in goroutine 1
goroutine 171 gp=0xc000c461c0 m=nil [select, 1 minutes]:
created by github.com/uber/jaeger-client-go.NewRemotelyControlledSampler in goroutine 1
goroutine 173 gp=0xc000c46380 m=nil [select]:
created by github.com/uber/jaeger-client-go/utils.newReconnectingUDPConn in goroutine 1
goroutine 174 gp=0xc000c46540 m=nil [select]:
created by github.com/uber/jaeger-client-go.NewRemoteReporter in goroutine 1
goroutine 181 gp=0xc000c468c0 m=nil [select, 64917 minutes]:
created by google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer in goroutine 1
goroutine 182 gp=0xc000c46c40 m=nil [select, 64917 minutes]:
created by google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer in goroutine 1
goroutine 183 gp=0xc000c46e00 m=nil [select, 64917 minutes]:
created by google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer in goroutine 1
goroutine 812 gp=0xc000c01340 m=nil [select, 64917 minutes]:
created by google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer in goroutine 771
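
(This summary was pulled out of the crash dump with something like the commands below; loki-crash.log is just a placeholder for wherever the panic output was saved.)

# count all goroutines in the crash dump
grep -c '^goroutine ' loki-crash.log

# group the goroutines by the function that created them
grep '^created by ' loki-crash.log | sort | uniq -c | sort -rn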

loki version:

loki, version 3.2.0 (branch: k218, revision: 659f5421)
  build user:       root@003ce357cdf4
  build date:       2024-09-18T16:21:52Z
  go version:       go1.22.6
  platform:         linux/amd64
  tags:             netgo

loki.yml:

auth_enabled: false

memberlist:
  bind_port: 7946
  join_members:
  - <IP1>:7946
  - <IP2>:7946
  - <IP3>:7946
  - <IP4>:7946

server:
  http_listen_port: 3100
  http_tls_config:
    cert_file: /export/server/loki/ssl/loki.crt
    key_file:  /export/server/loki/ssl/loki.key
    client_ca_file: /export/server/loki/ssl/ca.crt
    client_auth_type: RequireAndVerifyClientCert
  tls_min_version: VersionTLS12
  tls_cipher_suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  grpc_listen_port: 3110
  grpc_server_max_recv_msg_size: 109951162777600  # 100G
  grpc_server_max_send_msg_size: 109951162777600  # 100G
  log_level: error  # info debug  error

common:
  replication_factor: 1
  ring:
    replication_factor: 1
    heartbeat_period: 2s
    heartbeat_timeout: 5s
    kvstore:
      store: memberlist
  path_prefix: /export/loki-data

ingester:
  chunk_retain_period: 30m
  flush_check_period: 1m
  max_chunk_age: 2h
  chunk_idle_period: 1h
  chunk_target_size: 10485760
  sync_min_utilization: 1
  autoforget_unhealthy: true
  wal:
    enabled: true
  lifecycler:
    final_sleep: 10s
    min_ready_duration: 3s
    ring:
      replication_factor: 1

schema_config:
  configs:
  - from: 2020-05-15
    store: tsdb
    object_store: s3
    schema: v13
    index:
      prefix: loki_
      period: 24h
storage_config:
  tsdb_shipper:
    resync_interval: 1m
  aws:
    s3: s3://XXXXXX:XXXXXX@trust-jrss.d.vb.local/loki

querier:
  max_concurrent: 30

query_range:
  max_retries: 2

frontend_worker:
  query_frontend_grpc_client:
    max_recv_msg_size: 107374182400
    max_send_msg_size: 107374182400
  query_scheduler_grpc_client:
    max_recv_msg_size: 107374182400
    max_send_msg_size: 107374182400


compactor:
  compaction_interval: 60m
  retention_delete_delay: 10m
  retention_delete_worker_count: 150
  retention_enabled: true
  delete_request_store: aws

analytics:
  reporting_enabled: false

limits_config:
  max_global_streams_per_user: 200000
  max_streams_per_user: 200000
  ingestion_rate_strategy: global
  ingestion_rate_mb: 1024
  ingestion_burst_size_mb: 1024
  reject_old_samples: false
  max_query_length: 100d
  max_query_parallelism: 500
  max_entries_limit_per_query: 0
  max_concurrent_tail_requests: 200
  increment_duplicate_timestamp: true
  max_line_size: 10240KB
  tsdb_max_query_parallelism: 1280
  split_queries_by_interval:  24h
  query_timeout: 60m
  split_metadata_queries_by_interval: 10m
  per_stream_rate_limit: 1G
  per_stream_rate_limit_burst: 1G
  max_query_series: 5000
  shard_streams:
    enabled: false

These look kind of suspicious; do you know what they are?