Loki stuck flushing with error: "failed to flush chunks: store put chunk: Request is too big"

I have set up Loki in three different modes (read, write, and table-manager) on the same server. The write agent is configured with TLS. Promtail on my other servers pushes logs to the Loki write agent over HTTPS.

The Loki logs show that it is currently stuck on the following errors and doesn't seem to get out of this loop:

level=error ts=2023-05-11T03:00:47.18493146Z caller=flush.go:144 org_id=fake msg="failed to flush" err="failed to flush chunks: store put chunk: Request is too big: length 51934398 exceeds maximum allowed length 16777216., num_chunks: 21, labels: {filename=\"/S/1.0.0.0/log/AF/4479_file_to_bl.log\", host=\"server1\"}"
level=error ts=2023-05-11T03:00:47.187445037Z caller=flush.go:144 org_id=fake msg="failed to flush" err="failed to flush chunks: store put chunk: Request is too big: length 25966489 exceeds maximum allowed length 16777216., num_chunks: 25, labels: {filename=\"/S/1.0.0.0/log/AF/4485_file_to_bl.log\", host=\"server1\"}"

The log files 4485_file_to_bl.log and 4479_file_to_bl.log on server1 are 3.5GB each. I tried removing these files from server1, but the Loki agent is still stuck and rapidly fills its log with the error above.

I am unsure which limit Loki is hitting. I was unable to find where the number "16777216" in the error "exceeds maximum allowed length 16777216" comes from.
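
For reference, 16777216 bytes is exactly 16 MiB (2^24), so whatever is rejecting the request is enforcing a 16 MiB cap on a single request. The following is only a rough sketch to put the numbers in the two error lines above side by side; the regular expression and the helper name parse_too_big are made up for this illustration and are not part of Loki, and the sample strings are shortened copies of the log lines.

```python
import re

# Matches the size and the limit in the "Request is too big" message.
# The pattern is an assumption based on the two log lines in this post.
TOO_BIG = re.compile(r"length (\d+) exceeds maximum allowed length (\d+)")

MIB = 1024 * 1024


def parse_too_big(line):
    """Return (request_bytes, limit_bytes) from an error line, or None."""
    m = TOO_BIG.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


# Shortened copies of the two flush errors from the Loki log.
lines = [
    "store put chunk: Request is too big: length 51934398 "
    "exceeds maximum allowed length 16777216., num_chunks: 21",
    "store put chunk: Request is too big: length 25966489 "
    "exceeds maximum allowed length 16777216., num_chunks: 25",
]

for line in lines:
    size, limit = parse_too_big(line)
    print(
        f"request {size / MIB:.1f} MiB, limit {limit / MIB:.0f} MiB, "
        f"{size / limit:.2f}x the limit"
    )
```

Both flushes are well above the cap, so each retry fails for the same reason, which would match the loop seen in the log.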

Also, while this is happening, the other Promtail agents are unable to push data to Loki because it is stuck and busy. The following is seen on the server where Promtail is installed:

level=warn ts=2023-05-11T03:04:39.945485821Z caller=client.go:379 component=client host=loki-server:8086 msg="error sending batch, will retry" status=500 error="server returned HTTP status 500 Internal Server Error (500): empty ring"

Loki Write Configuration:

./loki -version
loki, version 2.8.2 (branch: HEAD, revision: 9f809eda7)
build user: root@b7e9ca0bf6e0
build date: 2023-05-03T11:13:57Z
go version: go1.20.4
platform: linux/amd64

---

auth_enabled: false

server:
#  http_listen_address: 127.0.0.1
  http_listen_port: 8086
  grpc_listen_port: 9443
  http_tls_config:
    cert_file: /sites/loganalytics/loki/certs/loki.server.crt
    key_file: /sites/loganalytics/loki/certs/server.key
    client_auth_type: RequireAndVerifyClientCert
    client_ca_file: /sites/loganalytics/loki/certs/ca.crt
  grpc_server_max_recv_msg_size: 1048576000 #1000MB Default is 4MB
  grpc_server_max_send_msg_size: 1048576000 #1000MB Default is 4MB

common:
  path_prefix: /sites/loganalytics/loki
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist

ingester:
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  lifecycler:
    final_sleep: 0s
  wal:
   enabled: true
   dir: /sites/loganalytics/loki/data/wal

   #ingester_client:
   #grpc_client_config:
   # max_send_msg_size: 1048576000
   # max_recv_msg_size: 1048576000


############################CASSANDRA STORAGE CONFIGURATION################
schema_config:
  configs:
    - from: 2023-01-01
      store: cassandra
      object_store: cassandra
      schema: v11
      index:
        prefix: cassandra_index_
        period: 168h
      chunks:
        prefix: cassandra_chunk_
        period: 168h

storage_config:
  cassandra:
    addresses: 127.0.0.1
    auth: true #default = false
    timeout: 600s #default
    connect_timeout: 600s #default
    keyspace: loki
    username: xxxxxxx
    password: xxxxxxx
    consistency: LOCAL_ONE

##########################################################################

memberlist:
  bind_port: 7946
  join_members:
    - 127.0.0.1:7947
    - 127.0.0.1:7948

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 30m
  max_entries_limit_per_query: 5000
  max_streams_per_user: 100000
  max_chunks_per_query: 200000
  ingestion_rate_mb: 1000
  ingestion_burst_size_mb: 1500
  max_query_parallelism: 32
  per_stream_rate_limit: "512M" #default
  per_stream_rate_limit_burst: "1024M" #default

Promtail Agent configuration:

promtail, version HEAD-e0af1cc (branch: HEAD, revision: e0af1cc8a)
build user: root@5004faa13e2e
build date: 2022-12-09T19:23:40Z
go version: go1.19.2
platform: linux/amd64

---

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /opt/loganalytics/promtail/conf/positions.yaml

clients:
  - url: 'https://loki-server:8086/loki/api/v1/push'
    tls_config:
      ca_file: /opt/loganalytics/promtail/certs/ca.crt
      cert_file: /opt/loganalytics/promtail/certs/promtail.client.crt
      key_file: /opt/loganalytics/promtail/certs/client.key
      server_name: lokiserver.com
      insecure_skip_verify: false

scrape_configs:
  - job_name: publogs
    static_configs:
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /C/CR/1.0.0.0/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /I/IS/1.0.0.0/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /F/FO/1.0.0.0/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /F/FD/2.0.0.1/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /H/HY/1.0.0.0/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /T/TO/1.0.0.0/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /K/KA/1.0.0.0/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /S/SU/1.0.0.0/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /S/SZ/1.0.0.0/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /N/NI/1.0.0.0/log/**/*.log
    - targets:
       - localhost
      labels:
       host: server1
       __path__: /U/UD/1.0.0.0/log/**/*.log

I haven't used Cassandra with Loki before, but it looks like you might be hitting some sort of limit in Cassandra. I'd recommend storing chunks in an object store such as S3; failing that, even the local filesystem might be easier.

Does this mean Loki doesn’t support Cassandra for storage?

I have applied some of the recommended tuning to Cassandra; it is now stable and running without any errors.

However, Loki is now complaining with the following error:

level=error ts=2023-05-18T12:39:51.01130131Z caller=connectionpool.go:631 module=gocql client=chunks-write msg=hostConnPool.HandleError err="frame length is bigger than the maximum allowed" closed=true
level=error ts=2023-05-18T12:39:51.011398195Z caller=flush.go:144 org_id=fake msg="failed to flush" err="failed to flush chunks: store put chunk: frame length is bigger than the maximum allowed, num_chunks: 35, labels: {filename=\"5792_P.log\", host=\"server1\"}"

This seems to be a problem with Loki.