Problems with NetApp ONTAP S3 Storage

We are trying to configure our simple scalable Loki deployment to use NetApp ONTAP S3-compatible storage but are running into issues with the read path.

The configuration we’ve tried is as follows:

# aws:
#   bucketnames: irs-loki-test
#   endpoint: ict-mc2-oss02.server.ufl.edu
#   region: default
#   access_key_id: OMITTED
#   secret_access_key: OMITTED
#   insecure: false
#   s3forcepathstyle: true
#   http_config:
#     insecure_skip_verify: true

I’ve enabled debug logging on the Loki write nodes and everything looks fine: chunks are being processed and flushed, and we can see files in the bucket, checked with rclone:

$ rclone ls mc2-loki-test:irs-loki-test/irs --max-age 5m
5308 1bce3a591abcb16c/19832b03908:198331e1624:3645d254
26817 1cb46683eec122aa/19832b30cfe:19833210506:b5cd8f29
558 1d04d62111b0309b/19832eabcd3:19832eabcd4:7ffab4cb
2497 4039560a5a438f27/19832b26d87:1983320bb70:1c11c777
5234 64ba633177e729c9/19832b1954a:198332046e7:aa773d77
2532 838fb526047f0d7e/19832b0c8de:198331ff10e:badf9c30
2178 afc11eb03b6c2f77/19832b18f25:19833203108:9e0c4e77
2181 c63326ad860d739b/19832b18f58:198332030d1:80fcd981
2159 c63326ad860d739b/19832b18f58:198332030d1:c2c19fc0
1460 d55d5de328835fc3/19832b32efa:1983321ac3e:4bcceaa8
2126 d712ce47832ddcdc/19832b37cac:19833221e84:98fdc5c4
2142 f88a71f38300e72b/19832b37cac:19833221e84:ab7a3a31
554 fb00fae29cdcdc15/19832eabda0:19832eabda1:f289b426

However, on the read side, when we try to query Loki we see the following errors:

Jul 22 12:13:24 az1-irs-o11y-test-loki-read-01 loki[1825728]: ts=2025-07-22T16:13:24.3448604Z caller=spanlogger.go:111 user=irs caller=log.go:168 level=error msg="failed downloading chunks" err="failed to load chunk 'irs/c86f9a16511d94fd/19832748c34:19832e2cb24:f4889002': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 22 12:13:24 az1-irs-o11y-test-loki-read-01 loki[1825728]: level=error ts=2025-07-22T16:13:24.369684258Z caller=parallel_chunk_fetch.go:71 msg="error fetching chunks" err="failed to load chunk 'irs/bdcdf1a4d38649fa/19832712714:19832df178c:23ab6c02': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 22 12:13:24 az1-irs-o11y-test-loki-read-01 loki[1825728]: ts=2025-07-22T16:13:24.369734882Z caller=spanlogger.go:111 user=irs caller=log.go:168 level=error msg="failed downloading chunks" err="failed to load chunk 'irs/bdcdf1a4d38649fa/19832712714:19832df178c:23ab6c02': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 22 12:13:28 az1-irs-o11y-test-loki-read-01 loki[1825728]: level=error ts=2025-07-22T16:13:28.333512967Z caller=parallel_chunk_fetch.go:71 msg="error fetching chunks" err="failed to load chunk 'irs/bfa0a085aa73f3ef/198326a9069:19832d86deb:daaa3f92': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 22 12:13:28 az1-irs-o11y-test-loki-read-01 loki[1825728]: ts=2025-07-22T16:13:28.333563552Z caller=spanlogger.go:111 user=irs caller=log.go:168 level=error msg="failed downloading chunks" err="failed to load chunk 'irs/bfa0a085aa73f3ef/198326a9069:19832d86deb:daaa3f92': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 22 12:13:36 az1-irs-o11y-test-loki-read-01 loki[1825728]: level=error ts=2025-07-22T16:13:36.461777332Z caller=parallel_chunk_fetch.go:71 msg="error fetching chunks" err="failed to load chunk 'irs/8443d47f5592e6fd/198326a8e08:19832d954ec:9dcd511b': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 22 12:13:36 az1-irs-o11y-test-loki-read-01 loki[1825728]: ts=2025-07-22T16:13:36.461823862Z caller=spanlogger.go:111 user=irs caller=log.go:168 level=error msg="failed downloading chunks" err="failed to load chunk 'irs/8443d47f5592e6fd/198326a8e08:19832d954ec:9dcd511b': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 22 12:13:39 az1-irs-o11y-test-loki-read-01 loki[1825728]: level=error ts=2025-07-22T16:13:38.958364586Z caller=parallel_chunk_fetch.go:71 msg="error fetching chunks" err="failed to load chunk 'irs/c86f9a16511d94fd/19832748c34:19832e2cb24:f4889002': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 22 12:13:39 az1-irs-o11y-test-loki-read-01 loki[1825728]: ts=2025-07-22T16:13:38.958421886Z caller=spanlogger.go:111 user=irs caller=log.go:168 level=error msg="failed downloading chunks" err="failed to load chunk 'irs/c86f9a16511d94fd/19832748c34:19832e2cb24:f4889002': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
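For what it’s worth, the chunk keys in these errors embed the chunk’s time range, which makes it easy to see which window the missing chunks cover. Below is a small Python sketch assuming Loki’s usual external key layout (`tenant/fingerprint/from:through:checksum`, where `from`/`through` are hex-encoded Unix-millisecond timestamps); `parse_chunk_key` is just an illustrative helper name, not a Loki API:

```python
from datetime import datetime, timezone

def parse_chunk_key(key: str):
    """Split a Loki chunk key into tenant, fingerprint, time range, and checksum.

    Assumes the external key layout tenant/fingerprint/from:through:checksum,
    with from/through as hex-encoded Unix-millisecond timestamps.
    """
    tenant, fingerprint, rest = key.split("/")
    from_hex, through_hex, checksum = rest.split(":")

    def to_dt(h: str) -> datetime:
        # Hex milliseconds -> timezone-aware UTC datetime.
        return datetime.fromtimestamp(int(h, 16) / 1000, tz=timezone.utc)

    return tenant, fingerprint, to_dt(from_hex), to_dt(through_hex), checksum

tenant, fp, start, end, _ = parse_chunk_key(
    "irs/c86f9a16511d94fd/19832748c34:19832e2cb24:f4889002"
)
print(tenant, fp, start.isoformat(), end.isoformat())
```

Comparing the decoded windows against when the storage config last changed can tell you whether the 404s are confined to chunks written before the change.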

Loki actually continues to work, reading from the cache, but the cache only holds around three hours of data at any given time.

Any assistance with this would be greatly appreciated. I’ve also opened a GitHub issue: Loki S3 Compatibility with NetApp ONTAP (grafana/loki #18398).

thanks!

Can you share your Loki configuration, please?

Sure, thanks for your help Tony.

# Ansible managed
# Loki 3.4 / 3.4.2

target: read

auth_enabled: true

bloom_gateway:
  client:
    addresses: test.bloom.loki.it.ufl.edu:9443
  enabled: true
common:
  compactor_address: http://localhost:9443
  replication_factor: 3
  ring:
    heartbeat_timeout: 10m
    kvstore:
      store: memberlist
compactor:
  compaction_interval: 1m
  delete_request_store: s3
  retention_enabled: true
  working_directory: /loki/compactor
frontend:
  compress_responses: true
  log_queries_longer_than: 15s
frontend_worker:
  frontend_address: test.frontend.loki.it.ufl.edu:9443
  grpc_client_config:
    max_send_msg_size: 104857600
ingester:
  chunk_idle_period: 1h
  flush_check_period: 10s
  max_chunk_age: 2h
  wal:
    replay_memory_ceiling: 5925MB
limits_config:
  allow_structured_metadata: false
  bloom_gateway_enable_filtering: true
  ingestion_burst_size_mb: 60
  ingestion_rate_mb: 40
  max_cache_freshness_per_query: 10m
  max_entries_limit_per_query: 100000
  max_global_streams_per_user: 20000
  max_query_parallelism: 6
  max_query_series: 10000
  per_stream_rate_limit: 40MB
  per_stream_rate_limit_burst: 60MB
  query_timeout: 3m
  reject_old_samples: true
  retention_period: 1w
  split_queries_by_interval: 15m
memberlist:
  abort_if_cluster_join_fails: false
  bind_port: 7946
  join_members:
    - az1-irs-o11y-test-loki-write-01.server.ufl.edu
    - az1-irs-o11y-test-loki-write-02.server.ufl.edu
    - az1-irs-o11y-test-loki-write-03.server.ufl.edu
    - az1-irs-o11y-test-loki-write-04.server.ufl.edu
    - az2-irs-o11y-test-loki-write-01.server.ufl.edu
    - az2-irs-o11y-test-loki-write-02.server.ufl.edu
    - az2-irs-o11y-test-loki-write-03.server.ufl.edu
    - az1-irs-o11y-test-loki-read-01.server.ufl.edu
    - az1-irs-o11y-test-loki-read-02.server.ufl.edu
    - az1-irs-o11y-test-loki-read-03.server.ufl.edu
    - az2-irs-o11y-test-loki-read-01.server.ufl.edu
    - az2-irs-o11y-test-loki-read-02.server.ufl.edu
  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s
  rejoin_interval: 1m
querier:
  max_concurrent: 6000
  multi_tenant_queries_enabled: true
query_range:
  align_queries_with_step: true
  cache_results: true
  max_retries: 5
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 2048
        ttl: 1h
query_scheduler:
  max_outstanding_requests_per_tenant: 42768
schema_config:
  configs:
    - from: '2023-02-15'
      index:
        period: 24h
        prefix: index_tsdb_
      object_store: s3
      schema: v13
      store: tsdb
server:
  grpc_listen_port: 9443
  grpc_server_max_concurrent_streams: 1500
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  http_listen_port: 8443
  http_server_read_timeout: 610s
  http_server_write_timeout: 610s
  http_tls_config:
    cert_file: /etc/loki/ssl/cert.crt
    key_file: /etc/loki/ssl/cert.key
  log_level: info
storage_config:
  aws:
    access_key_id: XXXXX
    bucketnames: irs-loki-test
    endpoint: ict-mc2-oss02.server.ufl.edu
    http_config:
      insecure_skip_verify: true
    insecure: false
    region: default
    s3forcepathstyle: true
    secret_access_key: XXXXX
  hedging:
    at: 250ms
    max_per_second: 20
    up_to: 3
  tsdb_shipper:
    active_index_directory: /loki/tsdb-shipper-active
    cache_location: /loki/tsdb-shipper-cache
Try changing this under your schema_config:

# object_store: s3
object_store: aws

I suspect your chunk files aren’t actually being sent to your storage.

I’ve updated the config:

schema_config:
  configs:
    - from: "2025-07-01"
      store: tsdb
      # object_store: s3
      object_store: aws
      schema: v13
      index:
        prefix: index_tsdb_
        period: 24h

When running queries I am still seeing this in the logs:

Jul 23 12:25:03 az1-irs-o11y-test-loki-read-01 loki[1984633]: ts=2025-07-23T16:25:03.672364581Z caller=spanlogger.go:111 user=observability caller=log.go:168 level=error msg="failed downloading chunks" err="failed to load chunk 'observability/1e1b9725cc318b1/198376e2182:19838036128:d1967b4f': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 23 12:25:03 az1-irs-o11y-test-loki-read-01 loki[1984633]: level=error ts=2025-07-23T16:25:03.859186713Z caller=parallel_chunk_fetch.go:71 msg="error fetching chunks" err="failed to load chunk 'observability/1e1b9725cc318b1/198376e2182:19838036128:d1967b4f': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "
Jul 23 12:25:03 az1-irs-o11y-test-loki-read-01 loki[1984633]: ts=2025-07-23T16:25:03.859249493Z caller=spanlogger.go:111 user=observability caller=log.go:168 level=error msg="failed downloading chunks" err="failed to load chunk 'observability/1e1b9725cc318b1/198376e2182:19838036128:d1967b4f': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "

Changing the object store doesn’t retroactively fix the chunks that already landed in the wrong place, but you should at least be able to see the newer logs.

If you want, you can try to locate the chunks that are erroring out (most likely on your filesystem) and migrate them to the bucket manually.
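To illustrate that suggestion: if the older chunks did land on the local filesystem (e.g. under a directory like `/loki/chunks`, which is an assumption here, as is the assumption that the filesystem store lays files out mirroring the object keys), a scan can rebuild the keys you would need to copy into the bucket. A minimal, hypothetical sketch, with a throwaway temp directory standing in for the real chunk root:

```python
import tempfile
from pathlib import Path

def find_local_chunks(root: str):
    """Yield (relative_key, size_bytes) for every file under root.

    Assumes a filesystem layout mirroring the object keys, i.e.
    <root>/<tenant>/<fingerprint>/<from>:<through>:<checksum>, so the
    path relative to root is the object key you'd upload.
    """
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            yield path.relative_to(root_path).as_posix(), path.stat().st_size

# Demo with a fake chunk file (in real use, point root at the actual directory):
with tempfile.TemporaryDirectory() as tmp:
    chunk = Path(tmp, "irs", "c86f9a16511d94fd", "19832748c34:19832e2cb24:f4889002")
    chunk.parent.mkdir(parents=True)
    chunk.write_bytes(b"\x00" * 128)
    for key, size in find_local_chunks(tmp):
        print(key, size)  # each key could then be uploaded to the bucket
```

The actual upload step (e.g. via the AWS CLI or an S3 client) is left out here, since it depends on your tooling and credentials.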

Well, I spoke too soon; it certainly did fix the issue. Thanks, Tony!

Spoke too soon again. While I do see logs being written and can read back several days, I’m still seeing these errors:

Jul 24 16:00:36 az1-irs-o11y-test-loki-read-01 loki[1984633]: ts=2025-07-24T20:00:35.910027618Z caller=spanlogger.go:111 user=irs caller=log.go:168 level=error msg="failed downloading chunks" err="failed to load chunk 'irs/f1e432da6d777e4/198379baa4c:198380ad91f:7c3b8c6e': failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: , host id: "