Memcached Config in K8S Distributed model

mdiorio · August 17, 2021, 5:04pm

I’m wondering if someone can sanity-check my config and possibly shed some light on i/o timeouts to memcached.

I’m using the distributed Helm deployment in K8S (1 compactor, 1 distributor, 3 ingesters, 2 gateways, 3 queriers, 3 query frontends, 1 table manager, 1 memcached chunks, 1 memcached index queries, 1 memcached frontend).

When I look at the pod logs, only the querier pods reference memcached, and they are the ones logging i/o timeouts. The IP in those errors belongs to the chunk memcached instance. Is that right?

storage_config:
  aws:
    s3: s3://lokibucket.s3.us-east-1
    bucketnames: lokibucket

  boltdb_shipper:
    shared_store: s3
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
    cache_ttl: 168h

  index_queries_cache_config:
    memcached:
      batch_size: 100
      parallelism: 100
    memcached_client:
      host: loki-loki-distributed-memcached-index-queries.loki.svc.cluster.local
      service: http

chunk_store_config:
  max_look_back_period: 0s
  chunk_cache_config:
    memcached:
      batch_size: 100
      parallelism: 100
    memcached_client:
      host: loki-loki-distributed-memcached-chunks.loki.svc.cluster.local
      service: http
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

query_range:
  align_queries_with_step: true
  max_retries: 5
  split_queries_by_interval: 15m
  cache_results: true
  results_cache:
    cache:
      memcached:
        batch_size: 100
        parallelism: 100
      memcached_client:
        host: loki-loki-distributed-memcached-frontend.loki.svc.cluster.local
        service: http

If I run echo stats, say on the chunk instance, I see connections (all 3 memcached instances show current connections):

echo stats | nc 127.0.0.1 11211
STAT pid 1
STAT uptime 82280
STAT time 1629219205
STAT version 1.6.10
STAT libevent 2.1.12-stable
STAT pointer_size 64
STAT rusage_user 7.186096
STAT rusage_system 13.927203
STAT max_connections 1024
STAT curr_connections 33
STAT total_connections 8563
STAT rejected_connections 0
STAT connection_structures 52
STAT response_obj_oom 0
STAT response_obj_count 1
STAT response_obj_bytes 65536
STAT read_buf_count 33
STAT read_buf_bytes 540672
STAT read_buf_bytes_free 458752
STAT read_buf_oom 0
STAT reserved_fds 20
STAT cmd_get 38988
STAT cmd_set 26965
STAT cmd_flush 0
STAT cmd_touch 0
STAT cmd_meta 0
STAT get_hits 12235
STAT get_misses 26753
STAT get_expired 0
STAT get_flushed 0
STAT delete_misses 0
STAT delete_hits 0
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT touch_hits 0
STAT touch_misses 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 9828550092
STAT bytes_written 3137411488
STAT limit_maxbytes 67108864
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT time_in_listen_disabled_us 0
STAT threads 4
STAT conn_yields 0
STAT hash_power_level 16
STAT hash_bytes 524288
STAT hash_is_expanding 0
STAT slab_reassign_rescues 20
STAT slab_reassign_chunk_rescues 0
STAT slab_reassign_evictions_nomem 123
STAT slab_reassign_inline_reclaim 1
STAT slab_reassign_busy_items 6
STAT slab_reassign_busy_deletes 0
STAT slab_reassign_running 0
STAT slabs_moved 73
STAT lru_crawler_running 0
STAT lru_crawler_starts 52
STAT lru_maintainer_juggles 193726
STAT malloc_fails 0
STAT log_worker_dropped 0
STAT log_worker_written 0
STAT log_watcher_skipped 0
STAT log_watcher_sent 0
STAT unexpected_napi_ids 0
STAT round_robin_fallback 0
STAT bytes 45260110
STAT curr_items 200
STAT total_items 26985
STAT slab_global_page_pool 0
STAT expired_unfetched 0
STAT evicted_unfetched 16715
STAT evicted_active 42
STAT evictions 19772
STAT reclaimed 0
STAT crawler_reclaimed 0
STAT crawler_items_checked 44
STAT lrutail_reflocked 2751
STAT moves_to_cold 26787
STAT moves_to_warm 2645
STAT moves_within_lru 2352
STAT direct_reclaims 20281
STAT lru_bumps_dropped 0
END
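A few ratios can be read off the stats dump above. The following is a minimal sketch, not part of the original post: it parses `STAT` lines and derives hit ratio, evictions per set, and memory use. The function names and the `SAMPLE` excerpt are illustrative; the values are copied from the output above.

```python
def parse_stats(text):
    """Parse `STAT <name> <value>` lines from memcached `stats` output into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def summarize(stats):
    """Derive a few health ratios from the raw counters."""
    gets = int(stats["cmd_get"])
    sets = int(stats["cmd_set"])
    limit = int(stats["limit_maxbytes"])
    return {
        "hit_ratio": int(stats["get_hits"]) / gets if gets else 0.0,
        "evictions_per_set": int(stats["evictions"]) / sets if sets else 0.0,
        "memory_used_fraction": int(stats["bytes"]) / limit,
        "limit_mib": limit / (1024 * 1024),
    }


# Excerpt of the chunk instance's stats shown above (values copied verbatim).
SAMPLE = """\
STAT cmd_get 38988
STAT cmd_set 26965
STAT get_hits 12235
STAT get_misses 26753
STAT limit_maxbytes 67108864
STAT bytes 45260110
STAT evictions 19772
END
"""

if __name__ == "__main__":
    for name, value in summarize(parse_stats(SAMPLE)).items():
        print(f"{name}: {value:.3f}")
```

On these numbers the instance has a 64 MiB limit, a hit ratio of roughly 31%, and about 0.73 evictions per set. That may point to an undersized chunk cache, though it does not by itself explain the i/o timeouts.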

I see the following in the querier pods:

level=error ts=2021-08-17T16:26:45.218691428Z caller=memcached.go:235 msg="failed to put to memcached" name=chunks err="server=10.42.5.10:11211: write tcp 10.42.4.64:49594->10.42.5.10:11211: i/o timeout"

ts=2021-08-17T16:26:45.717102364Z caller=spanlogger.go:87 org_id=fake traceID=3b5b799d0f27974d method=Memcache.GetMulti level=error msg="Failed to get keys from memcached" err="read tcp 10.42.4.64:48388->10.42.5.10:11211: i/o timeout"
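To check whether every timeout points at the same memcached instance, the querier logs can be tallied by destination address. This is a rough sketch, assuming the error text always has the `read|write tcp SRC->DST: i/o timeout` shape seen in the two lines above; `count_timeouts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the "read|write tcp SRC->DST: i/o timeout" fragment in the querier logs.
TIMEOUT_RE = re.compile(r"(read|write) tcp (\S+?)->(\S+?): i/o timeout")


def count_timeouts(lines):
    """Count i/o timeouts per (operation, destination address)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            operation, _source, destination = match.groups()
            counts[(operation, destination)] += 1
    return counts


# The two querier log lines quoted above, shortened to the relevant fields.
LOG_LINES = [
    'level=error caller=memcached.go:235 msg="failed to put to memcached" name=chunks '
    'err="server=10.42.5.10:11211: write tcp 10.42.4.64:49594->10.42.5.10:11211: i/o timeout"',
    'caller=spanlogger.go:87 method=Memcache.GetMulti level=error msg="Failed to get keys from memcached" '
    'err="read tcp 10.42.4.64:48388->10.42.5.10:11211: i/o timeout"',
]

if __name__ == "__main__":
    for (operation, destination), n in count_timeouts(LOG_LINES).items():
        print(f"{operation} -> {destination}: {n}")
```

Feeding it the output of `kubectl logs` for each querier pod would show whether reads, writes, or both time out, and against which instance.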

The frontend doesn’t mention memcached at all.
Neither does the ingester, but it does log: level=warn ts=2021-08-17T16:08:55.650619791Z caller=experimental.go:19 msg="experimental feature in use" feature="In-memory (FIFO) cache"

Any thoughts on why it may not be working or how I can troubleshoot?

My search performance for anything > 3 hours is horrible, so I’m trying to get better performance with caching.

Thanks!

tlipatov · September 1, 2021, 9:19pm

I have the same issue:

ts=2021-09-01T19:46:41.830245033Z caller=spanlogger.go:87 org_id=1 traceID=4132e74cbcc9207e method=Memcache.GetMulti level=error msg="Failed to get keys from memcached" err="read tcp 1.2.3.4:60292->1.2.3.4:11211: i/o timeout"

Load tests against memcached pass fine. I tried different configurations, pod sizes, and replica counts.

nesabian · September 23, 2021, 7:11am

I have the same problem. Everything seems fine on the memcached side, but I still get i/o timeout errors.

moonape1226 · November 8, 2021, 6:59am

I’m having the same problem as well, on loki-distributed Helm chart 0.38.1.

moonape1226 · November 8, 2021, 8:00am

I found an issue in the Loki GitHub repository that mentions a similar error. I tried the config from it and it worked for me.

github.com/grafana/loki

memcache config in loki 2.0

opened 08:30PM - 11 Dec 20 UTC

closed 05:45PM - 13 Dec 20 UTC

mrmassis

Hi everyone, I have a problem with memcached. When i configure it, some loki co…meponets report warns.

```
level=warn ts=2020-12-11T19:13:19.590489326Z caller=memcached_client.go:209 msg="error updating memcache servers" err="lookup _memcached-client._tcp.memcached-index-queries.loki.svc.cluster.local:11211: no such host"
```

Why it happen? my configs:

```
storage_config:
  aws:
    s3: XXXX
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /data/loki/index
    shared_store: aws
    cache_location: /data/loki/boltdb-cache
  index_queries_cache_config:
    memcached:
      batch_size: 100
      parallelism: 100
    memcached_client:
      #consistent_hash: true
      host: memcached-index-queries.loki.svc.cluster.local
      service: memcached-client
chunk_store_config:
  chunk_cache_config:
    memcached:
      batch_size: 100
      parallelism: 100
    memcached_client:
      #consistent_hash: true
      host: memcached.loki.svc.cluster.local
      service: memcached-client
  max_look_back_period: 0s
  write_dedupe_cache_config:
    memcached:
      batch_size: 100
      parallelism: 100
    memcached_client:
      #consistent_hash: true
      host: memcached-index-writes.loki.svc.cluster.local
      service: memcached-client
```

Pods:

```
oki-loki-cdefa-logaas-compactor-98bfdc5f9-w56gz 1/1 Running 0 4m52s
loki-loki-cdefa-logaas-distributor-7dfdc6cd47-jpkcl 1/1 Running 0 4m3s
loki-loki-cdefa-logaas-distributor-7dfdc6cd47-khvxp 1/1 Running 0 3m14s
loki-loki-cdefa-logaas-distributor-7dfdc6cd47-mztfq 1/1 Running 0 4m52s
loki-loki-cdefa-logaas-ingester-0 0/1 Running 0 33s
loki-loki-cdefa-logaas-ingester-1 1/1 Running 0 2m24s
loki-loki-cdefa-logaas-ingester-2 1/1 Running 0 4m18s
loki-loki-cdefa-logaas-querier-db565fb99-q6467 1/1 Running 0 4m52s
loki-loki-cdefa-logaas-query-frontend-5c49b96879-75nvr 1/1 Running 0 4m52s
loki-loki-cdefa-logaas-ruler-0
memcached-0 1/1 Running 0 42m
memcached-1 1/1 Running 0 41m
memcached-index-frontend-0 1/1 Running 0 40m
memcached-index-frontend-1 1/1 Running 0 39m
memcached-index-queries-0 1/1 Running 0 40m
memcached-index-queries-1 1/1 Running 0 40m
memcached-index-write-0 1/1 Running 0 40m
memcached-index-write-1 1/1 Running 0 40m
```

Maybe you could try this:

  memcached_client:
    host: loki-loki-distributed-memcached-frontend.loki.svc.cluster.local
    service: memcache
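Why the `service` value matters: the warning quoted in the GitHub issue shows a lookup of `_memcached-client._tcp.memcached-index-queries.loki.svc.cluster.local`, which suggests that `service` and `host` are combined into a DNS SRV name. In Kubernetes, SRV records exist only for named Service ports, so `service` would need to match the port name on the memcached Service. The sketch below only builds that name; `srv_record_name` is a hypothetical helper, and the actual port name in your chart version is an assumption to verify with `kubectl get svc -o yaml`.

```python
def srv_record_name(service: str, host: str, proto: str = "tcp") -> str:
    """Build the DNS SRV name in the `_<service>._<proto>.<host>` form seen in the warning."""
    return f"_{service}._{proto}.{host}"


if __name__ == "__main__":
    host = "loki-loki-distributed-memcached-frontend.loki.svc.cluster.local"
    # `http` is the value from the original config, `memcache` the one suggested above.
    for service in ("http", "memcache"):
        print(srv_record_name(service, host))
```

If the port named `http` on the Service is not the memcached port (an assumption worth checking), the first name would resolve to the wrong port or not at all, while the second would point at the cache itself.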

system · November 8, 2022, 8:00am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
