**What Grafana version and what operating system are you using?**
Helm chart mimir-distributed with Mimir tag 2.0.0 on EKS, Kubernetes version 1.21.

**What are you trying to achieve?**
Reinstall Mimir and have the ingesters running.

**How are you trying to achieve it?**
Using Helm to reinstall Mimir.

**What happened?**
All ingesters go into CrashLoopBackOff.

**What did you expect to happen?**
The ingesters should run successfully.

**Can you copy/paste the configuration(s) that you are having problems with?**
```yaml
serviceAccount:
  create: false
  name: s3-full
minio:
  enabled: false
store_gateway:
  replicas: 1
  sharding_ring:
    kvstore:
      store: memberlist
ingester:
  nodeSelector:
    role: metrics
  tolerations:
    - key: role
      operator: Equal
      value: metrics
      effect: NoSchedule
mimir:
  # -- Config file for Grafana Mimir, enables templates. Needs to be copied in full for modifications.
  config: |
    {{- if not .Values.enterprise.enabled -}}
    multitenancy_enabled: false
    {{- end }}
    limits:
      ingestion_rate: 40000
      max_global_series_per_user: 1000000
      max_global_series_per_metric: 0
    activity_tracker:
      filepath: /data/metrics-activity.log
    alertmanager:
      data_dir: '/data'
      enable_api: true
      external_url: '/alertmanager'
    alertmanager_storage:
      backend: s3
      s3:
        endpoint: s3.${region}.amazonaws.com
        bucket_name: ${alertmanager_bucket}
        region: ${region}
    frontend_worker:
      frontend_address: {{ template "mimir.fullname" . }}-query-frontend-headless.{{ .Release.Namespace }}.svc:{{ include "mimir.serverGrpcListenPort" . }}
    ruler:
      enable_api: true
      rule_path: '/data'
      alertmanager_url: dnssrvnoa+http://_http-metrics._tcp.{{ template "mimir.fullname" . }}-alertmanager-headless.{{ .Release.Namespace }}.svc.{{ .Values.global.clusterDomain }}/alertmanager
    server:
      grpc_server_max_recv_msg_size: 104857600
      grpc_server_max_send_msg_size: 104857600
      grpc_server_max_concurrent_streams: 1000
    frontend:
      log_queries_longer_than: 10s
      align_queries_with_step: true
    compactor:
      data_dir: "/data"
    ingester:
      ring:
        final_sleep: 0s
        num_tokens: 512
    ingester_client:
      grpc_client_config:
        max_recv_msg_size: 104857600
        max_send_msg_size: 104857600
    runtime_config:
      file: /var/{{ include "mimir.name" . }}/runtime.yaml
    memberlist:
      abort_if_cluster_join_fails: false
      compression_enabled: false
      join_members:
        - {{ include "mimir.fullname" . }}-gossip-ring
    # This configures how the store-gateway synchronizes blocks stored in the bucket. It uses Minio by default for getting started (configured via flags) but this should be changed for production deployments.
    blocks_storage:
      backend: s3
      tsdb:
        dir: /data/tsdb
        wal_compression_enabled: true
        retention_period: 4h
      bucket_store:
        sync_dir: /data/tsdb-sync
        {{- if .Values.memcached.enabled }}
        chunks_cache:
          backend: memcached
          memcached:
            addresses: dns+{{ .Release.Name }}-memcached.{{ .Release.Namespace }}.svc:11211
            max_item_size: {{ .Values.memcached.maxItemMemory }}
        {{- end }}
        {{- if index .Values "memcached-metadata" "enabled" }}
        metadata_cache:
          backend: memcached
          memcached:
            addresses: dns+{{ .Release.Name }}-memcached-metadata.{{ .Release.Namespace }}.svc:11211
            max_item_size: {{ (index .Values "memcached-metadata").maxItemMemory }}
        {{- end }}
        {{- if index .Values "memcached-queries" "enabled" }}
        index_cache:
          backend: memcached
          memcached:
            addresses: dns+{{ .Release.Name }}-memcached-queries.{{ .Release.Namespace }}.svc:11211
            max_item_size: {{ (index .Values "memcached-queries").maxItemMemory }}
        {{- end }}
      s3:
        endpoint: s3.${region}.amazonaws.com
        bucket_name: ${metrics_bucket}
        region: ${region}
    ruler_storage:
      backend: s3
      s3:
        endpoint: s3.${region}.amazonaws.com
        bucket_name: ${ruler_bucket}
        region: ${region}
    {{- if .Values.enterprise.enabled }}
    multitenancy_enabled: true
    admin_api:
      leader_election:
        enabled: true
        ring:
          kvstore:
            store: "memberlist"
    {{- if .Values.minio.enabled }}
    admin_client:
      storage:
        type: s3
        s3:
          endpoint: {{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000
          bucket_name: enterprise-metrics-admin
          access_key_id: {{ .Values.minio.accessKey }}
          secret_access_key: {{ .Values.minio.secretKey }}
          insecure: true
    {{- end }}
    auth:
      type: enterprise
    cluster_name: "{{ .Release.Name }}"
    license:
      path: "/license/license.jwt"
    {{- if .Values.gateway.useDefaultProxyURLs }}
    gateway:
      proxy:
        default:
          url: http://{{ template "mimir.fullname" . }}-admin-api.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
        admin_api:
          url: http://{{ template "mimir.fullname" . }}-admin-api.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
        alertmanager:
          url: http://{{ template "mimir.fullname" . }}-alertmanager.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
        compactor:
          url: http://{{ template "mimir.fullname" . }}-compactor.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
        distributor:
          url: http://{{ template "mimir.fullname" . }}-distributor.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
        ingester:
          url: http://{{ template "mimir.fullname" . }}-ingester.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
        query_frontend:
          url: http://{{ template "mimir.fullname" . }}-query-frontend.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
        ruler:
          url: http://{{ template "mimir.fullname" . }}-ruler.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
        store_gateway:
          url: http://{{ template "mimir.fullname" . }}-store-gateway.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
    {{- end }}
    instrumentation:
      enabled: true
      distributor_client:
        address: 'dns:///{{ template "mimir.fullname" . }}-distributor.{{ .Release.Namespace }}.svc:{{ include "mimir.serverGrpcListenPort" . }}'
    {{- end }}
```
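For what it's worth, the failing WAL path in the logs lives under `/data`, which the chart places on the ingester's persistent volume rather than the node's root disk, so node size alone doesn't grow that volume. If this chart version exposes an `ingester.persistentVolume` block (it appears to in mimir-distributed 2.x), the volume size can be raised in the values; the figure below is purely illustrative, not the chart default:

```yaml
ingester:
  persistentVolume:
    enabled: true
    # Illustrative size only -- choose a value sized for your WAL and TSDB head retention.
    size: 50Gi
```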
**Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.**

```
level=error ts=2022-05-31T19:08:57.640268904Z caller=ingester.go:1599 msg="unable to open TSDB" err="failed to open TSDB: /data/tsdb/anonymous: open /data/tsdb/anonymous/wal/00000282: no space left on device" user=anonymous
level=error ts=2022-05-31T19:08:57.640311897Z caller=ingester.go:1675 msg="error while opening existing TSDBs" err="unable to open TSDB for user anonymous: failed to open TSDB: /data/tsdb/anonymous: open /data/tsdb/anonymous/wal/00000282: no space left on device"
level=error ts=2022-05-31T19:08:57.64039023Z caller=mimir.go:471 msg="module failed" module=ingester-service err="invalid service state: Failed, expected: Running, failure: opening existing TSDBs: unable to open TSDB for user anonymous: failed to open TSDB: /data/tsdb/anonymous: open /data/tsdb/anonymous/wal/00000282: no space left on device"
```
**Did you follow any online instructions? If so, what is the URL?**
Installed mimir-distributed using Helm.
**Notes**

I have tried reinstalling Mimir and restricting it to Kubernetes nodes with large volumes, but I am still getting "no space left on device" errors. I also tried deleting the contents of my bucket, to no avail. I am using IAM instance profiles to authenticate to S3, which works fine, and I am not seeing any errors about shipping data to S3. I don't understand how Mimir can still hit a "no space left on device" error on a fresh node with a fresh install.
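One thing that may explain "fresh install, old error": PVCs created for a StatefulSet are not deleted by `helm uninstall`, so a reinstall can re-attach the old, full WAL volume. The commands below are an illustrative sketch for checking this; the namespace, pod name, and PVC name are assumptions and will differ per release name:

```bash
# Show disk usage of the data volume inside a crashing ingester
kubectl -n mimir exec mimir-ingester-0 -- df -h /data

# List PVCs left behind by previous installs (helm uninstall does not remove them)
kubectl -n mimir get pvc

# If the old WAL data is disposable, deleting the PVC lets a fresh volume be provisioned:
# kubectl -n mimir delete pvc storage-mimir-ingester-0
```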