Loki disk space inflation

I don’t understand the disk space inflation from Loki.

I have the following settings:

chunk_store_config:
  max_look_back_period: 96h

table_manager:
  retention_deletes_enabled: true
  retention_period: 96h

Anyway, if I check inside the container, in the folder /tmp/loki/chunks, in December with a 96h retention period I still have files from May, June, July…

Why? Is there a clear answer on this topic somewhere, or is this a product feature: the files are kept from the beginning, but you can only query the retention period?

If you are using a recent version of Loki, you’ll want to use the compactor instead of the table manager. See Retention | Grafana Loki documentation.
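For reference, a minimal compactor-based retention setup looks roughly like this (paths and durations are illustrative; with the compactor, the retention duration itself comes from limits_config.retention_period, not from table_manager):

compactor:
  working_directory: /tmp/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h

limits_config:
  retention_period: 96h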

Yeah, I spent the whole day on it yesterday.

The process is a good joke.

So you have to use the compactor… But the compactor does not delete the files; it is just there for compacting index files and applying log retention…

The chunk files are not deleted while applying the retention algorithm on the index. They are deleted asynchronously by a sweeper process, yes, NOT the compactor!

Despite this previous indication, we can also read:
“retention_delete_delay is the delay after which the Compactor will delete marked chunks.”

Does the compactor delete the marked chunks?

Who knows, because the best part of the joke is here: “They are deleted asynchronously by a sweeper process and this delay can be configured by setting -compactor.retention-delete-delay.”

So the compactor supposedly deletes the marked chunks, and you have a parameter for this, but in reality the compactor does not delete the chunks (the sweeper does).

Yes, it means you set a parameter on the compactor, which is not in charge of deletion, to configure the hidden sweeper process that is in charge of deletion.

I am very happy with this thing, and very happy with the documentation too. I’ve understood that documentation is not important; it’s better to spend time developing new features instead of wasting it on sub-tasks like writing documentation.

It is also important to define parameters on one task that are used by another task: that is the way to simplify the whole process. Yes, this is the way.


So, how can I actually get the files deleted?

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
  - from: 2023-02-08
    store: boltdb-shipper
    object_store: filesystem
    schema: v11
    index:
      prefix: index_
      period: 24h

compactor:
  # working_directory is the directory where marked chunks and temporary tables will be saved.
  working_directory: /tmp/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  # retention_delete_delay is the delay after which the Compactor will delete marked chunks.
  retention_delete_delay: 10m
  # retention_delete_worker_count specifies the maximum quantity of goroutine workers instantiated to delete chunks
  retention_delete_worker_count: 150


      
ruler:
  alertmanager_url: http://localhost:9093 

querier:
  max_concurrent: 2048

frontend:
  max_outstanding_per_tenant: 2048
  compress_responses: true

  
chunk_store_config:
  max_look_back_period: 12h
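Note: as far as I can tell from the docs, with retention_enabled: true the retention duration is read from limits_config.retention_period, not from chunk_store_config or table_manager. A sketch of the piece this config seems to be missing (the 96h value is illustrative):

limits_config:
  retention_period: 96h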

Hey, which version did you use?
I use Loki 2.5.0, but with this config the sweeper process never deleted anything.

Hello,

We use 2.9.1.

I have the same problem as you, so I just want to make sure whether Loki fixed these problems in a recent version.

At first I deleted chunk files with a shell script, but it caused high CPU usage while deleting the files. Sometimes it also shows error logs like ‘cant find file in /loki/chunks/fake/**’.
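Something like this, run at the lowest CPU priority, was the kind of manual cleanup I mean (CHUNK_DIR and the 4-day cutoff are illustrative, not my real script):

```shell
#!/bin/sh
# Sketch of a throttled manual cleanup: delete chunk files older than
# 4 days, at the lowest CPU priority via nice.
# Warning: deleting files behind Loki's back leaves stale index entries,
# which is exactly what later produces "cant find file" errors.
CHUNK_DIR="${CHUNK_DIR:-/tmp/loki/chunks}"
nice -n 19 find "$CHUNK_DIR" -type f -mtime +4 -delete
```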

And I checked the code in Loki 2.5.0, the function named ‘DeleteChunk’:

func (c *store) DeleteChunk(ctx context.Context, from, through model.Time, userID, chunkID string, metric labels.Labels, partiallyDeletedInterval *model.Interval) error {
	metricName := metric.Get(model.MetricNameLabel)
	if metricName == "" {
		return ErrMetricNameLabelMissing
	}

	chunkWriteEntries, err := c.schema.GetWriteEntries(from, through, userID, metricName, metric, chunkID)
	if err != nil {
		return errors.Wrapf(err, "when getting index entries to delete for chunkID=%s", chunkID)
	}

	return c.deleteChunk(ctx, userID, chunkID, metric, chunkWriteEntries, partiallyDeletedInterval, func(chunk Chunk) error {
		return c.PutOne(ctx, chunk.From, chunk.Through, chunk)
	})
}

I think Loki only deletes the index entries, but not the chunk file.

Did you try some way to delete the chunk files?


We ended up with this config file:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
  - from: 2022-12-19
    store: boltdb-shipper
    object_store: filesystem
    schema: v11
    index:
      prefix: index_
      period: 24h

#  sweeper:
#    interval: 5m

compactor:
  # working_directory is the directory where marked chunks and temporary tables will be saved.
  working_directory: /tmp/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  # retention_delete_delay is the delay after which the Compactor will delete marked chunks.
  retention_delete_delay: 15m
  # retention_delete_worker_count specifies the maximum quantity of goroutine workers instantiated to delete chunks
  retention_delete_worker_count: 150

table_manager:    
  retention_deletes_enabled: true
  retention_period: 15d

ruler:
  alertmanager_url: http://localhost:9093 

querier:
  # max_concurrent: 2048
  max_concurrent: 16384

frontend:
  max_outstanding_per_tenant: 2048
  compress_responses: true
  

I found a config like yours in the Loki source after getting your response.

The only difference from my config (mine was empty) is:

common:
  ring:
    instance_addr: 127.0.0.1

And I find this config may affect all instance_addr settings. What’s more, Loki creates a new compactor as follows:

func NewCompactor(cfg Config, storageConfig storage.Config, schemaConfig loki_storage.SchemaConfig, limits retention.Limits, clientMetrics storage.ClientMetrics, r prometheus.Registerer) (*Compactor, error) {
	...
	ringStore, err := kv.NewClient(
		cfg.CompactorRing.KVStore,
		ring.GetCodec(),
		kv.RegistererWithKVName(prometheus.WrapRegistererWithPrefix("loki_", r), "compactor"),
		util_log.Logger,
	)
	if err != nil {
		return nil, errors.Wrap(err, "create KV store client")
	}
	lifecyclerCfg, err := cfg.CompactorRing.ToLifecyclerConfig(ringNumTokens, util_log.Logger)
	if err != nil {
		return nil, errors.Wrap(err, "invalid ring lifecycler config")
	}
	...
}

So it should log an error when it fails to init ‘lifecyclerCfg’. But I can see that many logs are printed only at warning level, so my compactor may have failed to init without my getting an error message.
I will test it later, thanks a lot. If I get a firm answer, I will reply to you again.

I use the simple-stack Helm chart with Loki 2.6.1. In fact, the configuration of deletion is not clear in the docs or in the Helm chart. I configured the compactor directly in the values of the Loki chart, which I downloaded to my disk; the original chart config did not expose all the necessary options, so it was not possible to configure the compactor from the main values file. After such changes, I can confirm that deletion of chunks works correctly without manual deletion.

You should add the following entry to your values:

config:
  limits_config:
    retention_period: 24h
    retention_stream:
    - selector: '{namespace="loki-stack"}'
      priority: 1
      period: 48h

I can’t find the entry about ‘retention_stream’ in Loki 2.5.0.

Do you have any tips from the source code about when chunk files get deleted after I POST to ‘/loki/api/admin/delete’?


func (s *Sweeper) Start() {
	s.markerProcessor.Start(func(ctx context.Context, chunkId []byte) error {
		status := statusSuccess
		start := time.Now()
		defer func() {
			s.sweeperMetrics.deleteChunkDurationSeconds.WithLabelValues(status).Observe(time.Since(start).Seconds())
		}()
		chunkIDString := unsafeGetString(chunkId)
		userID, err := getUserIDFromChunkID(chunkId)
		if err != nil {
			return err
		}

		err = s.chunkClient.DeleteChunk(ctx, unsafeGetString(userID), chunkIDString)
		if s.chunkClient.IsChunkNotFoundErr(err) {
			status = statusNotFound
			level.Debug(util_log.Logger).Log("msg", "delete on not found chunk", "chunkID", chunkIDString)
			return nil
		}
		if err != nil {
			level.Error(util_log.Logger).Log("msg", "error deleting chunk", "chunkID", chunkIDString, "err", err)
			status = statusFailure
		}
		return err
	})
}

I find the sweeper tries to delete the chunks, but I can’t see any error logs from the process.
What’s more, I think Loki deletes the chunk files through FSObjectClient when the store config is filesystem.

func (f *FSObjectClient) DeleteObject(ctx context.Context, objectKey string) error {
	// inspired from https://github.com/thanos-io/thanos/blob/55cb8ca38b3539381dc6a781e637df15c694e50a/pkg/objstore/filesystem/filesystem.go#L195
	file := filepath.Join(f.cfg.Directory, filepath.FromSlash(objectKey))

	for file != f.cfg.Directory {
		if err := os.Remove(file); err != nil {
			return err
		}

		file = filepath.Dir(file)
		empty, err := isDirEmpty(file)
		if err != nil {
			return err
		}

		if !empty {
			break
		}
	}

	return nil
}
// DeleteChunk deletes the specified chunk from the configured backend
func (o *Client) DeleteChunk(ctx context.Context, userID, chunkID string) error {
	key := chunkID
	if o.keyEncoder != nil {
		c, err := chunk.ParseExternalKey(userID, key)
		if err != nil {
			return err
		}
		key = o.keyEncoder(o.schema, c)
	}
	return o.store.DeleteObject(ctx, key)
}

And I find that the object key used here may differ from the actual chunk file path on disk:
Loki miss chunk file in /loki/chunks/fake - Grafana Loki - Grafana Labs Community Forums