Alternatives for Loki storage without using a cloud solution

Hi,

I need some help with Loki storage.

I cannot use a cloud solution to store logs coming from Promtail instances. The logs need to be stored internally.

Until now I have been using Loki’s filesystem storage, but it feels slow and not very reliable for storing the logs.

Are there other alternatives I can use? How much slower is the filesystem compared to the cloud solutions?

Is it normal that it feels slow and inefficient when using the filesystem? For example, as soon as I start querying longer time ranges (the past 90 days), Loki uses a lot of memory and CPU on the server and has even crashed a few times. I have about 500MB of logs going to Loki.

Thank you for your help

Hi there

Object storage will usually be better for Loki because it can scale well and provide consistent performance since files are spread across multiple disks. HDDs should never be used with Loki, and even with SSDs performance can be variable; it really depends on your workloads.
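
For reference, if you happen to have an S3-compatible object store reachable on-prem (MinIO is one common self-hosted option; I'm only assuming one here as an illustration), the common storage block could look roughly like this, with endpoint, bucket, and credentials as placeholders:

common:
  storage:
    s3:
      # hypothetical on-prem S3-compatible endpoint and bucket
      endpoint: object-store.internal:9000
      insecure: true
      bucketnames: loki-chunks
      access_key_id: <access key>
      secret_access_key: <secret key>
      s3forcepathstyle: true

You would also point object_store at s3 in your schema_config.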

In order to provide you with some actionable steps, I think we need to get down to some specifics.

  • in which environment is Loki running? Cloud? On-prem? Your laptop?
  • which filesystem are you using, and on what kind of disk?
  • do you have 500MB of logs in total, or being added per day/month?
  • what query are you running when Loki uses a lot of CPU & memory?

Thank you for your response,

Loki is running on-prem in a Docker container on a Linux server.

It’s in a Docker volume on an NVMe drive.

I have around 500MB of logs in total; it will increase, but not by much.

The dashboard that uses the most memory is this one. It shows ‘Out of Memory’ when I set the period to the past year, for example (which covers essentially all 500MB of logs).
It runs these three queries at once on the same dashboard, one for each level (Debug, Error, Warning):

count_over_time({filename=~"$Filename", Level=~"Debug", Plan=~"$Plan", JobTitle=~"$JobTitle", Name=~"$Name", Case=~"$Case", Project=~"$Project", User=~"$User", Code=~"$Code"} [$__interval])

count_over_time({filename=~"$Filename", Level=~"Error", Plan=~"$Plan", JobTitle=~"$JobTitle", Name=~"$Name", Case=~"$Case", Project=~"$Project", User=~"$User", Code=~"$Code"} [$__interval])

count_over_time({filename=~"$Filename", Level=~"Warning", Plan=~"$Plan", JobTitle=~"$JobTitle", Name=~"$Name", Case=~"$Case", Project=~"$Project", User=~"$User", Code=~"$Code"} [$__interval])

These are admittedly three heavy queries, but I feel like Loki should still be able to handle them?

OK, thanks for the detail.

Firstly: can you please send your Loki config? Is this a single container running all of Loki? Have you set CPU/memory limits on your container?

Secondly: metric queries like these can be cached. You should consider adding a results_cache (see Grafana Loki configuration parameters | Grafana Loki documentation). This can either be an in-process cache or an external one like memcached.
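
As a rough sketch, assuming a recent Loki version with the embedded in-process cache (the cache size is just a placeholder):

query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100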

OK, so after some searching I think it’s actually my web browser that crashes (out of memory), not Loki. In my Loki config I had to set max_query_series to a high value because Grafana would otherwise give me an error, since my queries always went over that limit.

Here is my config:

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

compactor:
  retention_enabled: true

limits_config:
  ingestion_rate_strategy: global
  max_query_length: 0h
  reject_old_samples: false
  reject_old_samples_max_age: 0h
  max_query_series: 100000
  retention_period: 365d

ruler:
  alertmanager_url: http://localhost:9093

But I’m wondering: is it normal that I need to change the config just to load 500MB of logs? When I see that some deployments ingest terabytes of logs per day, I feel like something is wrong here.

I have not set CPU/memory limits.

Thank you, I will look into how to cache metric queries.

EDIT: For example, I can’t show the frequency of logs over the last 7 days on a graph without overriding the ‘max_query_series’ setting.

You have too much cardinality in your labels. You should wrap your count_over_time queries in a sum aggregation to reduce them down to a single series each, instead of exceeding the default max_query_series limit of 500.
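
For example, the Debug query could become something like this (and likewise for Error and Warning):

sum(count_over_time({filename=~"$Filename", Level=~"Debug", Plan=~"$Plan", JobTitle=~"$JobTitle", Name=~"$Name", Case=~"$Case", Project=~"$Project", User=~"$User", Code=~"$Code"} [$__interval]))

That collapses all matching streams into one series per query instead of one series per unique label combination.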

Thank you for your help! It was probably a stupid question, but with your answer it now works.

Not a stupid question at all!

It helps to be specific, though; when you next ask a question, please include as much detail as possible upfront :+1: Happy Lokiing!
