Alternatives for Loki storage without using a cloud solution

Hi,

I need some help with Loki storage.

I cannot use a cloud solution to store logs coming from Promtail instances. The logs need to be stored internally.

Until now I have been using Loki’s filesystem storage, but it feels slow and not very reliable for storing the logs.

Are there other alternatives I can use? How much slower is the filesystem compared to the cloud solutions?

Is it normal that it feels slow and inefficient when using the filesystem? For example, as soon as I start querying longer time ranges (the past 90 days), Loki uses a lot of memory and CPU on the server and has even crashed a few times. I have about 500MB of logs going to Loki.

Thank you for your help

Hi there

Object storage will usually be better for Loki because it can scale well and provide consistent performance since files are spread across multiple disks. HDDs should never be used with Loki, and even with SSDs performance can be variable; it really depends on your workloads.
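
For reference, if you happen to have an S3-compatible object store reachable on-prem (MinIO is one common self-hosted option; I'm only assuming one here as an illustration), the common storage block could look roughly like this, with endpoint, bucket, and credentials as placeholders:

common:
  storage:
    s3:
      # hypothetical on-prem S3-compatible endpoint and bucket
      endpoint: object-store.internal:9000
      insecure: true
      bucketnames: loki-chunks
      access_key_id: <access key>
      secret_access_key: <secret key>
      s3forcepathstyle: true

You would also point object_store at s3 in your schema_config.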

In order to provide you with some actionable steps, I think we need to get down to some specifics.

  • in which environment is Loki running? Cloud? On-prem? Your laptop?
  • which filesystem are you using, and on what kind of disk?
  • do you have 500MB of logs in total, or being added per day/month?
  • what query are you running when Loki uses a lot of CPU & memory?

Thank you for your response,

Loki is running on-prem in a Docker container on a Linux server.

It’s in a Docker volume on an NVMe drive.

I have around 500MB of logs in total; it will increase, but not by much.

The dashboard that uses the most memory is this one. It shows ‘Out of Memory’ when I set the period to the past year, for example (which covers essentially all 500MB of logs).
It runs these three queries at once on the same dashboard, one for each level (Debug, Error, Warning):

count_over_time({filename=~"$Filename", Level=~"Debug", Plan=~"$Plan", JobTitle=~"$JobTitle", Name=~"$Name", Case=~"$Case", Project=~"$Project", User=~"$User", Code=~"$Code"} [$__interval])

count_over_time({filename=~"$Filename", Level=~"Error", Plan=~"$Plan", JobTitle=~"$JobTitle", Name=~"$Name", Case=~"$Case", Project=~"$Project", User=~"$User", Code=~"$Code"} [$__interval])

count_over_time({filename=~"$Filename", Level=~"Warning", Plan=~"$Plan", JobTitle=~"$JobTitle", Name=~"$Name", Case=~"$Case", Project=~"$Project", User=~"$User", Code=~"$Code"} [$__interval])

These are admittedly three heavy queries, but I feel like Loki should still be able to handle them?

OK, thanks for the detail.

Firstly: can you please send your Loki config? Is this a single container running all of Loki? Have you set CPU/memory limits on your container?

Secondly: metric queries like these can be cached. You should consider adding a results_cache (see Grafana Loki configuration parameters | Grafana Loki documentation). This can either be an in-process cache or an external one like memcached.
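
As a rough sketch, assuming a recent Loki version with the embedded in-process cache (the cache size is just a placeholder):

query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100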

OK, so after some searching I think it’s actually my web browser that crashes (out of memory), not Loki. In my Loki config I had to set max_query_series to a high value because Grafana would otherwise give me an error, since my queries always went over that limit.

Here is my config:

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

compactor:
  retention_enabled: true

limits_config:
  ingestion_rate_strategy: global
  max_query_length: 0h
  reject_old_samples: false
  reject_old_samples_max_age: 0h
  max_query_series: 100000
  retention_period: 365d

ruler:
  alertmanager_url: http://localhost:9093

But I’m wondering: is it normal that I need to change the config just to load 500MB of logs? When I see that some deployments ingest terabytes of logs per day, I feel like something is wrong here.

I have not set CPU/memory limits.

Thank you, I will look into how to cache metric queries.

EDIT: For example, I can’t show the frequency of logs over the last 7 days on a graph without overriding the ‘max_query_series’ setting.

You have too much cardinality in your labels. You should wrap your count_over_time queries in a sum aggregation to reduce them down to a single series each, instead of exceeding the default max_query_series limit of 500.
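
For example, the Debug query could become something like this (and likewise for Error and Warning):

sum(count_over_time({filename=~"$Filename", Level=~"Debug", Plan=~"$Plan", JobTitle=~"$JobTitle", Name=~"$Name", Case=~"$Case", Project=~"$Project", User=~"$User", Code=~"$Code"} [$__interval]))

That collapses all matching streams into one series per query instead of one series per unique label combination.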

Thank you for your help! It was probably a stupid question, but with your answer it now works.

Not a stupid question at all!

It helps to be specific, though; when you next ask a question, please include as much detail as possible upfront :+1: Happy Lokiing!
