Multiple data sources blended into a single data source after S3 migration

Hey guys,

we’ve been using 3 monolithic Loki servers running on local disk to capture syslog from our network, one server per data hub. Everything was working as intended: we had 3 data sources in Grafana to query, and as you’d expect, each data source only had access to the hosts connected to its own Loki.

Recently we’ve started sending logs to S3 (our local Ceph server); all 3 servers share the same bucket. It seemed to work as intended, but I’ve just realized that whichever data source I use in Grafana, every query hits the “whole” database. Meaning even if I choose Loki1 as the data source, I’m getting results from all 3.

Granted, they now share the same bucket, but I would expect the data to be grouped separately, so that a query against a specific data source only pulls data belonging to that data source.

Do I need to configure something differently, or will Loki/Grafana act as if there is only a single data source as long as they share the same bucket, regardless of the separate Loki servers?

Thank you!

You don’t want to share the same S3 bucket across different Loki instances / clusters.

Is it just against best practice, or could it lead to lost/duplicate/corrupt data?

I’ve added a unique label in Promtail on all 3 servers to distinguish/filter the logs, but they are still using the same bucket/data source.
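For reference, it’s roughly like this on each server (a sketch, not my exact config; the `hub` label name, its value, and the listener port are just examples, with a different value per server):

```yaml
# promtail-config.yaml (sketch; "hub: hub1" becomes hub2/hub3 on the others)
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push   # this server's local Loki

scrape_configs:
  - job_name: syslog
    syslog:
      listen_address: 0.0.0.0:1514
      labels:
        job: syslog
        hub: hub1                      # unique label per Loki/Promtail pair
    relabel_configs:
      - source_labels: ['__syslog_message_hostname']
        target_label: host             # keep the sending host as a label
```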

As you’ve already discovered, by sharing the backend S3 bucket between Loki clusters you are effectively sharing the data across them, even though they are supposed to be separate clusters. This leads to:

  1. Each cluster being able to see the others’ logs, which is not ideal since you presumably separated the clusters for a reason.
  2. Since each of your clusters runs its own compactor, this can lead to compactor conflicts and potentially data loss (see the sketch after this list).
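To illustrate point 2: right now all 3 servers are effectively running something like the fragment below against the same bucket, so 3 independent compactors keep rewriting the same shared index. This is a sketch for a Loki 2.x-style config; the endpoint, credentials, and bucket name are placeholders.

```yaml
# Fragment of each Loki server's config (sketch; keys/endpoint/bucket are
# placeholders). All 3 compactors operate on the same index in the bucket.
compactor:
  working_directory: /loki/compactor
  shared_store: s3            # Loki 2.x option; newer versions configure this differently

storage_config:
  aws:
    s3: s3://ACCESS_KEY:SECRET_KEY@ceph.internal:7480/loki-logs   # same bucket on all 3
    s3forcepathstyle: true    # typically needed for Ceph RGW endpoints
```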

You can’t change the directory structure Loki uses inside a bucket, so it’s best to just create a separate S3 bucket dedicated to each instance.
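In practice that means each server points at its own bucket, along these lines (again a sketch; the bucket names are placeholders):

```yaml
# Loki server for hub 1 (sketch; repeat with loki-hub2 / loki-hub3 on the
# other two servers, each pointing at its own dedicated bucket)
storage_config:
  aws:
    s3: s3://ACCESS_KEY:SECRET_KEY@ceph.internal:7480/loki-hub1
    s3forcepathstyle: true
```

With one bucket per cluster, each compactor only ever touches its own index, and each Grafana data source only sees its own logs again.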
