Configuring Loki for HA

Is there a recommended Loki configuration for high availability, similar to Grafana’s https://grafana.com/docs/grafana/latest/tutorials/ha_setup/? There is a “query frontend” feature mentioned in the documentation: https://github.com/grafana/loki/blob/v1.5.0/docs/configuration/query-frontend.md, but this assumes the presence of Kubernetes.

Hi @gbrener - I checked in with the Loki team here at Grafana Labs. We don’t have official docs on this just yet (they’re on the way). What the community is currently doing to achieve HA is deploying Loki to Kubernetes and referencing our Ksonnet config. This breaks Loki into microservices, and it’s how our team runs HA for Loki: https://github.com/grafana/loki/tree/master/production/ksonnet

Thanks @samanthacoren1, that link is useful. It would be nice if there were a guide for running Loki in production without k8s. Any idea when the official docs might arrive?

@gbrener not entirely certain, but I’ll put a request in to see if we can get it prioritized. The team knows it’s important and there are quite a few different approaches to take even without Kubernetes - they want to make sure what we’re recommending in the documentation makes sense with upcoming release changes.

We’re really interested in Loki HA. Is there any new information/timeline on this?

Kubernetes isn’t an option for us at this time. Hopefully, eventually, but not as of today.

It’s a little frustrating that this hasn’t been addressed. Loki looks like exactly what we want, and is simple to get running in a non-HA configuration, but the documentation is, at best, incomplete. Assuming k8s presupposes its availability, as well as expertise, neither of which is available here. I’d really like to know if I can run a distributor, ingester, and querier as separate processes on a single instance, with multiple copies for resilience.

The docs imply that this should be possible - I have a Consul cluster running, and several VMs running Consul agents joined to the cluster. Each VM should (I think) be able to run a trio of an ingester, distributor, and querier, scaling independently as necessary.

I’m falling at the first hurdle: configuring the distributor. The docs seem to suggest that I can use Consul as the ring store, but specifying consul then appears to require a consul_config section. Or does it? Using “consul_config” produces an error:

failed parsing config: config/distributor.yaml: yaml: unmarshal errors:
  line 13: field consul_config not found in type main.Config

Changing it to “consul” produces the same error. Using “memberlist” as the ring store seems to require a “memberlist” section rather than the “memberlist_config” section the docs suggest, but the equivalent renaming doesn’t work for consul.
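
For anyone hitting the same error: the “main.Config” in the message suggests the key was read at the top level of the file, and names such as “consul_config” and “memberlist_config” in the reference docs appear to be type placeholders rather than literal keys. Below is a minimal sketch of where the Consul settings usually nest for the ingester ring. It assumes a Consul agent on localhost:8500 (a hypothetical address) and a replication factor of 3. Field names may differ between Loki versions, so check the configuration reference for the release in use.

```yaml
# Sketch only: ring key-value store settings for the ingester.
# The Consul address below is an assumption; point it at your own agent.
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul            # other documented stores include memberlist, etcd and inmemory
        consul:
          host: localhost:8500   # hypothetical local Consul agent
      replication_factor: 3      # assumed; each stream is written to this many ingesters
```

With memberlist the pattern looks similar: the store is set to memberlist, and the peer list goes in a top-level “memberlist” section, which would match the behaviour described above.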

So any updated (and complete!) documentation would be really gratefully received. I realise that it’s a moving target - and I hate documenting - but doing anything outside of what appears to be the only deployment case seems impossible.

HA config.

I now have (I believe) separate instances of queriers, distributors, and ingesters running on 3 VMs (so each VM runs all 3 processes for the moment), with Consul providing the ring storage. The config for that is not obvious! It seems to work only if the ring storage is set to consul, but with no consul config block.

I’d love to know if this is a reasonable approach to HA, i.e. to run separate processes for each component (with autoscaling in the mix somehow), using S3 for chunk storage (and for the index, via the boltdb-shipper), and with Consul providing the ring storage.
Thanks.
PS: For all that I’m frustrated with the docs, Loki seems like a fantastic product, and I’ve long been a fan of Grafana.
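
The storage side of the setup described in this post (S3 for chunks, boltdb-shipper for the index) might look roughly like the sketch below. It is not a tested production config: the start date, local directories, region, and bucket name are all placeholders, and the exact field names should be checked against the configuration reference for the Loki version in use.

```yaml
# Sketch only: boltdb-shipper index with S3 as the shared object store.
schema_config:
  configs:
    - from: 2020-10-01          # placeholder start date for this schema
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h             # boltdb-shipper is documented to need 24h index periods

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index       # placeholder local path
    cache_location: /loki/boltdb-cache        # placeholder local path
    shared_store: s3
  aws:
    s3: s3://REGION/BUCKET_NAME               # placeholders; credentials assumed to come from the environment
```

Each component process would then be started against the same file with a different -target value (distributor, ingester, or querier), so every VM can share one config.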

Grahamn,

Could you share the instructions to set up Loki in HA? I am struggling with the official documentation.

Regards

My email is graham@rockcons.co.uk - continue a conversation there?
Cheers,
Graham