Ingest large volume of logs using monolithic installation

Hi, I am trying to ingest a very large volume of logs into a Grafana Loki monolithic installation. I notice that the AWS Lambda extension, which uses a Promtail client, is skipping large log entries from the Lambda function. I see the following error message in CloudWatch Logs. I would greatly appreciate it if anybody could guide me on how to resolve this issue.
2024/05/01 23:20:00 promtail.ClientProto: Unexpected HTTP status code: 500, message: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (27575603 vs. 4194304)

I believe you can fix that error message by tuning grpc_server_max_recv_msg_size and grpc_server_max_send_msg_size.

BTW, how “large” is large?

Hello @tonyswumac Thanks a lot for your response. I will try this out and get back to you. It’s around 200 MB of logs per execution of the lambda function.

200MB to a monolithic per run sounds pretty heavy. You might want to consider adding some sort of rate limiting to the promtail client in the lambda function. I’ve never used the lambda / promtail function before, so I am not sure how configurable it is. Worst case you could write your own lambda function and just make Loki API calls instead.
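For the worst-case route, here is a minimal sketch of a Lambda-style helper that pushes log lines straight to Loki's HTTP push API (`POST /loki/api/v1/push`), splitting the lines into size-bounded batches so no single request comes near the server's message limits. The Loki URL, label values, and batch threshold below are placeholders, not something from the Lambda extension itself:

```python
import json
import urllib.request

LOKI_URL = "http://loki.example.internal:3100/loki/api/v1/push"  # placeholder host

def make_payload(lines, labels):
    """Build a Loki push-API body from (timestamp_ns, line) tuples."""
    return {
        "streams": [{
            "stream": labels,  # e.g. {"job": "lambda", "function": "my-fn"}
            "values": [[str(ts), line] for ts, line in lines],
        }]
    }

def split_batches(lines, max_bytes=1024 * 1024):
    """Group log lines into batches whose combined size stays bounded,
    so one huge invocation's logs become many small requests."""
    batch, size = [], 0
    for line in lines:
        batch.append(line)
        size += len(line)
        if size >= max_bytes:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch

def push(payload):
    """POST one batch to Loki; Loki answers 204 No Content on success."""
    req = urllib.request.Request(
        LOKI_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Batching on the client side avoids ever hitting the `ResourceExhausted` error in the first place, rather than raising the server's limits to match the largest possible payload.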

It seems to be working after adding the following to the Loki configuration. Now I am able to see the logs.

server:
  grpc_server_max_recv_msg_size: 28575603
  grpc_server_max_send_msg_size: 28575603

@tonyswumac Could you please suggest how to do a production-grade configuration of Loki on an Amazon EC2 server? Is it possible to scale a monolithic deployment?

Yes, according to the documentation, to scale a monolithic cluster you’d simply deploy more EC2 instances with the same configuration. You’d need to be using object storage as the backend, of course.
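As a rough sketch, the parts of the Loki configuration that matter for scaling out a monolithic deployment look something like this (bucket name, region, and replication factor here are placeholders, not recommendations):

```yaml
common:
  replication_factor: 3        # placeholder; writes are replicated across instances
  ring:
    kvstore:
      store: memberlist        # instances coordinate via gossip, no external KV store
  storage:
    s3:
      bucketnames: my-loki-chunks   # placeholder bucket for chunks and indexes
      region: us-east-1             # placeholder region
```

With object storage as the backend, the instances themselves stay largely stateless, which is what makes adding or removing identical EC2 instances viable.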

And if you are doing that, you might as well consider going with some sort of container platform such as ECS to make it easier to scale up and down.

@deepeshthoduvayil1 How did you manage to increase the gRPC limit? I am increasing it to other values, but it seems like I cannot get beyond about 104MB in a monolithic setup.

@tonyswumac Do you have any documentation I can follow to use monolithic Loki for large data volumes? Basically what you have suggested.

Thanks

I don’t.

How large is large? If you are considering scaling at all, I would recommend at least doing the simple scalable deployment.

I think around 40–50 MB/s. Will monolithic with multiple replicas be fine for this? And how does traffic flow between multiple replicas? Is there any need for a load balancer there?

If it’s only 50 MB/s you might be able to get away with just one instance, honestly.

If you want to run a cluster, how you deploy Loki, whether monolithic or SSD, doesn’t really matter. What matters is network. Things you’ll need:

  1. You’ll want to make sure each Loki instance has its own IP and can talk to the others on the gRPC, HTTP, and gossip ports. You’ll need service discovery so they know how to reach each other; if you are running static hosts, service discovery can be as simple as DNS A records.
  2. You’ll need a load balancer in front of all your Loki instances.
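To make those two points concrete, here is a sketch of the relevant config, assuming three static EC2 hosts and Loki’s default ports (the hostnames are placeholders):

```yaml
memberlist:
  join_members:                        # gossip ring membership; could also be one DNS A record
    - loki-1.example.internal:7946     # placeholder hosts; 7946 is the default gossip port
    - loki-2.example.internal:7946
    - loki-3.example.internal:7946

server:
  http_listen_port: 3100    # HTTP API / push endpoint; put your load balancer in front of this
  grpc_listen_port: 9095    # instance-to-instance gRPC
```

Clients and queriers then talk to the load balancer on 3100, while the instances find each other over the gossip and gRPC ports.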

Currently I am deploying Loki as a service on AWS EC2 and there are no plans to move to k8s yet. Do you know, with multiple Loki replicas in a single-binary deployment, how traffic would flow between them? Is there any provision to use a load balancer between them, or will Loki handle it itself?