Alert Notifications Degrading Graphite Performance

When we use Grafana alerting with email notifications, our Graphite webapp constantly uses all of its available workers and holds onto them until they eventually time out. We did not hit this issue with alerts alone, but since we added notifications to the alerts, the impact has been severe.

Our Grafana runs against a local Graphite webapp, which is configured with three remote Graphite webapp cluster servers. CPU utilization on the remote servers has increased, but not drastically. The local Graphite, by contrast, has dropped to almost nothing and times out in a browser. I’m also seeing the local Graphite time out in the supervisord logs.

Any advice would be greatly appreciated.

The notifications are not adding any strain, as they only fire when an alert fires. Unless you have hundreds of alert notifications triggering all the time, they are not the cause.

How many alert rules do you have, and how often are they evaluated?
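
The reason those two numbers matter: each rule evaluation sends queries to Graphite, so the rule count and evaluation interval set the query rate the webapp workers have to absorb. Here is a rough back-of-the-envelope sketch using Little's law (busy workers = arrival rate x average time per query). The function name and every number below are hypothetical placeholders, not measurements from your setup, so plug in your own values.

```python
def estimated_busy_workers(num_rules, queries_per_rule, eval_interval_s, avg_query_s):
    """Rough Little's law estimate of how many workers alerting keeps busy.

    All inputs are assumptions you supply yourself:
      num_rules        - number of alert rules
      queries_per_rule - Graphite queries issued per rule evaluation
      eval_interval_s  - seconds between evaluations of each rule
      avg_query_s      - average seconds a Graphite query takes to return
    """
    # Average number of alert queries arriving at Graphite per second.
    queries_per_second = num_rules * queries_per_rule / eval_interval_s
    # Little's law: average concurrency = arrival rate * average duration.
    return queries_per_second * avg_query_s


# Hypothetical example: 200 rules, 2 queries each, evaluated every 10 s,
# with queries averaging 1.5 s.
print(estimated_busy_workers(200, 2, 10, 1.5))
```

If the estimate is close to or above the number of workers the local webapp is configured with, the alert evaluations alone could account for the exhaustion. Slow queries make it worse, since a query that runs until it times out holds a worker for the whole timeout.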