Difference between Grafana Alert Manager and grafanacloud-<tenant>-nga Alert Manager

I now have (or have always had) two Alert Managers on the Contact Points page:

Grafana Alert Manager:
[screenshot: GrafanaAlertManager]

and

NGA AlertManager:
[screenshot: GrafanaNGA]

What is the difference between the two alert managers?

  • Is it possible the NGA Alert Manager is the new “Next Generation AlertManager”?
  • If so, what is the difference in creating Contact Points (and message templates) in one vs. the other?
  • Does the fact that the alerts are built on the Azure DataSource have an impact on which AlertManager to use when setting up contact points?
  • Are the two AlertManagers an artifact of using the beta NextGen Alert Manager, meaning there should only be one?

Hi @ehuggins,
What you see in that dropdown box is the list of Alertmanagers connected to your Grafana:

  1. “Grafana” is the embedded Alertmanager. It handles alerts created by Grafana alerting, i.e. rules that you create with the type “Grafana managed alert”. Those alert rules are evaluated by Grafana itself.


  2. grafanacloud-<tenant>-ngalertmanager is a Prometheus (actually Cortex) Alertmanager that we provide as part of the stack. It is not part of Grafana and is connected as a data source. Unified alerting, which became available recently, brings those systems together and lets users manage both alerting systems from the same UI. You can add and edit Cortex/Loki alert rules there.

Is it possible the NGA Alert Manager is the new “Next Generation AlertManager”?
It is the Alertmanager of the Cortex cluster you have set up as a data source.

If so, what is the difference in creating Contact Points (and message templates) in one vs. the other?

When you configure contact points (as well as notification policies) for the Alertmanager selected in the dropdown, only that Alertmanager is configured. The configurations are completely separate.

Does the fact that the alerts are built on the Azure DataSource have an impact on which AlertManager to use when setting up contact points?

It depends on which rule engine your alert rules are set up in. In your case, I guess, the rules are evaluated by the Grafana rule engine. Therefore, you need to configure contact points and notification policies for the “Grafana” Alertmanager. Later we will add a feature that lets the Grafana rule engine use an external Alertmanager for notifications, which will let you use grafanacloud-<tenant>-ngalertmanager too.
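A minimal Python sketch of that rule of thumb, as it stood at the time of this answer (before Grafana rules could send to an external Alertmanager). RuleType and alertmanager_for are hypothetical names used only for illustration; they are not part of any Grafana API. The only thing encoded here is that the rule engine evaluating a rule decides which Alertmanager's contact points apply.

```python
from enum import Enum


class RuleType(Enum):
    # Rule created with the type "Grafana managed alert": evaluated by Grafana itself
    GRAFANA_MANAGED = "grafana"
    # Cortex/Loki alert rule: evaluated by the Cortex/Loki rule engine in the stack
    CORTEX_OR_LOKI = "cortex_loki"


def alertmanager_for(rule_type: RuleType, tenant: str) -> str:
    """Name of the Alertmanager (as shown in the dropdown) whose contact points
    and notification policies apply to alerts from this kind of rule."""
    if rule_type is RuleType.GRAFANA_MANAGED:
        return "Grafana"
    return f"grafanacloud-{tenant}-ngalertmanager"
```

Because the two configurations are completely separate, a contact point added under one Alertmanager does nothing for alerts routed to the other.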

Are the two AlertManagers an artifact of using the beta NextGen Alert Manager and there should only be one?

No. Those are two different Alertmanager instances that you can use at your own discretion. The Grafana rule engine is much more powerful: it can query different data sources.


@yuriy.tseretyan Perfect. Thank you for the explanation; that answers my question and clarifies that enabling Grafana On Call (GOC?) did not create a new Alert Manager.

One follow-up question: Is the grafanacloud-<tenant>-ngalertmanager (Cortex Alert Manager) redundant now that Unified alerting is GA? If not, why not? Will the situation change with Grafana 8.3?

Is the grafanacloud-<tenant>-ngalertmanager (Cortex Alert Manager) redundant now that Unified alerting is GA?

The motivation behind Unified Alerting is to bring all data sources that provide alerting functionality (currently, only Prometheus and Loki are supported) under one unified user interface, not to replace them. We recommend using the Cortex rule engine and Alertmanager for alerts that use Cortex and Loki as data sources.
In this particular case, one great benefit of the Cortex Alertmanager (as well as the Cortex rule engine) is that it runs in HA mode, whereas the embedded one is just a single instance. Also, the embedded and Cortex Alertmanagers support different subsets of contact points.
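Embedded:
[screenshot: contact point types supported by the embedded Alertmanager]

Cortex:
[screenshot: contact point types supported by the Cortex Alertmanager]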

If not, why not? Will the situation change with Grafana 8.3?

As far as I know, we currently have no plans to remove it, at least in the short to mid term. Thus, I am pretty sure it won’t be removed in 8.3.


@yuriy.tseretyan

I’m not sure I understand:

“The motivation behind Unified Alerting is to bring all data sources that provide alerting functionality (currently, only Prometheus and Loki are supported)”

I am using Unified Alerting exclusively with the Azure DataSource, which is working well.

I was told last week that Unified Alerting is GA and the Azure Data Source is now supported – in Grafana Online.

As we are currently using Grafana Online, it is not clear whether the distinction between HA and single instance is our concern, but thank you for that information.

I am working on a project to grab logs we keep in Azure CosmosDB as JSON documents and push specific information from these logs to Loki for alerting.
I will be sure to test both Alertmanagers when setting up alerts.

Thank you again,

You use Grafana-managed alerting, which is part of Unified Alerting (so-called ngalert). So, in this case, you can ignore grafanacloud-<tenant>-ngalertmanager and configure contact points for “Grafana”.
However, if you push some logs to Loki, you will be able to choose which Alertmanager and rule engine to use, and therefore pick whichever best suits your needs. In your case, you will probably want to keep using the Grafana rule engine so that you won’t have to maintain two Alertmanager configurations.


@yuriy.tseretyan

Makes sense, I will keep that in mind.

Quick question while I have you.

I am having trouble understanding the concept of “log lines”, particularly when setting up Grafana Alert Manager while keeping cardinality low.

My case is a bit different from most, as I’m parsing JSON data that can be extremely large (larger than 64K in most cases), so I’ll be selecting specific information from each JSON log to push to Loki.

If you have any good worked-example references for setting up logging of any type (flat file would be fine) from the source, through Grafana Agent, to querying the data, that would be terrific.

There are not many Azure-based Grafana examples out in the wild, but I believe I have a good pattern for our CosmosDB-stored application logs*:

  • Azure Function triggered by the Azure CosmosDB Change Processor, parsing and pushing selected “log lines” to Azure Event Hubs
  • Azure Function triggered by Event Hub to push log lines to Grafana Agent
  • Grafana Agent to Loki Online

What I’m not quite clear about is what can or should constitute a “log line”.

I’ve successfully tested pushing some older flat-file logs to Loki, but am not sure where a “log line” stops and the full file content starts.
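As I understand Loki's data model, each entry is a timestamp plus one log line (a plain string), and the set of labels identifies the stream; cardinality is the number of distinct label combinations. A minimal Python sketch under that assumption: take one large JSON document, keep a few low-variance fields as labels, and put the high-variance fields you actually need into one compact JSON log line. The field names (app, level, operationId, userId, ...) and the helper to_loki_entry are hypothetical placeholders for your own CosmosDB schema.

```python
import json
from typing import Any

# Hypothetical field names: adjust to the actual CosmosDB log document schema.
LABEL_FIELDS = ("app", "level")  # few distinct values, so safe to use as labels
LINE_FIELDS = ("timestamp", "message", "operationId", "userId")  # high-variance: keep in the line


def to_loki_entry(doc: dict[str, Any]) -> tuple[dict[str, str], str]:
    """Split one (possibly very large) JSON log document into Loki labels
    and a compact log line containing only the selected fields."""
    labels = {k: str(doc[k]) for k in LABEL_FIELDS if k in doc}
    # Everything not listed in LINE_FIELDS (e.g. a huge payload) is dropped
    selected = {k: doc[k] for k in LINE_FIELDS if k in doc}
    line = json.dumps(selected, separators=(",", ":"))
    return labels, line
```

Fields kept inside the line can still be extracted at query time (for example with LogQL's json parser), so they do not need to become labels.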

*While I could just push the Function’s output directly to Grafana Agent, Event Hubs is more scalable:

  • Event Hubs is designed for high-volume log input and output
  • Event Hubs outputs events in the order they were created (time ordering is, or was, a requirement for Loki, although it soon will no longer be)
  • Azure Log Analytics and Application Insights can use Event Hubs as a sink, which makes the Event Hub-triggered Azure Function a repeatable pattern for pushing logs and metrics from almost every Azure resource to Prometheus and Loki
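A minimal Python sketch of the last hop: grouping already-selected log lines into the JSON body that Loki's push endpoint (POST /loki/api/v1/push) accepts, with entries sorted by timestamp inside each stream because older Loki versions rejected out-of-order entries. build_push_body is a hypothetical helper. Whether Grafana Agent in this setup accepts the same payload (for example through a loki_push_api receiver) is an assumption to check against its documentation. The code only builds the body; sending it (an HTTP POST with Content-Type: application/json plus your Grafana Cloud credentials) is left out.

```python
import json
from collections import defaultdict
from typing import Iterable


def build_push_body(entries: Iterable[tuple[dict[str, str], int, str]]) -> str:
    """Group (labels, unix_ns, line) entries into Loki push-API streams.

    Each unique label set becomes one stream; entries inside a stream are
    sorted by timestamp, since older Loki versions reject out-of-order
    entries within a stream.
    """
    streams: dict[tuple[tuple[str, str], ...], list[tuple[int, str]]] = defaultdict(list)
    for labels, ts_ns, line in entries:
        # Hashable, key-order-independent identity for the label set
        key = tuple(sorted(labels.items()))
        streams[key].append((ts_ns, line))
    body = {
        "streams": [
            {
                "stream": dict(key),
                # Loki expects the timestamp as a string of Unix epoch nanoseconds
                "values": [[str(ts), line] for ts, line in sorted(values)],
            }
            for key, values in streams.items()
        ]
    }
    return json.dumps(body)
```

Keeping the label sets small (as in the "log line" discussion above) keeps the number of streams per push low, which is what keeps cardinality down.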

I expect to be configuring the Grafana Agent by the end of the week.

Any detailed configuration examples you are aware of and can share would be appreciated.

Thank you again,