Unified Alerting - Alert Rule provisioning lifecycle examples needed

What Grafana?

Grafana 9.1.0 OSS, running AWS Linux (think RPM-based Linux).

What am I trying to do?

Provision Grafana Alert Rules and other alerting artifacts through the provisioning tools or APIs that I can script.

Why am I trying to do it?

We deploy Grafana to many different servers. To accomplish that, we deploy via “infrastructure-as-code,” so manually deploying or tweaking a configuration is simply not an option.

How am I trying to do it?

Here is the basic lifecycle I want to implement to deploy Grafana:

a.) define an alert rule in the UI - this increases the likelihood that the definition will be valid. I don’t want to have to write Alert Rules in raw JSON or YAML.
b.) export the alert rule - currently, the /api/v1/provisioning/alert-rules/ endpoint allows me to export the alert rule. The result is a JSON file representing a single alert rule that I can save as a JSON file and manage separately.
c.) provision Grafana in a template instance (an AMI in AWS-speak) - copy the JSON file into a provisioning/alerts dir. (I happen to use Ansible for this.)
d.) deploy the instance - Incidentally, I use terraform to deploy the AMI into AWS.
e.) start Grafana at launch time - when the instance starts, it issues systemctl start grafana-server.service.
f.) bonus: modify/tweak the deployed rule, then loop back to a.) Hey, why not? Grafana dashboards can easily be exported without saving to the database, so why not something similar related to alerting artifacts? (hint: I have had some success by manipulating the provenance table.)
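
For anyone following along, step b.) can be scripted roughly like this. It is only a sketch: GRAFANA_URL, API_KEY and the helper names are placeholders I made up, and it assumes the API key is accepted as a Bearer token on the /api/v1/provisioning/alert-rules/ endpoint mentioned above.

```python
import json
import urllib.request
from pathlib import Path

GRAFANA_URL = "http://localhost:3000"  # assumption: adjust to your instance
API_KEY = "REPLACE_ME"                 # assumption: an API key allowed to read alert rules


def build_export_request(rule_uid):
    """Build the GET request for one alert rule from the provisioning API."""
    url = f"{GRAFANA_URL}/api/v1/provisioning/alert-rules/{rule_uid}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {API_KEY}"})


def save_rule(rule, out_dir):
    """Write the rule as stable, diff-friendly JSON named after its UID."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{rule['uid']}.json"
    # sort_keys keeps the file stable between exports, which helps in version control
    path.write_text(json.dumps(rule, indent=2, sort_keys=True) + "\n")
    return path


def export_rule(rule_uid, out_dir):
    """Fetch one rule by UID and save it under out_dir (needs a running Grafana)."""
    with urllib.request.urlopen(build_export_request(rule_uid)) as resp:
        return save_rule(json.load(resp), out_dir)
```

Calling export_rule with a rule UID and an output directory would leave one JSON file per rule, which is the artifact the later steps would pick up.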

What documentation have I used:

The “Provision Grafana” page in the Grafana documentation has the following encouraging comment:

You can manage alert objects in Grafana by adding one or more YAML or JSON configuration files in the provisioning/alerting directory.

But the example disappoints on two points:

  • The example is in YAML, whereas the APIs produce JSON.
  • The example shows the deployment of Alert Rule Group, not an individual Alert Rule. The heading clearly says Rule, not Rule Group. Deploying several Alert Rules as a Group would complicate the administration of the rules.

When I take the JSON generated from b. and provision it, …well, crickets. I guess that’s Grafana-speak for “you’re doing something very wrong.” Hrumpf. Probably that Rule vs. Rule Group issue.
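
If the Rule vs. Rule Group mismatch really is the cause, one thing to try is wrapping the exported rule in a one-rule group before dropping it into the provisioning directory. The sketch below is untested against a real instance and rests on assumptions: the group layout (apiVersion, groups, orgId, name, folder, interval, rules) is copied from the YAML example in the docs, the orgID and ruleGroup keys are what I expect the API export to contain, and wrap_rule_in_group is a hypothetical helper name.

```python
import json

# Assumption: the keys a rule carries in the file-provisioning format, taken from
# the YAML example in the docs. Other keys in the API export are dropped.
RULE_KEYS = ("uid", "title", "condition", "data", "noDataState",
             "execErrState", "for", "annotations", "labels")


def wrap_rule_in_group(rule, folder, interval="60s"):
    """Wrap one exported rule in a single-rule group provisioning document."""
    file_rule = {key: rule[key] for key in RULE_KEYS if key in rule}
    group = {
        "orgId": rule.get("orgID", 1),                # assumption: export uses "orgID"
        "name": rule.get("ruleGroup", rule["title"]),  # fall back to the rule title
        "folder": folder,                              # folder as named in the docs example
        "interval": interval,
        "rules": [file_rule],
    }
    return {"apiVersion": 1, "groups": [group]}


if __name__ == "__main__":
    exported = {"uid": "u1", "title": "High CPU", "condition": "A", "data": []}
    print(json.dumps(wrap_rule_in_group(exported, "Ops"), indent=2))
```

One caveat: the "for" value is passed through untouched, so if the API and the file format encode durations differently, that field would need converting by hand.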

Questions:

1.) Does anyone who is provisioning Grafana 9 have input?
2.) Can someone give me an example of a valid JSON Alert Rule that will deploy using the provisioning infrastructure?
3.) Can someone explain how to export a valid YAML of an Alert Rule (not Alert Rule Group) that will deploy using the provisioning folders?

I actually opened a GitHub Issue about this. The API is not consistent, nor is it completely documented. Basic features, like “List all the rules,” are not provided.

My scripts were written to run against Grafana 8.x, before the provisioning API existed. I had to scrape/fuzz/google at level 11 to find out how to integrate with the API. Here are some of my notes:

  • Grafana OpenAPI
  • Under there, you’ll find an entry point for the Prometheus AlertManager API at /api/alertmanager/grafana/api/ I think… I’m still exploring this one
  • There’s a “ruler” API, which is not public, that does provide “List all the rules.” AFAICT, this is the only way to get that list.

Here’s what I did:

  1. Create a rule in the UI
  2. Get all the folders: GET /api/folders
  3. Use GET /api/ruler/grafana/api/v1/rules/{{ Folder }} to retrieve the rule JSON
  4. Make that JSON my template for new rules, modify as necessary, and POST the rule JSON to /api/ruler/grafana/api/v1/rules/{{ Folder }}.
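
Roughly, the listing part of those steps looks like the sketch below. GRAFANA_URL, API_KEY and the function names are placeholders, and since the ruler API is undocumented, the response shape assumed here (folder name, then a list of groups, then rules with a grafana_alert block) is only what I have seen, not a guarantee.

```python
import json
import urllib.parse
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # assumption: adjust to your instance
API_KEY = "REPLACE_ME"                 # assumption: sent as a Bearer token


def ruler_request(folder):
    """Build the GET request that lists the rule groups of one folder."""
    quoted = urllib.parse.quote(folder, safe="")  # folder names may contain spaces
    url = f"{GRAFANA_URL}/api/ruler/grafana/api/v1/rules/{quoted}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {API_KEY}"})


def rule_uids(ruler_response):
    """Return (folder, title, uid) for every rule in a ruler API response.

    Assumed shape: {folder: [{"name": ..., "rules": [{"grafana_alert": {...}}]}]}
    """
    found = []
    for folder, groups in ruler_response.items():
        for group in groups:
            for rule in group.get("rules", []):
                alert = rule.get("grafana_alert", {})
                found.append((folder, alert.get("title"), alert.get("uid")))
    return found


def list_rules(folder):
    """Fetch and flatten the rules of one folder (needs a running Grafana)."""
    with urllib.request.urlopen(ruler_request(folder)) as resp:
        return rule_uids(json.load(resp))
```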

That worked perfectly with v8, but with v9, editing rules created that way locks up my browser tab so badly that the only fix is to close it. My plan is to retry with the provisioning API:

  1. Create a rule to use as a template/reference (probably one per datasource)
  2. Note the UIDs (in the URL bar) or use the GET /api/ruler/grafana/api/v1/rules/{{ Folder }} to list the rules/UIDs.
  3. Use GET /api/v1/provisioning/alert-rules/{UID} to fetch the JSON of the rule body
  4. Modify that JSON into a template for my scripts to use
  5. Use POST /api/v1/provisioning/alert-rules in the Provisioning API to create new rules.
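
Here is roughly what steps 4 and 5 look like as a script. It is a sketch and has not been run against v9 yet: GRAFANA_URL, API_KEY and the function names are placeholders, and the list of fields to strip (id, uid, updated, provenance) is my guess at what the server assigns. In particular, I am assuming a rule POSTed without a uid gets a fresh one.

```python
import copy
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # assumption: adjust to your instance
API_KEY = "REPLACE_ME"                 # assumption: sent as a Bearer token

# Assumption: fields the server assigns, so they should not be copied to new rules.
SERVER_FIELDS = ("id", "uid", "updated", "provenance")


def rule_from_template(template, title, extra_labels=None):
    """Copy an exported rule, drop server-assigned fields and set a new title."""
    rule = copy.deepcopy(template)  # leave the template itself untouched
    for field in SERVER_FIELDS:
        rule.pop(field, None)
    rule["title"] = title
    rule["labels"] = {**(rule.get("labels") or {}), **(extra_labels or {})}
    return rule


def build_create_request(rule):
    """Build the POST request that creates a rule through the provisioning API."""
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/v1/provisioning/alert-rules",
        data=json.dumps(rule).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )


def create_rule(rule):
    """Send the rule to Grafana and return the parsed response (needs a running Grafana)."""
    with urllib.request.urlopen(build_create_request(rule)) as resp:
        return json.load(resp)
```

The template argument is the JSON fetched in step 3, so one exported rule per datasource can seed any number of new rules.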

I’m hoping this process doesn’t cause the editing of the rules to lock the browser.

FWIW, you can view the API by checking the generated_base_* files in the pkg/services/ngalert/api/ directory.

Thanks for the feedback, @blhotsky. Good to know that I’m not the only one struggling with the current docs.

I’m beginning to think that provisioning-from-code is a use case that has not been fully thought through yet. At the very least, it seems like a patchwork of features that is not well explained. For example, the inability to easily edit or export provisioned Alert Rules is particularly problematic for me. (I’ve had to resort to hacking the database to get that done.) Other features are missing for no apparent reason; see “API Documentation Inconsistent and API features missing in 9.1.1” (Issue #54418, grafana/grafana on GitHub).

That browser lock up - oooof. I’ve not run into that…yet.