Demystifying MySQL Group Replication Monitoring

One of the more difficult services and clusters to monitor is the MySQL Group Replication cluster or InnoDB cluster. For a lot of us that have moved into the group replication realm, we seem to be lacking tools that can help us see what’s going on and to get alerts that are reliable

Here’s where my faithful trio comes in to save the day! BASH, MySQL and Grafana. Now I know there are other options for this monitoring, but this worked out the best for me. Plus I can poll and alert off of the data I want

The big key here was writing a BASH script that not only used hostnames from a list to connect via SSH, but to also use the hostnames to find out Node Status and Node Role. These two items will be our focus for today:

image

The key to getting Node Status working correctly was using if, elif and else. MySQL Group Replication has five node statuses that can be displayed - ONLINE, RECOVERING, ERROR, OFFLINE and UNREACHABLE. To do this I did the following

In this, I use the $TARGET not only for the host to connect via ssh, but to also use later in my query to find out the Node Status. At this point I start my if, elif else section for each Node Status. I make these numerical values to work within Grafana correctly for alerting and data display in dashboards. In the panel on the dashboard we will be reverting back to text. For node role it was a simple if else for Primary and Secondary. These variables, as well as other variables I am polling for, I insert into a separate database

As we can see, NodeStatus is inserted. All of these are 1 because they are all ONLINE. For our Role, 1 indicates Primary, 0 Secondary. Now on to Grafana.

I’m going to take for granted that most of us know how to create a data source for Grafana. What might not be known to well is how to create dashboard variables. So in our dashboard setting we go to Variables and Create Variable. We are going to do a query. In this case we want to select by Hostname for our dashboard

For our query, we pick our data source and write the query to get the Hostname. This now allows us to use variables within our panel query

It’s now a matter of writing a query for our data source and incorporating the dashboard variable, here seen as ‘$host’ to choose the correct host for the dashboard

Remember when I said we will be mapping values back to text? It’s that time. So for each value that I have for each NodeStatus value, I make a condition that displays the appropriate Node Status state and color coded. So why have values and convert back to text? Why not just have a string displayed?

Alerting! There has been so many times something has happened with the cluster and no one was aware. So with Grafana we now use our data source for writing dynamic alerts

As we can see, we use a similar query to get the data we want. We then write expressions for what we want for alerting to be triggered on

In this case I have my threshold as a range due to how I did my if, elif and else loop. I probably could’ve wrote it more organized, but it works.

I can now write my notifications with the dynamic information shown to give an accurate alert for the database team.

I’m sure there are lots of ways to write your dashboards and alerts utilizing data sources that you are familiar with. BASH/MySQL works for me in my case. And if you know any tips to improve what I’ve done, feel free to write

EDIT

After thinking about the different node statuses available I have decided to write an alert for each status:

Each alert is checking a math expression equal to the value I’m alerting for

Now I can get an accurate alert of Node Status

3 Likes