Best practice for monitoring a SaaS estate

hobbesuk · February 18, 2020, 4:24pm

I’m trying to create a single pane of glass that gives us an overview of a SaaS setup our firm is offering to the market. It runs in Azure across multiple resource groups, generally one resource group per tenant of our service.

I’m looking for advice on how best to do this. Please point me at any case studies or documentation on this sort of topic if they exist (I wasn’t able to find any by searching, hence the post).

What I’m looking to do is create a dashboard that gives us an overall idea of the health of the key metrics across our estate. I want it to be easy to extend as we bring on new clients, so ideally I would not have to add a new series for each tenant as they come online (it looks like repeating by resource group isn’t supported for Azure, so I’m hoping the Grafana API can help here).

I would like to create different panels for different metrics, each showing only the hottest series across all the resources being monitored. For example, across all our PaaS DBs, show the top 5 systems by CPU and hide all the others. This gives us a view of the systems under the most load, the theory being that if resource usage is lower, those systems are just running as expected and don’t need to be highlighted.

Is this something that is possible?

Thanks in advance!

hobbesuk · February 19, 2020, 2:00pm

So after some more investigation I think I might have hit on a solution. Sharing it here in case it helps others; if anyone wants to suggest why this isn’t a good idea, feel free!

The breakthrough was realising that Azure Monitor isn’t a good source of data if your data is spread across multiple resource groups, as it can’t easily be queried across them.

I am now feeding my different SQL and App Service instances into a new central Log Analytics workspace, and this is what I have configured Grafana to query.

Then, by writing Kusto queries like the following, I am able to list, for instance, the length of my HTTP request queue, but only for the top five worst offenders:

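// Find the five resource groups with the heaviest request queues in the selected time range
let Top_5 = AzureMetrics
| where $__timeFilter(TimeGenerated)
| where ResourceProvider == "MICROSOFT.WEB"
| where MetricName == "RequestsInApplicationQueue"
| top-hitters 5 of ResourceGroup by Maximum
| project ResourceGroup; // keep only the name column for the in() filter below
// Plot the queue length over time for those five resource groups only
AzureMetrics
| where $__timeFilter(TimeGenerated)
| where ResourceProvider == "MICROSOFT.WEB"
| where MetricName == "RequestsInApplicationQueue"
| where ResourceGroup in (Top_5)
| summarize by ResourceGroup, Maximum, TimeGenerated
| order by TimeGenerated asc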

So I can now get a visualization showing only the most pressing problems. As and when new resource groups are configured to feed my central Log Analytics workspace, they will get picked up by this query, which means zero overhead on the monitoring, which is a requirement.
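
The same pattern should carry over to the "top 5 CPU" panel for the PaaS databases from my first post. This is only a sketch under some assumptions: that the SQL diagnostics land in the same AzureMetrics table with ResourceProvider "MICROSOFT.SQL" and MetricName "cpu_percent" (check what your own workspace actually contains), and that a 5 minute bin suits the panel. It also swaps top-hitters, which returns an approximation ranked by the sum of the values, for an explicit summarize plus top, so the ranking is by peak value per database rather than per resource group:

```kusto
// Assumed names: "MICROSOFT.SQL" and "cpu_percent" are illustrative; verify them in your workspace.
// Rank databases by their peak CPU in the selected time range and keep the top five.
let Top_5 = AzureMetrics
| where $__timeFilter(TimeGenerated)
| where ResourceProvider == "MICROSOFT.SQL"
| where MetricName == "cpu_percent"
| summarize PeakCpu = max(Maximum) by Resource
| top 5 by PeakCpu desc
| project Resource;
// Plot CPU over time for those five databases only.
AzureMetrics
| where $__timeFilter(TimeGenerated)
| where ResourceProvider == "MICROSOFT.SQL"
| where MetricName == "cpu_percent"
| where Resource in (Top_5)
| summarize CpuPercent = max(Maximum) by Resource, bin(TimeGenerated, 5m)
| order by TimeGenerated asc
```

Because nothing in the query names a tenant, a newly onboarded database should appear in the panel as soon as it starts sending metrics to the workspace and ranks in the top five.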
