Monitoring for Azure Machine Learning Endpoint in Grafana, I don't see any Endpoints on the dropdown list.

Hi, I’m not sure if this is the right place, but I wanted to set up monitoring for Azure Machine Learning Endpoint in Grafana. Unfortunately, I don’t see any Endpoints on the dropdown list. I have Gitlab version 15.8.0 and Grafana version 9.4.7. Has anyone tried monitoring ML Endpoints through Grafana?

Try an unreleased version. There were some recent improvements that may fix your problem: Pull requests · grafana/grafana · GitHub

@jangaraj Thank you for your response. Unfortunately, the situation looks the same on the list, and no “endpoints” are visible.

grafana cli --version
grafana version 10.0.0-112993pre

Are you able to get ML metrics in Grafana when you specify all the details manually in the Advanced section and use the Average aggregation?

@jangaraj We tried this approach, but it doesn’t work either because it seems that Grafana is “rewriting” the Namespace, which causes it to refer to a non-existent resource.


I would check the browser console and inspect Grafana’s requests and responses; there may be more details about the error. Also compare Grafana’s requests with the Azure console requests, where it works. I’m not sure you have the right parameters.

@jangaraj

request visible in console is:
GRAFANA_URL/api/datasources/uid/UID/resources/azuremonitor/subscriptions/SUBSCRIPTION_ID/resourceGroups/MaciejML/providers/microsoft.machinelearningservices/workspaces/ENDPOINT_NAME/onlineendpoints//providers/microsoft.insights/metricdefinitions?api-version=2018-01-01

but it should be:
GRAFANA_URL/api/datasources/uid/UID/resources/azuremonitor/subscriptions/SUBSCRIPTION_ID/resourceGroups/MaciejML/providers/microsoft.machinelearningservices/workspaces/WORKSPACE_NAME/onlineendpoints/ENDPOINT_NAME/providers/microsoft.insights/metricdefinitions?api-version=2018-01-01

and I could not produce this URL with any of the “Namespace” and “Resource Name” values that I tried.

It seems to be connected to “azure_monitor/url_builder.ts”, but I didn’t dig into the code enough to track it down.

Querying the resource with curl (following Azure monitoring REST API walkthrough - Azure Monitor | Microsoft Learn)
works correctly:

curl --location --request GET 'https://management.azure.com/subscriptions/****/resourceGroups/MaciejML/providers/microsoft.machinelearningservices/workspaces/MaciejML/onlineendpoints/orders23/providers/microsoft.insights/metricDefinitions?api-version=2018-01-01' --header 'Authorization: Bearer ****'
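
For anyone who wants to compare the URL Grafana sends against the one that works, here is a minimal Python sketch that assembles the same metricDefinitions URL as the curl call above. The function name `metric_definitions_url` and its parameters are made up for illustration; only the URL layout and the `api-version` value come from this thread. It only builds the string and does not send a request.

```python
from urllib.parse import quote

ARM_BASE = "https://management.azure.com"


def metric_definitions_url(subscription_id, resource_group, workspace,
                           endpoint, api_version="2018-01-01"):
    """Build the metricDefinitions URL for an ML online endpoint.

    Hypothetical helper: mirrors the URL layout of the working curl call.
    """
    # Resource ID of the online endpoint, nested under its workspace.
    resource_id = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/microsoft.machinelearningservices"
        f"/workspaces/{quote(workspace, safe='')}"
        f"/onlineendpoints/{quote(endpoint, safe='')}"
    )
    # The metrics provider path is appended to the resource ID.
    return (
        f"{ARM_BASE}{resource_id}"
        f"/providers/microsoft.insights/metricDefinitions"
        f"?api-version={api_version}"
    )


# Placeholder subscription ID; workspace and endpoint names from the thread.
url = metric_definitions_url("SUBSCRIPTION_ID", "MaciejML", "MaciejML", "orders23")
print(url)
```

Comparing this string with the request shown in the browser console makes the misplaced segment (endpoint name in the workspace slot, empty endpoint slot) easy to spot.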

OK, then it is probably a good time to open a Grafana GitHub issue. Check first whether someone else has already opened one: Issues · grafana/grafana · GitHub
If you have paid Grafana support, you can contact them about this problem.

@jangaraj Thank you very much for your help, I have reported it to Grafana.

@melori.arellano Hi, how can I get an answer to this error? We couldn’t resolve it on the forum, and you closed the issue on GitHub. BR Maciej

@maciejglowacki are the metrics you want listed on “Azure Monitor supported metrics by resource type - Azure Monitor | Microsoft Learn”?

I didn’t see them there. The Grafana data source supports these three services and their supported metrics:

  • Azure Monitor Metrics: Collect numeric data from resources in your Azure account.
  • Azure Monitor Logs: Collect log and performance data from your Azure account, and query using the Kusto Query Language (KQL).
  • Azure Resource Graph: Query your Azure resources across subscriptions.

Yes, it should be Azure Monitor Metric - link to specific metric set: Azure Monitor supported metrics by resource type - Azure Monitor | Microsoft Learn

I would like to monitor an Endpoint in Azure Machine Learning. When I have a trained model, it is deployed as an Endpoint. Such an Endpoint has several metrics, such as ‘Request Latency P50’, ‘New Connections Per Minute’, etc. Without these metrics, I cannot see if my AI service is working correctly.

They should be there and look something like this in the query editor if those metrics are being scraped by your Azure Monitor datasource config. If something isn’t in the dropdown you should also be able to manually type it into the query editor if it exists.

To get Machine Learning Online Endpoints metrics, you have to use the Advanced resource selector and fill it in manually as follows:
Subscription: your subscription ID
Namespace: microsoft.machinelearningservices/workspaces//
Region: the region where your endpoint is located
Resource Group: the name of the group where your resource is placed
Resource Name: WORKSPACE_NAME/onlineendpoints/ENDPOINT_NAME

When these are filled in correctly, Grafana should build the correct URL to the resource.
It is important to use a double slash ‘//’ at the end of Namespace,
and to replace WORKSPACE_NAME and ENDPOINT_NAME with their corresponding names.
For the example given above, Resource Name would be: MaciejML/onlineendpoints/orders23

Is it only a workaround, or is it the expected and documented approach?
It is not intuitive that some resources must be selected manually and that they must use a specific namespace format (double slash).

It is only a workaround; I think the Grafana resource picker should be fixed.
I suspect the double slash works in this case because it is treated as an “empty” workspace name, while the correct workspace name is provided in Resource Name.

Properly, it should be:
Namespace: microsoft.machinelearningservices/workspaces/onlineendpoints
Workspace Name:
Resource Name:

But there is no field for Workspace Name, and providing it in Namespace results in a wrong URL to the resource.

The conclusion is that there should be an additional “Workspace Name” field,
or it should be possible to derive the workspace name correctly from Namespace.

In this workaround, the Machine Learning workspace is used as the Namespace, and the Resource Name field is used to “override” this Namespace with the correct one (which is the Machine Learning Online Endpoint).

Is there a GitHub issue for the Azure resource picker that can be watched?
Unfortunately, the Azure data source is not used by many people, so this kind of advanced feature may still not be implemented correctly there.

The previous workaround was not entirely correct and resulted in an error when selecting a metric:

request failed, status: 400 Bad Request, error: {"error":{"additionalInfo":[{"type":"string","info":"TraceId={****}"},{"type":"string","info":"ExceptionType=Microsoft.Online.Metrics.MetricsMP.Utilities.RPRequestFormatException"}],"code":"BadRequest","message":"Detect invalid value: microsoft.machinelearningservices/workspaces// for query parameter: 'metricnamespace', the value must be: microsoft.machinelearningservices/workspaces/onlineendpoints if the query parameter is provided, you can also skip this optional query parameter."}}

Correct and working settings are:
Namespace: microsoft.machinelearningservices/workspaces/onlineendpoints
Resource Name: WORKSPACE_NAME/ENDPOINT_NAME
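
To illustrate why these settings produce the right URL, here is a rough Python sketch of how a nested Azure resource ID pairs each type segment of the namespace with one name segment of the resource name. This is not Grafana's actual `url_builder.ts` logic, which was not inspected in this thread; the function `build_resource_uri` is a hypothetical helper that only demonstrates the pairing.

```python
def build_resource_uri(subscription_id, resource_group, namespace, resource_name):
    """Interleave namespace type segments with resource name segments.

    Hypothetical helper, not Grafana code. For example, the namespace
    'provider/typeA/typeB' with the name 'nameA/nameB' becomes
    '.../providers/provider/typeA/nameA/typeB/nameB'.
    """
    provider, *types = namespace.split("/")
    names = resource_name.split("/")

    # Empty segments (e.g. from a trailing '//') cannot form a valid ID.
    if not provider or any(not part for part in types + names):
        raise ValueError("namespace and resource name must not contain empty segments")
    # Every resource type needs exactly one matching name.
    if len(types) != len(names):
        raise ValueError("namespace has %d type(s) but resource name has %d part(s)"
                         % (len(types), len(names)))

    nested = "".join(f"/{t}/{n}" for t, n in zip(types, names))
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/{provider}{nested}"
    )


# Settings from the post above, with the example names from this thread.
uri = build_resource_uri(
    "SUBSCRIPTION_ID",
    "MaciejML",
    "microsoft.machinelearningservices/workspaces/onlineendpoints",
    "MaciejML/orders23",
)
print(uri)
```

With two type segments (workspaces, onlineendpoints) and two name segments (workspace, endpoint), the pairing yields the same resource path as the working curl request. The sketch deliberately rejects the earlier double-slash namespace, because its empty segments do not correspond to any resource type.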

There’s a discussion about it here that was ultimately closed but has some good context. I’ll reopen @maciejglowacki’s GitHub issue so the dev squad can weigh in on whether they expect this one to autopopulate, and whether to add documentation on how to manually add a resource that is not showing up.
