Grafana Agent won't scrape service

Grafana Agent Operator version 0.28.0, deployed via Helm.

I’m having a heck of a time getting the Grafana Agent to scrape a ServiceMonitor. Any ideas what I may be doing wrong here? I can proxy the endpoint and browse to the /actuator/prometheus URL and see the metrics just fine, so the endpoint itself is healthy. The MetricsInstance, ServiceMonitor, and Service manifests are below. I’m sure I’m just missing something here. Thanks!

apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  creationTimestamp: "2022-12-07T17:48:14Z"
  generation: 2
  labels:
    agent: grafana-agent
  name: o11y-prod
  namespace: monitoring
  resourceVersion: "562169634"
  uid: 3be38243-a226-4ce4-b984-c518464e16d4
spec:
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      instance: o11y
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      instance: o11y
  remoteWrite:
  - url: http://mimir.globalspec.cloud/api/v1/push
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      instance: o11y
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: atlas-product-store
  namespace: atlas-product-store-dev
  labels:
    instance: o11y
spec:
  endpoints:
  - interval: 1m
    path: /actuator/prometheus
    port: http
    scheme: http
    scrapeTimeout: 30s
  namespaceSelector: {}
  selector:
    matchLabels:
      app.kubernetes.io/instance: atlas-product-store
      app.kubernetes.io/name: atlas-product-store
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: atlas-product-store
    meta.helm.sh/release-namespace: atlas-product-store-dev
  creationTimestamp: "2022-05-18T18:38:40Z"
  labels:
    app.kubernetes.io/instance: atlas-product-store
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: atlas-product-store
    helm.sh/chart: atlas-product-store-0.1.0
  name: atlas-product-store
  namespace: atlas-product-store-dev
  resourceVersion: "471876413"
  uid: da2ea333-036b-4eaa-b926-378a9292182b
spec:
  clusterIP: 10.43.162.41
  clusterIPs:
  - 10.43.162.41
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  - name: jvm-debug
    port: 5005
    protocol: TCP
    targetPort: jvm-debug
  selector:
    app.kubernetes.io/instance: atlas-product-store
    app.kubernetes.io/name: atlas-product-store
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
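
For context, the operator attaches a MetricsInstance to an agent through the GrafanaAgent resource's instanceSelector. A trimmed sketch of what that resource needs to look like to pick up the agent: grafana-agent label on the MetricsInstance above (the image tag and service account name here are placeholders, not copied from my cluster):

apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: monitoring
spec:
  image: grafana/agent:v0.28.0        # assumed tag matching the operator version
  serviceAccountName: grafana-agent   # assumed; whatever the Helm chart created
  metrics:
    instanceSelector:
      matchLabels:
        agent: grafana-agent          # must match the label on the MetricsInstance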

Hi @mdiorio,

Did you have a look at the official documentation?

On the left side of the page you will also see the list of subpages, which explain how to use it with Helm charts (screenshot attached for reference).

[screenshot of the documentation sidebar listing the Helm chart subpages]

I hope this helps. 🙂

I’m well aware of the documentation and have been through it multiple times. The Grafana documentation needs to be gone through; it appears things change too fast and the docs don’t get properly updated. I had to go directly to the CRDs and the Helm chart itself to find the correct values.

I have it deployed and it’s working fine except for scraping new resources. It’s doing great with KSM and node exporter.

Thanks for the feedback, @mdiorio.

Could you please tell us which part of the documentation is outdated so that we can ask our team to update it?

We would really appreciate that information.

My concern right now is getting this monitoring working. Do you have any insight into why it isn’t?

I can see a few metrics related to this job:
scrape_duration_seconds
scrape_samples_post_metric_relabeling
scrape_samples_scraped
scrape_series_added
up

up shows the two pods as targets, but both report 0, so the agent is discovering the targets from the ServiceMonitor and the scrapes themselves are failing.

I got it: it was Rancher’s project-level network isolation blocking the traffic. I moved the namespace into the System project, which overrides the network isolation, and it all started working.
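
For anyone who hits the same thing and can’t move the namespace: Rancher’s project isolation is enforced with NetworkPolicies, and Kubernetes NetworkPolicies are additive allow rules, so another option should be to explicitly allow ingress from the namespace the agent runs in. A rough sketch, assuming the monitoring namespace carries the standard kubernetes.io/metadata.name label:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-grafana-agent-scrape
  namespace: atlas-product-store-dev
spec:
  # target the pods behind the atlas-product-store Service
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: atlas-product-store
      app.kubernetes.io/name: atlas-product-store
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring   # namespace where the agent runs
    ports:
    - protocol: TCP
      port: http   # the container port name the Service's "http" port targets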
