Up query returns identical result for different instances

Hello!

I am monitoring the status (up or down) of my services using the default up metric. I understand the value of this metric is 1 if the last scrape was successful, else it is 0. My metrics are available at my-domain.com/service-name/prometheus, which is only down when the whole service is down.

I have the following queries:

up{instance="my-dev-domain.com", service_name=~".*service"}
up{instance="my-prod-domain.com", service_name=~".*service"}

The names of the services all end in service.

Problem

When I bring down any service in the dev environment, the second query also returns 0 for that service. This is an issue because it triggers alerting and a change in my Grafana visuals, even though the service was never down in prod to begin with (I would have noticed that). I don’t understand why Prometheus evaluates the second query to be 0, when the instance label is clearly different.

What I expected to happen

The second query should be independent from the first one and not return 0 for a given service when the first one does.

What I tried

I did a sanity check of the DNS records of the two domains and they are set up correctly. So Prometheus should be scraping different data for the two environments. I also tried adding a new label, environment=dev or environment=prod to each target, hoping that would create less ambiguity between queries (even though instance was already there), but it didn’t make a difference. It seems to me that Prometheus is mistaking the two jobs for each other somehow.

Configuration

I deploy Prometheus to my Kubernetes cluster using Helm. Here’s a snippet from my config. It uses a bit of relabeling magic to keep the config DRY; otherwise I would have to repeat the targets for each service:

  prometheus.yml:
    scrape_configs:
    - job_name: dev-services
      scheme: https
      scrape_interval: 15s
      basic_auth:
        username_file: /etc/prometheus/secrets/basicauth/username
        password_file: /etc/prometheus/secrets/basicauth/password
      static_configs:
      - targets:
        - name-of-my-first-service
        - name-of-my-second-service
      relabel_configs:
      - source_labels: [ __address__ ]
        target_label: service_name
      - source_labels: [ __address__ ]
        target_label: __metrics_path__
        replacement: /service/$1/prometheus
      - target_label: __address__
        replacement: my-dev-domain.com

    - job_name: prod-services
      scheme: https
      scrape_interval: 15s
      basic_auth:
        username_file: /etc/prometheus/secrets/basicauth/username
        password_file: /etc/prometheus/secrets/basicauth/password
      static_configs:
      - targets:
        - name-of-my-first-service
        - name-of-my-second-service
      relabel_configs:
      - source_labels: [ __address__ ]
        target_label: service_name
      - source_labels: [ __address__ ]
        target_label: __metrics_path__
        replacement: /service/$1/prometheus
      - target_label: __address__
        replacement: my-prod-domain.com
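
To make the intent of the relabeling explicit, here is a minimal Python sketch of the labels I expect each target to end up with. This is not Prometheus code: the `relabel`, `rules_for` and `final_labels` helpers are hypothetical names I made up for illustration. They only model the default `replace` action with its default regex `(.*)`, the default separator `;`, and the `instance` label falling back to `__address__` after relabeling.

```python
import re


def relabel(labels, rules):
    """Apply 'replace'-style relabel rules to a label dict (simplified model)."""
    labels = dict(labels)
    for rule in rules:
        # Join the source label values with the default separator ';'.
        source = ";".join(labels.get(n, "") for n in rule.get("source_labels", []))
        # The regex is fully anchored; the default captures the whole value.
        match = re.fullmatch(rule.get("regex", "(.*)"), source)
        if match is None:
            continue  # rule does not apply to this target
        # Expand $1-style references in the replacement (default is '$1').
        value = re.sub(
            r"\$(\d+)",
            lambda m: match.group(int(m.group(1))),
            rule.get("replacement", "$1"),
        )
        labels[rule["target_label"]] = value
    return labels


def rules_for(domain):
    """The three relabel rules from my config, parameterised by domain."""
    return [
        {"source_labels": ["__address__"], "target_label": "service_name"},
        {"source_labels": ["__address__"], "target_label": "__metrics_path__",
         "replacement": "/service/$1/prometheus"},
        {"target_label": "__address__", "replacement": domain},
    ]


def final_labels(job, target, domain):
    """Labels of one target after relabeling, with instance defaulted."""
    labels = relabel({"job": job, "__address__": target}, rules_for(domain))
    # If instance is not set by relabeling, it defaults to __address__.
    labels.setdefault("instance", labels["__address__"])
    return labels


for job, domain in [("dev-services", "my-dev-domain.com"),
                    ("prod-services", "my-prod-domain.com")]:
    for target in ["name-of-my-first-service", "name-of-my-second-service"]:
        print(final_labels(job, target, domain))
```

By this model, the dev and prod series for the same service differ in both `job` and `instance`, which is why I expected the two `up` queries to be independent of each other.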

I am running Prometheus 3.5.0. Thanks, and let me know if something’s unclear. I’m a bit in the dark here because I don’t understand what’s happening, so I hope I have provided enough details.