Fixing Prometheus namespace monitoring

Published: December 17, 2025 at 03:20 PM EST
2 min read
Source: Dev.to

The Setup

I’m running a Kubernetes cluster with Prometheus Operator and a pretty standard discovery pattern:

  • ServiceMonitor objects
  • Namespace‑based scrape selection
  • Grafana dashboards and Alertmanager rules downstream

The key piece is that Prometheus only scrapes namespaces with a specific label.

Here’s a simplified version of the ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-servicemonitor
spec:
  selector:
    matchLabels:
      app: payments-api
  namespaceSelector:
    matchLabels:
      monitoring: enabled
  endpoints:
    - port: metrics
      interval: 30s

At this point, namespace labels are effectively part of the monitoring config.
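
A quick way to see which namespaces currently match that selector is a plain label query (a sketch using kubectl):

# Namespaces the namespaceSelector above will match
kubectl get namespaces -l monitoring=enabled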

Breaking It

For the exercise, I flipped the namespace label from monitoring=enabled to monitoring=disabled (command below).

  • Nothing crashed.
  • Pods kept running.
  • Metrics quietly disappeared.

Exactly the kind of failure that’s easy to miss.
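
For reference, the flip itself is a single command (a sketch; --overwrite replaces the existing value of the label):

kubectl label namespace payments-prod monitoring=disabled --overwrite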

Detecting What’s Wrong

First thing I checked was whether Prometheus itself was unhealthy:

kubectl get pods -n monitoring
kubectl logs prometheus-k8s-0 -n monitoring

Everything looked fine.
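
A couple of other quick sanity checks on the operator side are worth doing at this stage (assuming the standard kube-prometheus naming; adjust names to your cluster):

# Is the ServiceMonitor actually present?
kubectl get servicemonitors -A | grep app-servicemonitor

# Is the Prometheus custom resource there and healthy?
kubectl get prometheus -n monitoring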

Next, I checked whether Prometheus was scraping anything from the namespace:

up{namespace="payments-prod"}

No results. That tells me Prometheus isn’t scraping targets — not that the app is down.
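
The same query can also be run from the terminal through the Prometheus HTTP API, which is handy when you're already in a shell (a sketch; prometheus-k8s is the default kube-prometheus service name and may differ in your setup):

# Forward the Prometheus web port locally
kubectl -n monitoring port-forward svc/prometheus-k8s 9090 &

# Ask the query API directly; an empty result list means no targets are being scraped
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=up{namespace="payments-prod"}'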

Finding the Cause

Next step was checking the namespace itself:

kubectl get namespace payments-prod --show-labels

The relevant part of the output:

monitoring=disabled

Since the ServiceMonitor relies on:

namespaceSelector:
  matchLabels:
    monitoring: enabled

Prometheus was doing exactly what it was configured to do. From Prometheus’ perspective, nothing was broken.
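
To compare the label value across every namespace at a glance, the -L flag prints it as a column (a small sketch):

# Adds a MONITORING column so enabled/disabled/missing values stand out
kubectl get namespaces -L monitoring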

Fixing It

Restoring the label was enough:

kubectl label namespace payments-prod monitoring=enabled --overwrite

Metrics came back within a scrape interval.

Quick check to confirm freshness:

time() - timestamp(up{namespace="payments-prod"})

Everything was visible again.
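
If the port-forward from the earlier sketch is still running, you can also watch the targets reappear via the targets API (grep is just a crude filter here):

curl -s 'http://localhost:9090/api/v1/targets?state=active' | grep payments-prod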

Why This Is a Sneaky Failure

The interesting part of this exercise:

  • No alerts fired.
  • Dashboards were empty, not red.
  • Prometheus treats missing data as “nothing to evaluate”.

If something real had broken during this window, I wouldn’t have known. This is how you end up trusting “green” systems that you’re actually blind to.

Hardening Things Afterward

Alerting on Missing Metrics

I added an explicit alert to catch telemetry loss:

- alert: NamespaceMetricsMissing
  expr: absent(up{namespace="payments-prod"})
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "No metrics from payments-prod"
    description: "Prometheus is scraping zero targets in this namespace."
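
If you keep rules as plain Prometheus rule files (rather than only as PrometheusRule objects), promtool can validate them before they ever reach the server. A minimal sketch, with the rule above wrapped in a group:

cat > namespace-metrics-missing.rules.yml <<'EOF'
groups:
  - name: namespace-telemetry
    rules:
      - alert: NamespaceMetricsMissing
        expr: absent(up{namespace="payments-prod"})
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "No metrics from payments-prod"
          description: "Prometheus is scraping zero targets in this namespace."
EOF

promtool check rules namespace-metrics-missing.rules.yml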