Fixing Prometheus namespace monitoring
Source: Dev.to

The Setup
I’m running a Kubernetes cluster with Prometheus Operator and a pretty standard discovery pattern:
- ServiceMonitor objects
- Namespace-based scrape selection
- Grafana dashboards and Alertmanager rules downstream
The key piece is that Prometheus only scrapes workloads in namespaces that carry a specific label. With the Prometheus Operator, that label gate lives on the Prometheus custom resource (its serviceMonitorNamespaceSelector decides which namespaces are searched for ServiceMonitor objects), while the ServiceMonitor itself only says which service and port to scrape.
Here's a simplified version of the ServiceMonitor, deployed alongside the app:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-servicemonitor
  namespace: payments-prod
spec:
  selector:
    matchLabels:
      app: payments-api
  endpoints:
    - port: metrics
      interval: 30s
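The namespace gate itself sits on the Prometheus resource. The snippet below is a sketch: the field names come from the Prometheus Operator API, but the metadata follows kube-prometheus defaults and may differ in your install.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  serviceMonitorSelector: {}          # empty selector: take every ServiceMonitor...
  serviceMonitorNamespaceSelector:    # ...but only from namespaces carrying this label
    matchLabels:
      monitoring: enabled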
At this point, namespace labels are effectively part of the monitoring config.
Breaking It
For the exercise, I flipped the namespace label from monitoring=enabled to monitoring=disabled.
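Concretely, that's a one-liner (the exact command is my reconstruction; the --show-labels output further down confirms the end state):
kubectl label namespace payments-prod monitoring=disabled --overwrite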
- Nothing crashed.
- Pods kept running.
- Metrics quietly disappeared.
Exactly the kind of failure that’s easy to miss.
Detecting What’s Wrong
First thing I checked was whether Prometheus itself was unhealthy:
kubectl get pods -n monitoring
kubectl logs prometheus-k8s-0 -n monitoring
Everything looked fine.
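It can also help to eyeball the Targets page directly. Port-forwarding the Prometheus service works for a quick look (service name inferred from the pod name above; adjust to your install):
kubectl port-forward -n monitoring svc/prometheus-k8s 9090
Then open http://localhost:9090/targets in a browser.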
Next, I checked whether Prometheus was scraping anything from the namespace:
up{namespace="payments-prod"}
No results. That tells me Prometheus isn’t scraping targets — not that the app is down.
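A broader query makes the blind spot obvious by listing which namespaces are being scraped at all; this is plain PromQL over the built-in up metric:
count by (namespace) (up)
If payments-prod is missing from that result while its neighbours show up, the problem is discovery, not the application.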
Finding the Cause
Next step was checking the namespace itself:
kubectl get namespace payments-prod --show-labels
Output looked like this:
monitoring=disabled
Since the Prometheus resource only discovers ServiceMonitors in namespaces matching:
serviceMonitorNamespaceSelector:
  matchLabels:
    monitoring: enabled
Prometheus was doing exactly what it was configured to do. From Prometheus' perspective, nothing was broken.
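A quick way to list the namespaces that still match the selector:
kubectl get namespaces -l monitoring=enabled
With the label flipped to disabled, payments-prod simply drops out of that list.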
Fixing It
Restoring the label was enough:
kubectl label namespace payments-prod monitoring=enabled --overwrite
Metrics came back within a scrape interval.
Quick check to confirm freshness:
time() - timestamp(up{namespace="payments-prod"})
Everything was visible again.
Why This Is a Sneaky Failure
The interesting part of this exercise:
- No alerts fired.
- Dashboards were empty, not red.
- Prometheus treats missing data as “nothing to evaluate”.
If something real had broken during this window, I wouldn’t have known. This is how you end up trusting “green” systems that you’re actually blind to.
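For example, a typical threshold alert (the metric name here is hypothetical) evaluates to an empty result when nothing is scraped, so it can never fire:
sum(rate(http_requests_errors_total{namespace="payments-prod"}[5m])) > 10
Missing series don't become zero; they become nothing, and nothing never crosses a threshold.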
Hardening Things Afterward
Alerting on Missing Metrics
I added an explicit alert to catch telemetry loss:
- alert: NamespaceMetricsMissing
  expr: absent(up{namespace="payments-prod"})
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "No metrics from payments-prod"
    description: "Prometheus is scraping zero targets in this namespace."
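In an operator-managed cluster, a rule like this ships as a PrometheusRule object. The wrapper below is a sketch: the kind and fields are from the Prometheus Operator API, but the name, namespace, and role label are placeholders that have to match whatever ruleSelector your Prometheus resource uses.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: namespace-metrics-missing
  namespace: monitoring
  labels:
    role: alert-rules            # placeholder; must match the Prometheus ruleSelector
spec:
  groups:
    - name: telemetry-loss
      rules:
        - alert: NamespaceMetricsMissing
          expr: absent(up{namespace="payments-prod"})
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "No metrics from payments-prod"
            description: "Prometheus is scraping zero targets in this namespace."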