Kubernetes Networking — Broken Labs & Incident Response

Published: January 9, 2026 at 05:29 PM EST
4 min read
Source: Dev.to

When traffic fails, never guess. Always follow this order:

Ingress → Service → Endpoints → Pod → Container

If one layer fails, everything above it fails.

LAB 1 — ClusterIP Service (Most Common Production Failure)

Scenario

  • Pods are Running
  • Service exists
  • Browser / curl returns nothing

Broken Setup

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api-v1   # ❌ wrong label
    spec:
      containers:
      - name: app
        image: hashicorp/http-echo:0.2.3
        args:
          - "-listen=:8080"
          - "-text=API OK"
        ports:
        - containerPort: 8080

Service

apiVersion: v1
kind: Service
metadata:
  name: api-svc
spec:
  selector:
    app: api   # ❌ mismatch
  ports:
  - port: 80
    targetPort: 8080

Symptoms

kubectl get pods
kubectl get svc
kubectl get endpoints api-svc

Output

ENDPOINTS: <none>

Root Cause

Service selector does not match the Pod labels.

Fix

kubectl edit deployment api

Change the pod label to match the service selector:

labels:
  app: api

Verify:

kubectl get endpoints api-svc

DevOps Interview Answer

Q: Service exists but no traffic, pods running. What do you check?
A: Check the Endpoints. Empty endpoints indicate a selector mismatch or readiness problem.
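An empty Endpoints object is quickest to confirm by comparing the two sides of the match directly — a sketch using this lab's names (`api`, `api-svc`):

```shell
# Labels the pods actually carry
kubectl get pods --show-labels

# Selector the service is matching against
kubectl get svc api-svc -o jsonpath='{.spec.selector}'
```

If the selector printed by the second command is absent from the LABELS column of the first, the Endpoints object stays empty and traffic goes nowhere.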

When to Use ClusterIP

  • Internal APIs
  • Backend services
  • Microservices

Pros / Cons

Pros

  • Secure (not exposed externally)
  • Stable IP within the cluster
  • Scales with the number of pods

Cons

  • Accessible only inside the cluster

LAB 2 — NodePort (Why It’s Dangerous)

Scenario

  • NodePort is exposed
  • Works sometimes
  • Fails after a node change

Setup

apiVersion: v1
kind: Service
metadata:
  name: node-svc
spec:
  type: NodePort
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080

Symptoms

  • Works when accessing the service via one node’s IP
  • Fails when using another node’s IP
  • Security team flags the open ports
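The flakiness is easy to demonstrate from outside the cluster — a sketch where `NODE_A_IP` and `NODE_B_IP` are hypothetical placeholders for two of your nodes' addresses:

```shell
# NodePort binds 30080 on every node; whether a request succeeds
# depends on which node IP you happen to hit and its firewalling.
curl http://NODE_A_IP:30080
curl http://NODE_B_IP:30080
```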

Root Cause

NodePort opens the same port on every node in the cluster, giving no control over routing and making the service dependent on which node IP you hit.

DevOps Fix

Replace NodePort with one of the following:

  • ClusterIP + Ingress
  • LoadBalancer (if the cloud provider supports it)
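As a sketch of the first option, the same workload can sit behind a plain ClusterIP service fronted by an Ingress rule (this assumes an ingress controller such as ingress-nginx is installed; names reuse this article's labs):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  rules:
  - http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc      # the ClusterIP service from Lab 1
            port:
              number: 80
```

No node port is opened; the ingress controller becomes the single, controllable entry point.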

Interview Answer

Q: Why is NodePort rarely used in production?
A: It exposes every node, lacks fine‑grained security and routing, and doesn’t scale well.

When NodePort Is Acceptable

  • Debugging / quick tests
  • Temporary external access
  • Learning or sandbox environments

LAB 3 — LoadBalancer Service (Cloud Reality)

Scenario

  • A LoadBalancer service creates an external IP
  • The application remains unreachable

Setup

apiVersion: v1
kind: Service
metadata:
  name: lb-svc
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080

Symptoms

kubectl get svc

Typical output shows an external IP assigned, but curl/browser cannot reach the service.

Common Root Causes

  1. Cloud provider delay – the external load balancer may still be provisioning.
  2. Missing firewall rules – the cloud firewall blocks traffic to the allocated port.
  3. Pod readiness – pods are not ready, so the load balancer has no healthy endpoints.

Fix Checklist

  • Wait for the EXTERNAL-IP column to show a real IP (not <pending>).
  • Verify cloud firewall / security group allows inbound traffic on the service port (usually 80 or 443).
  • Ensure pods are Ready (kubectl get pods) and that the service has endpoints (kubectl get endpoints lb-svc).
  • If using a private VPC, confirm you have a way to reach the IP (VPN, bastion host, etc.).

DevOps Interview Answer

Q: A LoadBalancer service shows an external IP but the app is unreachable. What do you check?
A:

  1. Cloud provider’s load‑balancer provisioning status.
  2. Firewall / security‑group rules.
  3. Pod readiness and endpoint population.

Additional LoadBalancer Troubleshooting

LoadBalancer Issues & Troubleshooting

  • External IP exists
  • Browser timeout

Troubleshooting

kubectl describe svc lb-svc
kubectl get endpoints lb-svc

Check Cloud

  • Health checks
  • Security groups
  • Target port mismatch

Root Cause

Cloud LB health check fails because:

  • Wrong port
  • App not listening
  • Readiness probe failing

DevOps Fix

  • Align ports
  • Add readiness probe
  • Validate security groups
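The "add readiness probe" step could look like this on Lab 1's http-echo container — a sketch where the probe path `/` and port 8080 follow that container's `-listen=:8080` argument:

```yaml
containers:
- name: app
  image: hashicorp/http-echo:0.2.3
  args:
    - "-listen=:8080"
    - "-text=API OK"
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /          # http-echo answers any GET with its text
      port: 8080
    initialDelaySeconds: 2
    periodSeconds: 5
```

Until the probe passes, the pod is excluded from Endpoints, so the cloud LB never routes to a container that cannot answer.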

Interview Answer

Q: Why not use LoadBalancer for every service?
A: Cost, lack of routing, and limited flexibility compared to Ingress.

LAB 4 — Ingress (Most Interviewed Topic)

Scenario

  • Ingress created
  • 404 error returned

Broken Ingress Manifest

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
  - http:
      paths:
      - path: /app
        pathType: Prefix
        backend:
          service:
            name: wrong-svc   # ❌ wrong name
            port:
              number: 80

Symptoms

  • Ingress IP works
  • Always returns 404

Troubleshooting

kubectl describe ingress
kubectl get svc
kubectl get pods -n ingress-nginx

Root Cause

Ingress routes to a non‑existent service.

Fix

Correct the backend service name in the Ingress spec.
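With the backend corrected to the service that actually exists (Lab 1's `api-svc`), the manifest becomes:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
  - http:
      paths:
      - path: /app
        pathType: Prefix
        backend:
          service:
            name: api-svc   # ✅ existing service
            port:
              number: 80
```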

Interview Answer

Q: Ingress returns 404, where do you check first?
A: Ingress rules, service name, service port, and controller logs.

LAB 5 — DNS Failure (Hidden Killer)

Scenario

  • Services exist
  • DNS name fails

Test

kubectl run test --rm -it --image=busybox:1.28 -- sh
# inside the pod's shell (busybox:1.28 pinned — newer busybox builds ship a broken nslookup):
nslookup api-svc

Root Cause

  • CoreDNS not running
  • Wrong namespace
  • Service deleted

Fix

kubectl get pods -n kube-system | grep dns

If the CoreDNS pods are missing or crash-looping, restart them:

kubectl rollout restart deployment coredns -n kube-system

Interview Answer

Q: How do Pods discover services?
A: Via Kubernetes DNS, which resolves Service names to their ClusterIP.
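The naming scheme behind that resolution is fixed, so the FQDN can be derived without touching the cluster — a sketch assuming `api-svc` lives in the `default` namespace with the default `cluster.local` domain:

```shell
# Service DNS follows <service>.<namespace>.svc.<cluster-domain>
svc=api-svc
ns=default
fqdn="${svc}.${ns}.svc.cluster.local"
echo "$fqdn"   # api-svc.default.svc.cluster.local
```

Pods in the same namespace can use the bare name `api-svc` because the resolver's search path fills in the rest.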

INCIDENT RESPONSE PLAYBOOK (Real DevOps)

Step‑by‑Step

  1. Check Ingress
  2. Check Service
  3. Check Endpoints
  4. Check Pod readiness
  5. Check container logs

Never skip steps.
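The five steps map to concrete commands — a sketch reusing this article's example names (`app-ingress`, `api-svc`, `app=api`), which you would swap for your own:

```shell
kubectl describe ingress app-ingress     # 1. Ingress rules and backend
kubectl get svc api-svc                  # 2. Service type, ports, selector
kubectl get endpoints api-svc            # 3. Endpoints populated?
kubectl get pods -l app=api              # 4. Pod status and readiness
kubectl logs deploy/api --tail=50        # 5. Container logs
```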

FINAL DECISION MATRIX (Very Important)

| Requirement                 | Use          |
|-----------------------------|--------------|
| Internal traffic            | ClusterIP    |
| External production traffic | Ingress      |
| Cloud simple exposure       | LoadBalancer |
| Debug only                  | NodePort     |

INTERVIEW RAPID FIRE (Must Memorize)

  • Q: Empty endpoints means?
    A: Selector mismatch or readiness failure.

  • Q: Most used service in prod?
    A: ClusterIP.

  • Q: Why Ingress?
    A: Routing, TLS, cost efficiency.

  • Q: NodePort in prod?
    A: Avoid.

REAL DEVOPS TRUTH

Networking issues are:

  • Predictable
  • Layered
  • Always observable

The difference between struggling and solving fast is methodical thinking.
