Kubernetes Networking — Broken Labs & Incident Response

Published: January 9, 2026 at 05:29 PM EST
4 min read
Source: Dev.to

When traffic fails, never guess. Always follow this order:

Ingress → Service → Endpoints → Pod → Container

If one layer fails, everything above it fails.

LAB 1 — ClusterIP Service (Most Common Production Failure)

Scenario

  • Pods are Running
  • Service exists
  • Browser / curl returns nothing

Broken Setup

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api-v1   # ❌ wrong label
    spec:
      containers:
      - name: app
        image: hashicorp/http-echo:0.2.3
        args:
          - "-listen=:8080"
          - "-text=API OK"
        ports:
        - containerPort: 8080

Service

apiVersion: v1
kind: Service
metadata:
  name: api-svc
spec:
  selector:
    app: api   # ❌ mismatch
  ports:
  - port: 80
    targetPort: 8080

Symptoms

kubectl get pods
kubectl get svc
kubectl get endpoints api-svc

Output

ENDPOINTS: <none>

Root Cause

Service selector does not match the Pod labels.

Fix

kubectl edit deployment api

Change the pod label to match the service selector:

labels:
  app: api

Verify:

kubectl get endpoints api-svc

DevOps Interview Answer

Q: Service exists but no traffic, pods running. What do you check?
A: Check the Endpoints. Empty endpoints indicate a selector mismatch or readiness problem.
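An empty Endpoints object is quickest to confirm by comparing the two sides of the match directly — a sketch using this lab's names (`api`, `api-svc`):

```shell
# Labels the pods actually carry
kubectl get pods --show-labels

# Selector the service is matching against
kubectl get svc api-svc -o jsonpath='{.spec.selector}'
```

If the selector printed by the second command is absent from the LABELS column of the first, the Endpoints object stays empty and traffic goes nowhere.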

When to Use ClusterIP

  • Internal APIs
  • Backend services
  • Microservices

Pros / Cons

Pros

  • Secure (not exposed externally)
  • Stable IP within the cluster
  • Scales with the number of pods

Cons

  • Accessible only inside the cluster

LAB 2 — NodePort (Why It’s Dangerous)

Scenario

  • NodePort is exposed
  • Works sometimes
  • Fails after a node change

Setup

apiVersion: v1
kind: Service
metadata:
  name: node-svc
spec:
  type: NodePort
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080

Symptoms

  • Works when accessing the service via one node’s IP
  • Fails when using another node’s IP
  • Security team flags the open ports
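The flakiness is easy to demonstrate from outside the cluster — a sketch where `NODE_A_IP` and `NODE_B_IP` are hypothetical placeholders for two of your nodes' addresses:

```shell
# NodePort binds 30080 on every node; whether a request succeeds
# depends on which node IP you happen to hit and its firewalling.
curl http://NODE_A_IP:30080
curl http://NODE_B_IP:30080
```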

Root Cause

NodePort opens the same port on every node in the cluster, giving no control over routing and making the service dependent on which node IP you hit.

DevOps Fix

Replace NodePort with one of the following:

  • ClusterIP + Ingress
  • LoadBalancer (if the cloud provider supports it)
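As a sketch of the first option, the same workload can sit behind a plain ClusterIP service fronted by an Ingress rule (this assumes an ingress controller such as ingress-nginx is installed; names reuse this article's labs):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  rules:
  - http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc      # the ClusterIP service from Lab 1
            port:
              number: 80
```

No node port is opened; the ingress controller becomes the single, controllable entry point.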

Interview Answer

Q: Why is NodePort rarely used in production?
A: It exposes every node, lacks fine‑grained security and routing, and doesn’t scale well.

When NodePort Is Acceptable

  • Debugging / quick tests
  • Temporary external access
  • Learning or sandbox environments

LAB 3 — LoadBalancer Service (Cloud Reality)

Scenario

  • A LoadBalancer service creates an external IP
  • The application remains unreachable

Setup

apiVersion: v1
kind: Service
metadata:
  name: lb-svc
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080

Symptoms

kubectl get svc

Typical output shows an external IP assigned, but curl/browser cannot reach the service.

Common Root Causes

  1. Cloud provider delay – the external load balancer may still be provisioning.
  2. Missing firewall rules – the cloud firewall blocks traffic to the allocated port.
  3. Pod readiness – pods are not ready, so the load balancer has no healthy endpoints.

Fix Checklist

  • Wait for the EXTERNAL-IP column to show a real IP (not <pending>).
  • Verify cloud firewall / security group allows inbound traffic on the service port (usually 80 or 443).
  • Ensure pods are Ready (kubectl get pods) and that the service has endpoints (kubectl get endpoints lb-svc).
  • If using a private VPC, confirm you have a way to reach the IP (VPN, bastion host, etc.).

DevOps Interview Answer

Q: A LoadBalancer service shows an external IP but the app is unreachable. What do you check?
A:

  1. Cloud provider’s load‑balancer provisioning status.
  2. Firewall / security‑group rules.
  3. Pod readiness and endpoint population.

Additional LoadBalancer Troubleshooting

LoadBalancer Issues & Troubleshooting

  • External IP exists
  • Browser timeout

Troubleshooting

kubectl describe svc lb-svc
kubectl get endpoints lb-svc

Check Cloud

  • Health checks
  • Security groups
  • Target port mismatch

Root Cause

Cloud LB health check fails because:

  • Wrong port
  • App not listening
  • Readiness probe failing

DevOps Fix

  • Align ports
  • Add readiness probe
  • Validate security groups
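The "add readiness probe" step could look like this on Lab 1's http-echo container — a sketch where the probe path `/` and port 8080 follow that container's `-listen=:8080` argument:

```yaml
containers:
- name: app
  image: hashicorp/http-echo:0.2.3
  args:
    - "-listen=:8080"
    - "-text=API OK"
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /          # http-echo answers any GET with its text
      port: 8080
    initialDelaySeconds: 2
    periodSeconds: 5
```

Until the probe passes, the pod is excluded from Endpoints, so the cloud LB never routes to a container that cannot answer.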

Interview Answer

Q: Why not use LoadBalancer for every service?
A: Cost, lack of routing, and limited flexibility compared to Ingress.

LAB 4 — Ingress (Most Interviewed Topic)

Scenario

  • Ingress created
  • 404 error returned

Broken Ingress Manifest

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
  - http:
      paths:
      - path: /app
        pathType: Prefix
        backend:
          service:
            name: wrong-svc   # ❌ wrong name
            port:
              number: 80

Symptoms

  • Ingress IP works
  • Always returns 404

Troubleshooting

kubectl describe ingress
kubectl get svc
kubectl get pods -n ingress-nginx

Root Cause

Ingress routes to a non‑existent service.

Fix

Correct the backend service name in the Ingress spec.
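With the backend corrected to the service that actually exists (Lab 1's `api-svc`), the manifest becomes:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
  - http:
      paths:
      - path: /app
        pathType: Prefix
        backend:
          service:
            name: api-svc   # ✅ existing service
            port:
              number: 80
```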

Interview Answer

Q: Ingress returns 404, where do you check first?
A: Ingress rules, service name, service port, and controller logs.

LAB 5 — DNS Failure (Hidden Killer)

Scenario

  • Services exist
  • DNS name fails

Test

kubectl run test --rm -it --image=busybox:1.28 -- sh
# inside the pod's shell (busybox:1.28 pinned — newer busybox builds ship a broken nslookup):
nslookup api-svc

Root Cause

  • CoreDNS not running
  • Wrong namespace
  • Service deleted

Fix

kubectl get pods -n kube-system | grep dns

If the CoreDNS pods are missing or crash-looping, restart them:

kubectl rollout restart deployment coredns -n kube-system

Interview Answer

Q: How do Pods discover services?
A: Via Kubernetes DNS, which resolves Service names to their ClusterIP.
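The naming scheme behind that resolution is fixed, so the FQDN can be derived without touching the cluster — a sketch assuming `api-svc` lives in the `default` namespace with the default `cluster.local` domain:

```shell
# Service DNS follows <service>.<namespace>.svc.<cluster-domain>
svc=api-svc
ns=default
fqdn="${svc}.${ns}.svc.cluster.local"
echo "$fqdn"   # api-svc.default.svc.cluster.local
```

Pods in the same namespace can use the bare name `api-svc` because the resolver's search path fills in the rest.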

INCIDENT RESPONSE PLAYBOOK (Real DevOps)

Step‑by‑Step

  1. Check Ingress
  2. Check Service
  3. Check Endpoints
  4. Check Pod readiness
  5. Check container logs

Never skip steps.
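The five steps map to concrete commands — a sketch reusing this article's example names (`app-ingress`, `api-svc`, `app=api`), which you would swap for your own:

```shell
kubectl describe ingress app-ingress     # 1. Ingress rules and backend
kubectl get svc api-svc                  # 2. Service type, ports, selector
kubectl get endpoints api-svc            # 3. Endpoints populated?
kubectl get pods -l app=api              # 4. Pod status and readiness
kubectl logs deploy/api --tail=50        # 5. Container logs
```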

FINAL DECISION MATRIX (Very Important)

| Requirement                 | Use          |
|-----------------------------|--------------|
| Internal traffic            | ClusterIP    |
| External production traffic | Ingress      |
| Cloud simple exposure       | LoadBalancer |
| Debug only                  | NodePort     |

INTERVIEW RAPID FIRE (Must Memorize)

  • Q: Empty endpoints means?
    A: Selector mismatch or readiness failure.

  • Q: Most used service in prod?
    A: ClusterIP.

  • Q: Why Ingress?
    A: Routing, TLS, cost efficiency.

  • Q: NodePort in prod?
    A: Avoid.

REAL DEVOPS TRUTH

Networking issues are:

  • Predictable
  • Layered
  • Always observable

The difference between struggling and solving fast is methodical thinking.
