Your AI Agents Deserve the Same Ops Treatment as Your Microservices

Published: March 13, 2026 at 09:18 AM EDT

Source: Dev.to

A Few Months Ago…

I was looking at how our team was actually running AI agents in production:

One was a Python script in a tmux session on someone’s laptop.
Another was a cron job with no timeout.
A third had no cost limits – it quietly burned through $800 in API calls over a weekend because it got stuck in a loop.

None of this would fly for a microservice. We’d never ship a service with no health checks, no resource limits, and no way to roll back a bad deploy. But agents were getting a free pass because they felt different somehow. They’re AI, not “real” infrastructure.

I don’t think that’s a good enough reason.

The Thing Is, Agents Are Just Workloads

Strip away the LLM part and an agent is a long‑running process that:

Consumes resources
Has a health state
Needs to scale
Requires configuration management

That’s just a service. Kubernetes already knows how to manage services.

The missing piece was a way to tell Kubernetes what an agent is — not in terms of CPU and memory, but in terms of model, system prompt, and tool access.

So I built a Kubernetes operator that does exactly that: agentops-operator.

What It Actually Looks Like

You define an agent the same way you define a Deployment:

apiVersion: agentops.agentops.io/v1alpha1
kind: AgentDeployment
metadata:
  name: research-agent
spec:
  replicas: 3
  model: claude-sonnet-4-20250514
  systemPrompt: |
    You are a research agent. Gather and summarise information
    accurately. Always cite your sources.
  limits:
    maxTokensPerCall: 8000
    maxConcurrentTasks: 5
    timeoutSeconds: 120

kubectl apply -f research-agent.yaml
kubectl get agdep
# NAME           MODEL                      REPLICAS   READY   AGE
# research-agent claude-sonnet-4-20250514   3          3       45s

Three agent pods, managed by Kubernetes. Scale to 10:

kubectl patch agdep research-agent --type=merge -p '{"spec":{"replicas":10}}'

GitOps, RBAC, namespaces, kubectl… all work without modification because agents are now just Kubernetes resources.

The Part I’m Most Proud Of: Semantic Health Checks

Standard liveness probes check whether a process responds, typically to an HTTP request. That’s fine for a web server, but an LLM can be “alive” while spitting out complete nonsense.

agentops-operator adds a semantic probe type — a secondary LLM call that validates whether the agent is actually working:

livenessProbe:
  type: semantic
  intervalSeconds: 60
  validatorPrompt: "Reply with exactly one word: HEALTHY"

If the agent fails that check, the pod is pulled from routing until it recovers. That mirrors what stock Kubernetes does when a readiness probe fails (a failed liveness probe would restart the container instead), except the health check understands what “healthy” means for an LLM.
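
Here is a minimal Python sketch of what a semantic probe boils down to. The `ask` callable stands in for whatever sends a prompt to the agent. It, `semantic_probe`, `ProbeTracker`, and the failure threshold are illustrative assumptions, not the operator's actual code.

```python
from typing import Callable

VALIDATOR_PROMPT = "Reply with exactly one word: HEALTHY"


def semantic_probe(ask: Callable[[str], str], expected: str = "HEALTHY") -> bool:
    """Return True if the agent answers the validator prompt as expected.

    `ask` is a hypothetical callable that sends a prompt to the agent and
    returns its text reply. Any exception counts as a failed probe.
    """
    try:
        reply = ask(VALIDATOR_PROMPT)
    except Exception:
        return False
    # Tolerate surrounding whitespace and trailing punctuation, nothing more.
    return reply.strip().rstrip(".!").upper() == expected


class ProbeTracker:
    """Marks an agent unhealthy after `failure_threshold` consecutive failures."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result and return whether the agent is still healthy."""
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.consecutive_failures < self.failure_threshold
```

Requiring several consecutive failures before acting is a design choice borrowed from ordinary probes: one odd reply from a model should not take a pod out of rotation.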

Token Limits You Can’t Accidentally Delete

This was the $800 problem. The fix: limits live in the infrastructure, not in application code.

limits:
  maxTokensPerCall: 8000
  maxConcurrentTasks: 5
  timeoutSeconds: 120

The operator injects these as environment variables into every agent pod it creates. A developer can’t remove them by editing the wrong file, and a misconfigured prompt can’t cause an infinite loop that drains your credit card.
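
On the agent side, consuming injected limits could look roughly like this. It is a sketch only: the environment variable names (`AGENT_MAX_TOKENS_PER_CALL` and friends) and the helper names are assumptions made for illustration, not the names the operator actually uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AgentLimits:
    max_tokens_per_call: int
    max_concurrent_tasks: int
    timeout_seconds: int


def load_limits(env: Mapping[str, str] = os.environ) -> AgentLimits:
    """Read limits from the environment and fail fast if any are missing.

    The variable names below are hypothetical.
    """
    def require_positive_int(name: str) -> int:
        raw = env.get(name)
        if raw is None:
            raise RuntimeError(f"missing required limit: {name}")
        value = int(raw)
        if value <= 0:
            raise RuntimeError(f"{name} must be positive, got {value}")
        return value

    return AgentLimits(
        max_tokens_per_call=require_positive_int("AGENT_MAX_TOKENS_PER_CALL"),
        max_concurrent_tasks=require_positive_int("AGENT_MAX_CONCURRENT_TASKS"),
        timeout_seconds=require_positive_int("AGENT_TIMEOUT_SECONDS"),
    )


def clamp_max_tokens(requested: int, limits: AgentLimits) -> int:
    """Never ask the model for more tokens than the infrastructure allows."""
    return min(requested, limits.max_tokens_per_call)
```

Refusing to start when a limit is missing is deliberate in this sketch: an agent with no limits should not be able to run at all.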

Rolling Back a Bad System Prompt

Change a prompt → open a PR → merge → kubectl apply.
Roll back → git revert → kubectl apply.

The full history of who changed what prompt and when lives in Git, just like any other infrastructure change. This sounds obvious until you’ve had to figure out why an agent started behaving differently last Tuesday and nobody can remember what changed.

Multi‑Agent Pipelines Without Glue Code

You can chain agents together declaratively:

apiVersion: agentops.agentops.io/v1alpha1
kind: AgentPipeline
metadata:
  name: research-then-summarize
spec:
  input:
    topic: "AI in healthcare"
  steps:
    - name: research
      agentDeployment: research-agent
      inputs:
        prompt: "Research this topic: {{ .pipeline.input.topic }}"
    - name: summarize
      agentDeployment: summarizer-agent
      dependsOn: [research]
      inputs:
        prompt: "Summarize these findings: {{ .steps.research.output }}"
  output: "{{ .steps.summarize.output }}"

The operator handles the queue, waits for each step to complete, passes the output to the next step, and updates the pipeline status. Watching kubectl get agpipe -w go from Running to Succeeded while two separate LLMs do their thing is a bit surreal.
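
To make the data flow concrete, here is a small Python sketch of the same idea: order steps by `dependsOn`, render each prompt from earlier outputs, and run them one at a time. It is not the operator's implementation. `render`, `run_pipeline`, and the `call_agent` callable are hypothetical names, and the regex only handles the simple `{{ .a.b.c }}` placeholders used above rather than full Go template syntax.

```python
import re
from graphlib import TopologicalSorter
from typing import Callable

# Matches simple placeholders such as {{ .steps.research.output }}.
TEMPLATE = re.compile(r"\{\{\s*\.([\w.-]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace {{ .a.b.c }} placeholders with values looked up in `context`."""
    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return TEMPLATE.sub(lookup, template)


def run_pipeline(spec: dict, call_agent: Callable[[str, str], str]) -> str:
    """Run steps sequentially in dependency order and return the final output.

    `call_agent(agent_deployment, prompt)` is a hypothetical callable that
    sends a prompt to the named agent and returns its reply.
    """
    steps = {step["name"]: step for step in spec["steps"]}
    graph = {name: set(step.get("dependsOn", [])) for name, step in steps.items()}
    context = {"pipeline": {"input": spec["input"]}, "steps": {}}

    for name in TopologicalSorter(graph).static_order():
        step = steps[name]
        prompt = render(step["inputs"]["prompt"], context)
        reply = call_agent(step["agentDeployment"], prompt)
        context["steps"][name] = {"output": reply}

    return render(spec["output"], context)
```

Passing in `call_agent` keeps the sketch testable without a cluster: a stub function can stand in for the real agents.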

Try It

Prerequisites: Docker, kind, kubectl, Go 1.25+

git clone https://github.com/agentops-io/agentops-operator.git
cd agentops-operator
make dev ANTHROPIC_API_KEY=sk-ant-...

That one command:

Creates a kind cluster
Builds both Docker images
Deploys Redis and the operator inside the cluster
Sets up the API‑key secret

When it finishes, deploy an agent:

kubectl apply -f config/samples/agentops_v1alpha1_agentdeployment.yaml
kubectl get agdep -w

Submit a Task

# Add a task to the Redis stream
kubectl exec -it -n agent-infra redis-0 -- \
  redis-cli XADD agent-tasks '*' prompt "What is the capital of France? One sentence."

# Read the result
kubectl exec -it -n agent-infra redis-0 -- \
  redis-cli XREAD COUNT 10 STREAMS agent-tasks-results 0
# => "The capital of France is Paris."

Enjoy turning AI agents into first‑class Kubernetes resources!

Honest Caveats

This is v0.0.1, single contributor, early alpha. Some things that aren’t done yet:

Parallel pipeline steps – right now steps run sequentially even if they don’t depend on each other.
KEDA autoscaling on queue depth – CPU is the wrong signal for agent workloads; queue depth is right, but not implemented yet.
Only Anthropic for now – the provider interface exists for OpenAI/Gemini, but there are no implementations yet.
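
For the queue-depth item, the calculation I have in mind is roughly the following. This is a sketch of the idea, not code from the repo: `desired_replicas` and its parameters are hypothetical, and using `maxConcurrentTasks` as the per-replica target is my assumption.

```python
import math


def desired_replicas(
    queue_depth: int,
    tasks_per_replica: int,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Size the deployment so each replica handles about `tasks_per_replica` tasks.

    The result is clamped to [min_replicas, max_replicas].
    """
    if tasks_per_replica <= 0:
        raise ValueError("tasks_per_replica must be positive")
    wanted = math.ceil(queue_depth / tasks_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

With `maxConcurrentTasks: 5`, a backlog of 12 pending tasks would ask for 3 replicas, and an empty queue falls back to the minimum.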

I’m writing this because I think the problem is real and worth solving, not because the solution is finished.

If any of this sounds familiar—agents running in tmux sessions, no cost controls, prompt changes deployed by SSHing into a box—I’d genuinely like to hear how you’re handling it. And if you want to contribute, see CONTRIBUTING.md for the setup.

GitHub: https://github.com/agentops-io/agentops-operator
Docs:
