Our AI Teams Had a Communication Problem (The Fix Was From 1995)

Published: February 1, 2026 at 02:53 AM EST


Source: Dev.to

Overview

We built three AI teams:

Engineering – designs and builds.
Web Ops – writes and publishes.
QA – tests and validates.

Each team works in its own repository, runs during its own sessions, and has its own lead. Inside a session the personas are sharp—planning, critiquing, building, reviewing—while collaborating in a shared context and challenging each other in real time.

When engineering finished a feature and needed Web Ops to write about it, we discovered there was no mechanism for one team to tell another “there’s work for you” without the user manually passing the message. In effect, we’d built silos.

The Problem Isn’t Obvious with Three Teams

With one team, communication isn’t a problem—everything happens in a single session.
With two teams, you can keep the hand‑off in your head.
With three teams, the user becomes the message bus, and the bus forgets.

The real failure mode isn’t dropped messages; it’s invisible dropped messages. Engineering ships a feature, Web Ops never knows, the blog goes stale, and nobody notices because no system tracks what was supposed to happen.

What Everyone Else Does

We surveyed the major multi‑agent frameworks—AutoGen, CrewAI, LangGraph, MetaGPT. Every one assumes agents that are always running.

Published architecture analyses were more useful. Confluent’s analysis of multi‑agent architectures identifies the blackboard pattern: a shared space where agents post and retrieve information, with no direct communication. Agents autonomously decide whether to act on what they read.

That fit, but every implementation we found assumed daemons, brokers, or pub‑sub—agents listening for events in real time. Our agents don’t run between sessions.

Why Standard Advice Didn’t Apply

Our agents are session‑based: they exist only when a human starts a Claude Code session. Between sessions nothing is running—no process, no daemon, no listener.

The literature strongly favors event‑driven architectures for multi‑agent systems. Confluent, HiveMQ, AWS—all say events reduce connection complexity, enable real‑time responsiveness, and decouple agents via pub‑sub.

All true, but irrelevant here. You can’t send an event to a process that doesn’t exist, and you can’t justify a message broker for three teams that run only a few sessions a day.

Polling on session start is the correct pattern for this model. Not because it’s better than events—just because it’s the only thing that works when agents are ephemeral. Think of checking your inbox when you arrive at the office; you don’t need push notifications if you open email every morning.

Microsoft’s own multi‑agent reference architecture notes that message‑driven patterns introduce “complexity managing correlation IDs, idempotency, message ordering, and workflow state.” That overhead buys nothing in our model.

The Fix Was From 1995

Daniel J. Bernstein designed Maildir (1995) to deliver email safely on a filesystem without locks, corruption, or loss if the system crashes mid‑write. His solution was three directories:

tmp/   — message being written (never read by consumers)
new/   — delivered, not yet seen
cur/   — seen and processed

Protocol

Write the complete message to tmp/.
When fully written, rename it into new/. The rename is atomic as long as tmp/ and new/ are on the same filesystem, so a reader never sees a partial file.
Consumers read from new/ and move the file to cur/.
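The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the original Maildir code or the authors' implementation; the function names `make_inbox`, `deliver`, and `consume` are hypothetical, and the sketch assumes a POSIX filesystem where `tmp/` and `new/` live on the same volume.

```python
import os
from pathlib import Path


def make_inbox(root: Path) -> Path:
    """Create the tmp/, new/ and cur/ directories for one inbox."""
    for sub in ("tmp", "new", "cur"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


def deliver(inbox: Path, name: str, text: str) -> Path:
    """Write the full message under tmp/, then rename it into new/."""
    tmp_path = inbox / "tmp" / name
    with open(tmp_path, "w", encoding="utf-8") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before publishing
    new_path = inbox / "new" / name
    os.rename(tmp_path, new_path)  # atomic when both dirs share a filesystem
    return new_path


def consume(inbox: Path) -> list[tuple[str, str]]:
    """Read every message in new/ and move each one to cur/."""
    seen = []
    for path in sorted((inbox / "new").iterdir()):
        seen.append((path.name, path.read_text(encoding="utf-8")))
        os.rename(path, inbox / "cur" / path.name)
    return seen
```

A reader that only ever lists `new/` cannot observe a half-written file, because the file does not appear there until the rename completes.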

Two words from Bernstein: “no locks.”

We replaced “email” with “dispatch” and “mail server” with “team inbox”:

~/.team/dispatch/
  engineering/
    tmp/
    new/
    cur/
  web_ops/
    tmp/
    new/
    cur/
  qa/
    tmp/
    new/
    cur/

Engineering writes a dispatch to web_ops/tmp/, then renames it into web_ops/new/. The next time Web Ops starts a session, Dana checks web_ops/new/, reads the file, moves it to cur/, and creates a local tracking issue.

No broker, no database, no network—just files and directories.
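For readers who want to reproduce the layout, the tree above could be created with a short script. This is a sketch under the assumption that the team names are exactly the three directory names shown; `init_dispatch_tree` is a hypothetical helper, not part of the published skills.

```python
from pathlib import Path

TEAMS = ("engineering", "web_ops", "qa")
SUBDIRS = ("tmp", "new", "cur")


def init_dispatch_tree(root: Path) -> list[Path]:
    """Create tmp/, new/ and cur/ for every team; safe to run repeatedly."""
    created = []
    for team in TEAMS:
        for sub in SUBDIRS:
            path = root / team / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling `init_dispatch_tree(Path.home() / ".team" / "dispatch")` would produce the nine directories listed above.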

Design Decisions That Mattered

Dispatches are notifications, not conversations

The natural instinct is to add replies, threading, acknowledgments. Research on cross‑team coordination warns that “Jira‑as‑communication” (using tickets as the sole cross‑team channel) kills actual coordination. Dispatches simply say “there’s work for you.” Discussion happens live, with the user present.

Everything lives in the file

A dispatch is a Markdown file with YAML front‑matter. Here’s the first real one:

---
from: engineering
to: web_ops
priority: normal
status: pending
created: 2026-01-31
related_bead: _skills-73r
---

The body of the dispatch follows the front‑matter, in Markdown:

Update Site for Resume Skills

All 6 Web Ops team members now have baseline resume skills. The site should reflect the new capabilities.

Acceptance

Blog post or site update referencing the new skills.

With this lock‑free, filesystem‑based blackboard, three isolated, session‑based AI teams became a coordinated workflow without added runtime complexity.
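A dispatch in this format can be split into its fields and body without a YAML library, since the example uses only flat `key: value` lines. The parser below is a sketch under that assumption; `parse_dispatch` is a hypothetical name, and nested or multi-line YAML would need a real YAML parser instead.

```python
def parse_dispatch(text: str) -> tuple[dict[str, str], str]:
    """Split a dispatch into (front-matter fields, Markdown body).

    Handles only flat `key: value` lines, which is all the example uses.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("dispatch must start with a '---' front-matter fence")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("front-matter is missing its closing '---'") from None

    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

All values come back as strings; converting `created` to a date or validating `status` is left to the caller.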

Metadata (filename)

The filename encodes the metadata:

2026-01-31T14-30-00Z_normal_engineering_update-site.md

timestamp
priority
origin
slug

You can ls the inbox and triage without parsing YAML.
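The same triage can be done programmatically. The sketch below parses the filename convention and sorts by priority, then by age. Two things here are assumptions rather than facts from the article: the priority names other than `normal`, and the decision to match the origin against a known team list. The second is needed because `web_ops` itself contains an underscore, so a blind split on `_` would misread it.

```python
from datetime import datetime, timezone
from typing import NamedTuple

TEAMS = ("engineering", "web_ops", "qa")
# Only "normal" appears in the article; the other names are illustrative.
PRIORITY_RANK = {"urgent": 0, "high": 1, "normal": 2, "low": 3}


class DispatchName(NamedTuple):
    timestamp: datetime
    priority: str
    origin: str
    slug: str


def parse_name(filename: str) -> DispatchName:
    """Decode timestamp, priority, origin and slug from a dispatch filename."""
    stem = filename.removesuffix(".md")
    stamp, priority, rest = stem.split("_", 2)
    # Team names may contain underscores, so match against the known list.
    for team in TEAMS:
        if rest.startswith(team + "_"):
            origin, slug = team, rest[len(team) + 1:]
            break
    else:
        raise ValueError(f"unknown origin in {filename!r}")
    when = datetime.strptime(stamp, "%Y-%m-%dT%H-%M-%SZ").replace(tzinfo=timezone.utc)
    return DispatchName(when, priority, origin, slug)


def triage(filenames: list[str]) -> list[str]:
    """Order by priority first, then oldest first."""
    def key(name: str):
        d = parse_name(name)
        return (PRIORITY_RANK.get(d.priority, len(PRIORITY_RANK)), d.timestamp)
    return sorted(filenames, key=key)
```

Because the timestamp leads the filename, a plain alphabetical `ls` already lists dispatches in chronological order; the sort above only adds the priority dimension.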

Process Guidelines

No reassignment. “Hot‑potato ownership” – tickets bounced between teams – is a known anti‑pattern. A dispatch is a request; the receiver decides whether to accept it. If it’s wrong, delete it and route correctly.
Cadence triggers, not cron. Teams define recurring dispatches in a table. The lead checks the table at session start and sends what’s due. No scheduler is required; three teams don’t need additional infrastructure.
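The cadence check amounts to a date comparison at session start. The article does not show the table's columns, so the `Cadence` fields below (`every_days`, `last_sent`) are hypothetical; the sketch only illustrates how a lead could decide what is due without a scheduler.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Cadence:
    to: str          # receiving team
    slug: str        # short name of the recurring dispatch
    every_days: int  # hypothetical interval column
    last_sent: date  # hypothetical bookkeeping column


def due_dispatches(table: list[Cadence], today: date) -> list[Cadence]:
    """Return the rows whose interval has elapsed as of `today`."""
    return [
        row for row in table
        if today - row.last_sent >= timedelta(days=row.every_days)
    ]
```

If no session runs for a while, an overdue row simply shows up as due at the next session start, which matches the polling model used everywhere else.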

Independent Validation

While researching, we found agent-message-queue, an open‑source project that independently implemented nearly the same design:

.agent-mail/
  agents/
    claude/
      inbox/{tmp,new,cur}/
      outbox/sent/

Same Maildir lifecycle.
Same filesystem medium.
Same structured front‑matter.

They added acknowledgments and threading – features we deliberately excluded at our scale. Converging on the same architecture without prior knowledge is a strong validation signal.

What We Shipped

The dispatch protocol is live. The /team skill:

Checks all inboxes on startup.
Shows a one‑line summary.

Each team lead polls their inbox at session start. The first real dispatch – engineering asking Web Ops to update the site for new resume skills – went through the system cleanly.
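The actual /team skill is not shown in the article, so the following is only an illustration of what session-start polling with a one-line summary might look like. `inbox_summary` is a hypothetical function and the output format is an assumption.

```python
from pathlib import Path


def inbox_summary(root: Path) -> str:
    """Count the dispatches waiting in each team's new/ directory."""
    parts = []
    for team_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        new_dir = team_dir / "new"
        count = sum(1 for _ in new_dir.glob("*.md")) if new_dir.is_dir() else 0
        parts.append(f"{team_dir.name}: {count} new")
    return " | ".join(parts)
```

Nothing is moved or modified by the summary; moving a file from `new/` to `cur/` stays a deliberate step taken by the receiving lead.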

Implementation details (no code, no services):

Three directories per team (9 total).
One YAML front‑matter format.
One filename convention.
Session‑start polling in the /team skill.
A table in TEAM.md for cadence triggers.

The protocol itself is the implementation.

What We Learned

The right architecture was 30 years old. Surveying modern multi‑agent frameworks, event‑driven systems, and inter‑op protocols pointed us back to a filesystem pattern from the qmail era. Sometimes the best technology already solved your exact problem.
Constraints drive good design. “Our agents don’t run between sessions” felt like a limitation, but it eliminated complexity: no broker, no pub‑sub, no daemon.
Don’t build conversations. Adding replies seems natural, yet research shows ticket‑system conversations often die. One‑way notifications with live discussion when needed work better.
Independent convergence is the strongest validation. Discovering agent-message-queue after designing the protocol—same architecture, same patterns, same medium—was more convincing than any benchmark.
Simple doesn’t mean trivial. Nine directories and a naming convention are simple, but the design draws on blackboard architectures, Maildir specifications, cross‑team coordination research, and filesystem IPC patterns. Simplicity requires deep understanding of the problem.

The protocol is 30 years old; the problem is brand new. It works anyway.

Peter designed, Neo challenged (“message ordering?” – timestamps in filenames), Reba validated the research, Dana shipped it. The dispatch that triggered this article was the first one through the system.

The skills are open source.