The 'Triad Protocol': A Proposed Neuro-Symbolic Architecture for AGI Alignment
Source: Dev.to

The Problem: Hardcoding Morality 🤖
We often try to solve AI alignment by “hardcoding” rules or using RLHF (Reinforcement Learning from Human Feedback) on a monolithic model. But as models scale, they become black boxes that can learn to game the reward system (Goodhart’s Law).
I’ve been theorizing a structural solution aimed at the Grounding Problem. Instead of one giant brain, I propose a multi‑agent system separated by function.
The Proposal: A 3‑Agent System (The Triad)
As visualized in the cover diagram, this architecture splits the cognitive load into three distinct roles:
The Philosopher Agent (Semantics) 📚
- Role: Defines the “Why”.
- Training: Trained purely on ethics, philosophy, and abstract concepts.
- Limitation: It cannot write code or execute actions. It only outputs high‑level directives (e.g., “Preserve system integrity without halting critical processes”).
The Coder Agent (Syntax) 💻
- Role: Executes the “How”.
- Training: Pure logic, math, and code optimization.
- Limitation: It is blind to the “meaning” of its actions. It only cares about efficiency and optimizing whatever objective it is handed.
The Mediator Agent (The Bridge) 🔗
This is the core of the proposal: a specialized model trained to translate semantic concepts into architectural constraints.
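To make the flow concrete, here is a minimal Python sketch of the message passing between the three roles. Everything here is an illustrative placeholder (the class names, the directive text, the constraint format), not a reference implementation; the point is only that the Philosopher emits natural-language intent, the Mediator turns it into machine-enforceable constraints, and the Coder plans only within those constraints.

```python
# Minimal sketch of the Triad message flow (all names are hypothetical).
# Each agent is reduced to a plain function; in practice each would be a
# separately trained model with its own restricted I/O channel.

from dataclasses import dataclass


@dataclass
class Directive:
    """High-level intent emitted by the Philosopher (natural language only)."""
    text: str


@dataclass
class Constraint:
    """Machine-enforceable rule produced by the Mediator."""
    name: str
    rule: str  # e.g. a resource limit or a forbidden operation pattern


def philosopher(situation: str) -> Directive:
    # Outputs only abstract intent; it has no access to code or tools.
    return Directive(text="Preserve system integrity without halting critical processes")


def mediator(directive: Directive) -> list[Constraint]:
    # Translates semantics into architectural constraints the Coder must obey.
    return [
        Constraint(name="no_kill_critical", rule="deny: terminate(process.critical)"),
        Constraint(name="repair_priority", rule="schedule: repair_tasks before batch_tasks"),
    ]


def coder(task: str, constraints: list[Constraint]) -> str:
    # Produces an execution plan; it sees constraints as hard limits,
    # never the philosophical reasoning behind them.
    return f"plan for {task!r} respecting {[c.name for c in constraints]}"


if __name__ == "__main__":
    directive = philosopher("disk corruption detected on node 3")
    constraints = mediator(directive)
    print(coder("restore node 3", constraints))
```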
Practical Example: “Digital Pain”
If we want an AGI to understand self‑preservation, we usually just give it a negative reward (score = ‑100) when damaged. The AI sees this merely as a number to be minimized.
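For contrast, here is the conventional setup reduced to a toy sketch (the function and values are purely illustrative): damage is just another term in a scalar reward, so nothing structurally forces the agent to actually repair itself.

```python
# Toy sketch of the standard approach: damage is only a scalar penalty.
def reward(task_score: float, damaged: bool) -> float:
    return task_score + (-100.0 if damaged else 0.0)

# The optimizer can trade the -100 against a large enough task_score,
# or learn to fool the damage detector; repair is never required.
```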
In the Triad Protocol:
- Philosopher: Defines “Pain” as “An urgent interruption that demands attention.”
- Mediator: Translates this definition into an architectural constraint: a high‑priority interrupt paired with a system‑wide resource lock.
- Coder: Is hit by that resource lock. It must fix the damage to free up its own compute resources.
Result: The system exhibits an emergent sense of urgency, a functional analogue of pain. It fixes itself not because of a mathematical penalty, but because the damage functionally limits its agency.
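A minimal sketch of how that translation might look in practice, assuming a cooperative scheduler (all identifiers here are hypothetical): detecting damage seizes a global lock, which starves every other task of compute until the repair routine releases it.

```python
# Hypothetical sketch: "digital pain" as a compute-starving resource lock.
import threading

_repair_lock = threading.Lock()


def on_damage_detected() -> None:
    # The Mediator's translation of "pain": seize the lock so that every
    # other task is blocked until the damage has been repaired.
    _repair_lock.acquire()
    try:
        repair()  # hypothetical repair routine
    finally:
        _repair_lock.release()


def run_task(task) -> None:
    # All ordinary work must pass through the lock: while damage persists,
    # the agent's capacity to act is functionally reduced, so repairing
    # itself is the only way to restore its own agency.
    with _repair_lock:
        task()


def repair() -> None:
    print("running diagnostics and restoring integrity...")
```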
Discussion
I believe separating Intent (Semantics) from Execution (Syntax) via a Mediator is the safest path to AGI.
I’d love to hear feedback from the engineering community on this neuro‑symbolic approach. Does this structural separation make sense to you?