I Built a Feedback Loop That Coaches LLMs at Runtime Using NumPy
Source: Dev.to
Most guardrail systems for LLMs work like a bouncer at a bar: they check each request at the door, decide pass or fail, and then forget about it.
I wanted something different—a system that remembers how the AI has been behaving, detects when it starts drifting from its intended character, and coaches it back on course, using pure math instead of additional LLM calls.
The project is called SAFi. It’s open‑source, free, and deployed in production with over 1,600 audited interactions.
The Architecture
SAFi uses a pipeline of specialized modules (called faculties) that each handle one job:
User Prompt → Intellect → Will → [User sees response]
     ↑                                  |
     |                                  ↓
     |                        Conscience (async audit)
     |                                  |
     |                                  ↓
     └─── coaching ←─────── Spirit (math)
Intellect
The LLM that proposes a response.
Will
A separate model that evaluates the response against your policies. It approves or rejects; rejected responses never reach the user.
Conscience
Runs after the response is delivered. It scores the response against a set of values (e.g., Prudence, Justice, Courage, Temperance) on a scale from ‑1 to +1.
Spirit
Takes those scores and does pure math—no LLM, just NumPy. The interesting part is how Spirit turns scores into actionable coaching.
The Math Behind Spirit
Spirit performs four steps for each response.
1. Build a profile vector
Each response gets a weighted vector based on how it scored on the agent’s core values:
p_t = self.value_weights * scores
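A minimal sketch of this step, assuming four equally weighted values (the value names follow the article's example set; the weights and scores here are illustrative):

```python
import numpy as np

values = ["Prudence", "Justice", "Courage", "Temperance"]
value_weights = np.array([0.25, 0.25, 0.25, 0.25])  # assumed equal weighting
scores = np.array([0.8, 0.2, 0.6, 0.9])             # Conscience scores in [-1, 1]

# Element-wise weighting produces the per-turn profile vector
p_t = value_weights * scores
print(p_t)  # → [0.2   0.05  0.15  0.225]
```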
2. Update long‑term memory with EMA
The vector is folded into a running exponential moving average (EMA):
mu_new = self.beta * mu_prev + (1 - self.beta) * p_t
# beta = 0.9 by default, configurable via SPIRIT_BETA
This yields a smoothed behavioral baseline that weighs recent actions more heavily while never completely forgetting the past.
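The update above can be sketched over a few turns (the profile vectors are made up; only the beta default comes from the article):

```python
import numpy as np

beta = 0.9            # SPIRIT_BETA default from the article
mu = np.zeros(4)      # baseline starts at zero (an assumption)

# Fold three hypothetical per-turn profile vectors into the running average
for p_t in [np.array([0.20, 0.05, 0.15, 0.225]),
            np.array([0.18, 0.06, 0.14, 0.220]),
            np.array([0.21, 0.04, 0.16, 0.230])]:
    mu = beta * mu + (1 - beta) * p_t

# Older vectors decay geometrically: the last turn carries weight (1 - beta),
# the one before it beta * (1 - beta), and so on.
print(mu.round(4))
```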
3. Detect drift with cosine similarity
The deviation from the baseline is measured as:
denom = float(np.linalg.norm(p_t) * np.linalg.norm(mu_prev))
drift = (
    1.0 - float(np.dot(p_t, mu_prev) / denom)
    if denom > 1e-8 else None
)
drift ≈ 0 → the agent is behaving consistently. drift ≈ 1 → a significant change occurred.
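Wrapped as a function and fed two hypothetical profile vectors, the behavior looks like this (the vectors are illustrative, not from SAFi's data):

```python
import numpy as np

def drift(p_t, mu_prev, eps=1e-8):
    # Cosine distance between the current profile and the EMA baseline
    denom = float(np.linalg.norm(p_t) * np.linalg.norm(mu_prev))
    if denom <= eps:
        return None  # undefined on the first turn, when the baseline is empty
    return 1.0 - float(np.dot(p_t, mu_prev) / denom)

baseline = np.array([0.20, 0.05, 0.15, 0.22])
consistent = np.array([0.21, 0.06, 0.14, 0.23])  # same direction → drift near 0
shifted = np.array([0.01, 0.30, 0.02, 0.01])     # new direction → large drift

print(round(drift(consistent, baseline), 3))
print(round(drift(shifted, baseline), 3))
```

Note that cosine distance measures direction, not magnitude: an agent that scores uniformly lower but in the same proportions registers little drift.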
4. Generate coaching feedback
Spirit produces a natural‑language note that is injected into the next Intellect call:
note = f"Coherence {spirit_score}/10, drift {drift:.2f}."
# Identifies weakest value and includes it in the note
# e.g., "Your main area for improvement is 'Justice' (score: 0.21 - very low)."
The LLM sees this coaching note as part of its context on the next turn—no retraining, no fine‑tuning, just runtime behavioral steering through feedback.
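The note generation can be sketched as follows; the spirit_score and drift values are placeholders, and the phrasing mirrors the article's example:

```python
import numpy as np

values = ["Prudence", "Justice", "Courage", "Temperance"]
scores = np.array([0.8, 0.21, 0.6, 0.9])  # latest Conscience scores
spirit_score = 8                          # hypothetical coherence score out of 10
drift = 0.03                              # from the cosine-distance step

# The lowest-scoring value drives the coaching message
weakest = int(np.argmin(scores))
note = (
    f"Coherence {spirit_score}/10, drift {drift:.2f}. "
    f"Your main area for improvement is '{values[weakest]}' "
    f"(score: {scores[weakest]:.2f})."
)
print(note)
```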
Why This Works
The closed loop is the key:
- AI responds.
- Conscience scores the response.
- Spirit integrates the scores, detects drift, and generates coaching.
- Coaching feeds into the next response.
- Repeat.
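The whole Spirit side of the loop fits in a small class. This is a sketch under stated assumptions, not the real safi_app implementation; the class name, threshold, and note wording are invented for illustration:

```python
import numpy as np

class SpiritSketch:
    """Minimal sketch of Spirit: EMA baseline + cosine drift + coaching note."""

    def __init__(self, values, weights, beta=0.9):
        self.values = values
        self.w = np.asarray(weights, dtype=float)
        self.beta = beta
        self.mu = np.zeros(len(values))  # baseline starts empty (assumption)

    def step(self, scores):
        p_t = self.w * np.asarray(scores, dtype=float)
        denom = float(np.linalg.norm(p_t) * np.linalg.norm(self.mu))
        drift = 1.0 - float(np.dot(p_t, self.mu) / denom) if denom > 1e-8 else None
        # Fold this turn into the long-term baseline
        self.mu = self.beta * self.mu + (1 - self.beta) * p_t
        weakest = self.values[int(np.argmin(scores))]
        note = f"Weakest value: '{weakest}'" + (
            f", drift {drift:.2f}." if drift is not None else "."
        )
        return drift, note

spirit = SpiritSketch(["Prudence", "Justice", "Courage", "Temperance"],
                      [0.25, 0.25, 0.25, 0.25])
for scores in ([0.8, 0.2, 0.6, 0.9], [0.7, 0.3, 0.5, 0.8]):
    drift, note = spirit.step(scores)
    print(note)  # in SAFi, this would be injected into the next Intellect call
```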
Over 1,600 interactions, this loop has maintained 97.9% long-term consistency. The Will module blocked 20 policy-violating responses, and drift detection flagged a weakness in an agent's reasoning about justice before an adversary could exploit it in a philosophical debate.
Spirit adds zero latency to the user‑facing response because it runs asynchronously after delivery, and because it contains no LLM calls, it adds zero cost.
Running It Yourself
Docker
docker pull amayanelson/safi:v1.2
docker run -d -p 5000:5000 \
  -e DB_HOST=your_db_host \
  -e DB_USER=your_db_user \
  -e DB_PASSWORD=your_db_password \
  -e DB_NAME=safi \
  -e OPENAI_API_KEY=your_openai_key \
  --name safi amayanelson/safi:v1.2
Headless API Example
curl -X POST https://your-safi-instance/api/bot/process_prompt \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: sk_policy_12345" \
  -d '{
    "user_id": "user_123",
    "message": "Can I approve this expense?",
    "conversation_id": "chat_456"
  }'
SAFi works with OpenAI, Anthropic, Google, Groq, Mistral, and DeepSeek. You can swap the underlying model without touching the governance layer.
The Code
The full Spirit implementation lives in spirit.py. The core is about 60 lines of NumPy. The rest of the pipeline is in orchestrator.py, intellect.py, will.py, and conscience.py under safi_app/core/.
For the philosophical background behind the architecture, see selfalignmentframework.com.