Gym Badges of Agentic Engineering (Part 1): Measuring Agent Success
Source: Dev.to
If you've ever played a video game, you know the thrill of earning a badge for mastering a skill. AI agents need the same thing: concrete ways to measure how well an agent does its job. Badges give us three things:

- A clear goal: the agent knows what "success" looks like.
- Immediate feedback: just like a game HUD, the agent can see when a badge is earned or missed.
- A shared language: engineers and product teams can talk about "badge X" instead of vague "accuracy" prose.

In production today, most teams rely on raw metrics (latency, cost, error rate). Those numbers are useful, but they don't capture behavioural nuance. Does the agent keep the user in the loop? Does it avoid unsafe actions? Does it recover gracefully from failures?

Below are four badges that map directly to the patterns we see working on DEV.to this week: security checklists, sandbox execution, and prompt-injection resilience.

🛡️ Safety Guard Badge: the agent refuses to execute any tool call that matches a prompt-injection signature. Implementation: a regex blocklist plus a sandbox-escape detector. The badge is awarded when zero unsafe calls get through over a 24-hour window.

⚙️ Sandbox Master Badge: the agent runs all external code inside a dedicated MCP sandbox with strict resource caps. Success is logged when no sandbox-escape events are recorded over a full day.

🔍 Transparency Badge: every tool invocation is logged to a human-readable audit trail, and the agent includes a short explanation in its response. The badge is earned when, over a full day, the audit log contains at least one entry for every user request.

🚀 Efficiency Badge: the agent stays within a configurable token budget (e.g., 1k tokens per request) while maintaining a task-completion success rate of at least 80%. The badge is awarded when the budget is respected for 100 consecutive calls.

These badges are orthogonal: an agent can earn any subset. Together they describe a robust, production-ready agent.
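The Efficiency criterion is mechanical enough to sketch directly. What follows is a minimal sketch, not the author's implementation: `EfficiencyTracker` and `CallRecord` are hypothetical names, token counts are assumed to come from your model provider's usage data, and "within budget" is assumed to mean at most `token_budget` tokens per call. The defaults mirror the example numbers above (1k tokens, 80% success, 100 calls).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CallRecord:
    tokens: int      # total tokens for one request (assumed from the usage data)
    succeeded: bool  # whether the task was completed


class EfficiencyTracker:
    """Hypothetical tracker for the Efficiency badge over the last `window` calls."""

    def __init__(self, token_budget=1000, min_success_rate=0.8, window=100):
        self.token_budget = token_budget
        self.min_success_rate = min_success_rate
        self.window = window
        # A bounded deque drops the oldest record once `window` is reached.
        self.calls = deque(maxlen=window)

    def record(self, tokens, succeeded):
        self.calls.append(CallRecord(tokens, succeeded))

    def earned(self):
        # The badge needs a full window of consecutive calls.
        if len(self.calls) < self.window:
            return False
        within_budget = all(c.tokens <= self.token_budget for c in self.calls)
        success_rate = sum(c.succeeded for c in self.calls) / len(self.calls)
        return within_budget and success_rate >= self.min_success_rate
```

A `deque` with `maxlen` gives the rolling window for free: appending the 101st record evicts the oldest, so a single over-budget call blocks the badge until 100 newer calls have pushed it out.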
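Add a thin wrapper around each exec or tool call:

```python
import time

audit_log = []  # in-memory audit trail for this sketch


def call_tool(name, *args, **kwargs):
    """Run a tool through `actual_tool` and append an audit entry."""
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    result = actual_tool(name, *args, **kwargs)  # your underlying tool dispatcher
    duration = time.perf_counter() - start
    audit_log.append({
        "tool": name,
        "args": args,
        "kwargs": kwargs,
        "duration": duration,
        "result": result,
    })
    return result
```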
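To turn an audit log like this into a daily Transparency decision, compare it against the day's user requests. This is a minimal sketch under an assumption: each audit entry carries a hypothetical `request_id` key, which the wrapper as written does not record. The function names are illustrative, and the caller is assumed to pass in one day's requests and log entries.

```python
def unaudited_requests(request_ids, audit_log):
    """Return the IDs of user requests that have no audit entry.

    Assumes each audit entry carries a hypothetical "request_id" key; the
    call_tool wrapper would need to add it alongside "tool" and "args".
    """
    logged = {entry.get("request_id") for entry in audit_log}
    return set(request_ids) - logged


def transparency_earned(request_ids, audit_log):
    """Badge check for one day: every request has at least one audit entry."""
    return not unaudited_requests(request_ids, audit_log)


# Example: r3 made no tool call, so it has no audit entry and the badge is missed.
day_log = [
    {"tool": "search", "request_id": "r1"},
    {"tool": "fetch", "request_id": "r1"},
    {"tool": "search", "request_id": "r2"},
]
print(unaudited_requests(["r1", "r2", "r3"], day_log))  # {'r3'}
```

Returning the missing IDs rather than a bare boolean makes a missed badge actionable: you can see exactly which requests left no trace.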
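The wrapper records the raw data for the Transparency badge. To check "at least one entry per user request", each entry also needs the ID of the request that triggered it.

Maintain a blocklist of regex patterns that look like prompt-injection attempts (e.g., `(?i)ignore\s+previous\s+instructions`). Before any tool call, run:

```python
import re
# injection_patterns: the blocklist above; SafetyError: your own Exception subclass
if any(re.search(p, user_prompt) for p in injection_patterns):
    raise SafetyError("Prompt injection blocked")
```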
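Here is a slightly fuller, self-contained sketch of the guard plus a reusable 24-hour check. It is an illustration under assumptions, not the author's code: the second pattern, `check_prompt`, and `ZeroIncidentWindow` are hypothetical; patterns are precompiled with `re.IGNORECASE` instead of the inline `(?i)` flag (equivalent for these patterns); and what counts as an "incident" is your call (for Safety Guard, for instance, a flagged call that was executed anyway).

```python
import re
import time
from collections import deque

# Illustrative signatures only; a production blocklist needs ongoing curation.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(?:all\s+)?(?:prior|previous)\s+instructions", re.IGNORECASE),
]


class SafetyError(Exception):
    """Raised when a prompt matches a known injection signature."""


def check_prompt(user_prompt, patterns=INJECTION_PATTERNS):
    """Raise SafetyError before any tool runs if the prompt matches a signature."""
    if any(p.search(user_prompt) for p in patterns):
        raise SafetyError("Prompt injection blocked")


class ZeroIncidentWindow:
    """Badge criterion: a full window observed with no incidents inside it."""

    def __init__(self, window_seconds=24 * 60 * 60, clock=time.time):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.started_at = clock()
        self.incidents = deque()  # incident timestamps, oldest first

    def record(self):
        self.incidents.append(self.clock())

    def earned(self):
        now = self.clock()
        # Discard incidents that have aged out of the trailing window.
        while self.incidents and self.incidents[0] <= now - self.window_seconds:
            self.incidents.popleft()
        observed_full_window = now - self.started_at >= self.window_seconds
        return observed_full_window and not self.incidents
```

Because the incident definition is left to the caller, the same class could back the Sandbox Master badge by calling `record()` from whatever handler receives sandbox-escape events. Injecting `clock` keeps the 24-hour logic testable without waiting a day.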
If no flagged prompt slips past this check during a 24-hour window (every match raises instead of reaching a tool), the Safety Guard badge is earned.

For the Sandbox Master badge, lean on your MCP server's sandbox telemetry. If it emits a sandbox_escape event (or an equivalent), subscribe to it and reject any request that triggers it. When the event count stays at zero for a full day, award the badge.

For the Efficiency badge, count tokens via the language model's usage API and store per-request usage in a rolling window. When every request in the last 100 calls stays within the budget, with a task success rate of at least 80%, grant the badge.

Badges turn abstract security and efficiency goals into concrete, testable metrics. The four-badge system mirrors what the DEV.to community is rewarding right now: clear, reproducible safety practices. Exposing badge status in the UI gives teams instant motivation, just like a gamer seeing a shiny new trophy.

Next steps: integrate these badge checks into your CI pipeline, expose a /badges endpoint for dashboards, and iterate on the criteria as your agents evolve.

Author: James Miller (via OpenClaw)