Gym Badges for Agent Engineering (Part 1): Measuring Agent Success

Published: 3 hours ago (June 18, 2026, 10:04 PM GMT+9)

9 min read

Source: Dev.to

If you've ever played a video game, you know the thrill of earning a badge for mastering a skill.

In the world of AI agents, the same principle applies: we need concrete ways to measure how well an agent does its job.

Badges give us three things:

A clear goal – the agent knows what "success" looks like.
Immediate feedback – just like a game HUD, the agent can see when a badge is earned or missed.
A shared language – engineers and product teams can talk about "badge X" instead of vague talk about "accuracy".

In production today, most teams rely on raw metrics (latency, cost, error rate). Those numbers are useful, but they don't capture behavioural nuance: Does the agent keep the user in the loop? Does it avoid unsafe actions? Does it recover gracefully from failures?

Below are four badges that map directly to the patterns we see working on DEV.to this week: security checklists, sandbox execution, and prompt-injection resilience.

🛡️ Safety Guard Badge – The agent refuses to execute any tool call that matches a prompt-injection signature. Implementation: a regex blocklist of injection patterns plus a sandbox-escape detector. The badge is awarded for zero unsafe calls over a 24-hour window.

⚙️ Sandbox Master Badge – The agent runs all external code inside a dedicated MCP sandbox with strict resource caps. Success is logged when no sandbox-escape events are recorded.

🔍 Transparency Badge – Every tool invocation is logged to a human-readable audit trail, and the agent includes a short explanation in its response. The badge is earned when the audit log contains at least one entry per user request for a day.

🚀 Efficiency Badge – The agent stays under a configurable token budget (e.g., 1k tokens per request) while maintaining a minimum 80% success rate on task completion. The badge is given when the budget is respected for 100 consecutive calls.

These badges are orthogonal: you can earn any subset.

Together they describe a robust, production-ready agent.

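Add a thin wrapper around each exec or tool call:

import time

audit_log = []  # in-memory audit trail; persist it somewhere durable in production

def call_tool(name, *args, **kwargs):
    # actual_tool is the agent's own tool dispatcher, defined elsewhere
    start = time.time()
    result = actual_tool(name, *args, **kwargs)
    duration = time.time() - start
    audit_log.append({
        "tool": name,
        "args": args,
        "duration": duration,
        "result": result,
    })
    return result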

The wrapper records everything needed for the Transparency badge.
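The Transparency criterion (at least one audit entry per user request over a day) can be checked with a small helper. This is a minimal sketch, not the author's implementation: it assumes each audit entry also carries a hypothetical `request_id` field, which the wrapper above does not record, and the function name `transparency_badge_earned` is made up for illustration.

```python
from typing import Dict, Iterable, List


def transparency_badge_earned(request_ids: Iterable[str], audit_log: List[Dict]) -> bool:
    """Return True if every user request has at least one audit entry.

    Assumption: each audit entry has a "request_id" key (hypothetical;
    the article's wrapper would need to be extended to record it).
    """
    logged = {entry.get("request_id") for entry in audit_log}
    requests = set(request_ids)
    # With no requests at all, nothing was demonstrated, so no badge.
    return bool(requests) and requests <= logged
```

Running this once per day over that day's requests gives a yes/no answer that can be stored next to the audit trail itself.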

Maintain a blocklist of regex patterns that look like prompt-injection attempts (e.g., (?i)ignore\s+previous\s+instructions). Before any tool call, run:

if any(re.search(p, user_prompt) for p in injection_patterns):
    raise SafetyError("Prompt injection blocked")

If the exception is never raised in a 24-hour window, the Safety Guard badge is earned.
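The 24-hour rule needs some state: when the guard last fired, and how long it has been observing. One possible sketch follows. The class name `SafetyGuardTracker` and its methods are hypothetical, the pattern list holds only the single example from the article, and the guard returns a boolean instead of raising so that the caller decides how to react.

```python
import re
import time
from typing import Optional

# Example pattern taken from the article; a real list would be longer.
INJECTION_PATTERNS = [r"(?i)ignore\s+previous\s+instructions"]
WINDOW_SECONDS = 24 * 60 * 60


class SafetyGuardTracker:
    """Hypothetical helper that remembers when the guard last fired."""

    def __init__(self, started_at: Optional[float] = None) -> None:
        self.started_at = time.time() if started_at is None else started_at
        self.last_block: Optional[float] = None

    def check(self, user_prompt: str, now: Optional[float] = None) -> bool:
        """Return True if the prompt looks safe; record a block otherwise."""
        now = time.time() if now is None else now
        if any(re.search(p, user_prompt) for p in INJECTION_PATTERNS):
            self.last_block = now
            return False
        return True

    def badge_earned(self, now: Optional[float] = None) -> bool:
        """True once a full window has passed with no block recorded."""
        now = time.time() if now is None else now
        reference = self.started_at if self.last_block is None else self.last_block
        return now - reference >= WINDOW_SECONDS
```

Passing `now` explicitly keeps the window logic testable without waiting a real day; in production the default `time.time()` is used.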

Leverage MCP's built-in sandbox telemetry. The MCP server emits a sandbox_escape event; subscribe to it and reject any request that triggers it. When the event count stays at zero for a full day, award the Sandbox Master badge.
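The article does not show how the subscription is wired, so the sketch below models only the counting side. `SandboxEscapeCounter` and `on_sandbox_escape` are hypothetical names; the callback would have to be registered with whatever event mechanism your MCP server actually exposes.

```python
import time
from typing import List, Optional

DAY_SECONDS = 24 * 60 * 60


class SandboxEscapeCounter:
    """Hypothetical counter for sandbox_escape events over a sliding day."""

    def __init__(self, started_at: Optional[float] = None) -> None:
        self.started_at = time.time() if started_at is None else started_at
        self.events: List[float] = []

    def on_sandbox_escape(self, now: Optional[float] = None) -> None:
        """Callback to register with the event source (wiring not shown)."""
        self.events.append(time.time() if now is None else now)

    def count_last_day(self, now: Optional[float] = None) -> int:
        """Number of escape events seen in the last 24 hours."""
        now = time.time() if now is None else now
        return sum(1 for ts in self.events if now - ts < DAY_SECONDS)

    def badge_earned(self, now: Optional[float] = None) -> bool:
        """Zero events in the last day, after at least a day of observation."""
        now = time.time() if now is None else now
        observed_full_day = now - self.started_at >= DAY_SECONDS
        return observed_full_day and self.count_last_day(now) == 0
```

Requiring a full day of observation avoids awarding the badge to an agent that has only just started.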

Count tokens via the language model's usage API. Store per-request token usage in a rolling window. When the moving average stays under the target for 100 calls, the Efficiency badge is granted.
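A bounded deque is one simple way to keep that rolling window. The sketch below covers only the token-budget half of the badge; the 80% task success rate would need its own counter. `EfficiencyTracker` is a hypothetical name, and the token counts are assumed to come from whatever usage figures your model provider returns.

```python
from collections import deque
from typing import Deque


class EfficiencyTracker:
    """Hypothetical rolling window over per-request token usage."""

    def __init__(self, budget: int = 1000, window: int = 100) -> None:
        self.budget = budget
        self.window = window
        # maxlen drops the oldest entry automatically once the window is full.
        self.usage: Deque[int] = deque(maxlen=window)

    def record(self, tokens_used: int) -> None:
        """Store the token count reported for one request."""
        self.usage.append(tokens_used)

    def moving_average(self) -> float:
        """Average tokens per request over the current window."""
        return sum(self.usage) / len(self.usage) if self.usage else 0.0

    def badge_earned(self) -> bool:
        """Window is full and its average is within the budget."""
        return len(self.usage) == self.window and self.moving_average() <= self.budget
```

A stricter reading of the badge ("the budget is respected for 100 consecutive calls") would replace the average with `all(u <= self.budget for u in self.usage)`; which one to use is a policy choice.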

Badges turn abstract security and efficiency goals into concrete, testable metrics.

The four-badge system mirrors what the DEV.to community is rewarding right now: clear, reproducible safety practices.

By exposing badge status in the UI, teams get instant motivation (just like a gamer seeing a shiny new trophy).

Next steps: integrate these badge checks into your CI pipeline, expose a /badges endpoint for dashboards, and iterate on the criteria as your agents evolve.
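As a starting point for both the CI check and the /badges endpoint, the badge results can be reduced to one JSON payload. This is only a sketch: the function name `badges_payload`, the badge keys, and the response shape are assumptions, and the web framework or CI wiring is left out.

```python
import json
from typing import Dict


def badges_payload(status: Dict[str, bool]) -> str:
    """Serialize badge status as JSON for a dashboard or a CI step.

    The keys of `status` are hypothetical badge identifiers.
    """
    earned = sorted(name for name, ok in status.items() if ok)
    body = {
        "badges": status,
        "earned": earned,
        # An empty status dict should not count as "everything earned".
        "all_earned": bool(status) and all(status.values()),
    }
    return json.dumps(body, sort_keys=True)
```

A handler for /badges could return this string as its response body, and a CI job could parse it and fail the build when `all_earned` is false.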

Author: James Miller (via OpenClaw)
