[Paper] SARC: A Governance-by-Architecture Framework for Agentic AI Systems

Published: May 8, 2026 at 09:34 AM EDT
Source: arXiv - 2605.07728v1

Overview

The paper presents SARC, a runtime governance framework that embeds regulatory constraints directly into the execution loop of tool‑using, “agentic” AI systems. By treating constraints as first‑class objects, on par with state, actions, and rewards, SARC enables real‑time enforcement, auditing, and escalation, with the aim of closing the gap between policy intent and actual system behavior.

Key Contributions

  • Constraint‑as‑Specification Model – Defines a rich schema (source, class, predicate, verification point, response protocol, operating point) that can be compiled into enforceable hooks.
  • Four Enforcement Hooks – Introduces a Pre‑Action Gate, Action‑Time Monitor, Post‑Action Auditor, and Escalation Router. The first three check hard constraints before, during, and after each agent step; the router forwards violations that need a higher‑level decision.
  • Formal Guarantees – Proves minimal invariants needed for a specification to stay in sync with the execution trace, and shows why simple reward penalties cannot replace hard runtime checks.
  • Multi‑Agent Extension – Provides mechanisms for constraint propagation, authority intersection, and trace‑tree attribution so that complex workflows remain auditable end‑to‑end.
  • Prototype & Empirical Evaluation – Implements an audit‑checker and runs a reproducible synthetic benchmark (50 random seeds) on a procurement‑task scenario, demonstrating zero hard‑constraint violations and an 89.5 % reduction in soft‑window overages versus a policy‑as‑code baseline.
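
The claim that reward penalties cannot replace hard runtime checks can be illustrated with a toy example. This is not the paper's proof, only a minimal sketch: the action names, rewards, and penalty values below are hypothetical. A penalty only lowers the score of a violating action, so a reward-maximizing agent still picks it whenever the task reward outweighs the penalty, whereas a gate removes the action from consideration.

```python
# Hypothetical numbers, chosen only to illustrate the argument.
actions = {
    # name: (task_reward, violates_hard_constraint)
    "approved_vendor": (5.0, False),
    "unapproved_vendor": (9.0, True),
}


def pick_with_penalty(actions, penalty):
    """Reward shaping: subtract a fixed penalty from violating actions."""
    def shaped(name):
        reward, violates = actions[name]
        return reward - (penalty if violates else 0.0)
    return max(actions, key=shaped)


def pick_with_gate(actions):
    """Hard gate: violating actions are removed before any scoring."""
    allowed = {n: r for n, (r, violates) in actions.items() if not violates}
    return max(allowed, key=allowed.get)


print(pick_with_penalty(actions, penalty=3.0))   # unapproved_vendor (9 - 3 = 6 > 5)
print(pick_with_penalty(actions, penalty=10.0))  # approved_vendor
print(pick_with_gate(actions))                   # approved_vendor
```

The penalty approach works only if the penalty is tuned above every possible task reward, which is hard to guarantee in advance; the gate does not depend on that tuning.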

Methodology

  1. Specification Language – The authors design a declarative JSON‑like format where each constraint lists:

    • Source: who authored the rule (e.g., regulator, internal policy).
    • Class: hard vs. soft, safety vs. compliance, etc.
    • Predicate: a Boolean condition over the agent’s state or action.
    • Verification Point: when the predicate must be evaluated (pre‑action, during, post‑action).
    • Response Protocol: what to do on violation (reject, throttle, log, escalate).
    • Operating Point: the part of the loop the constraint attaches to.
  2. Compilation to Enforcement Hooks – The specification is automatically transformed into code that inserts the four hooks into the agent’s execution cycle:

    • Pre‑Action Gate blocks disallowed actions before they are sent to a tool.
    • Action‑Time Monitor watches streaming tool outputs for violations that emerge mid‑execution.
    • Post‑Action Auditor validates the final result against any lingering constraints.
    • Escalation Router forwards violations to a higher‑level policy engine or human reviewer.
  3. Formal Invariant Checking – Using trace theory, the authors define spec‑trace correspondence: every observed execution trace must contain a proof that all applicable predicates held at their designated verification points.

  4. Multi‑Agent Workflow Integration – Constraints are propagated along a directed acyclic graph of agents; intersecting authorities are resolved via a priority lattice, and each step records attribution metadata to preserve auditability.

  5. Experimental Setup – A synthetic procurement environment is built where agents request quotes, negotiate, and finalize contracts using external APIs. The authors generate 50 random seeds, inject varying levels of “predicate noise” (e.g., fuzzy predicates) and forced enforcement failures, and compare SARC against four baselines: (i) post‑hoc audit only, (ii) output filtering, (iii) static workflow rules, and (iv) policy‑as‑code without runtime hooks.
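
Steps 1 and 2 above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the paper's format is described as JSON‑like, while the `Constraint` dataclass, the `Governor` class, `run_step`, and the budget‑cap rule here are all hypothetical names. The Operating Point field is left out because this summary does not describe it in enough detail to model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

Predicate = Callable[[Dict, Dict], bool]  # (state, action) -> True when satisfied


@dataclass(frozen=True)
class Constraint:
    name: str
    source: str               # who authored the rule
    klass: str                # "hard" or "soft"
    predicate: Predicate
    verification_point: str   # "pre", "during" or "post"
    response: str             # "reject", "throttle", "log" or "escalate"


@dataclass
class Governor:
    constraints: List[Constraint]
    trace: List[Dict] = field(default_factory=list)        # audit record
    escalations: List[Dict] = field(default_factory=list)  # Escalation Router queue

    def check(self, point: str, state: Dict, action: Dict) -> bool:
        """Evaluate every constraint bound to `point`; False means a hard one failed."""
        ok = True
        for c in self.constraints:
            if c.verification_point != point:
                continue
            held = c.predicate(state, action)
            record = {"constraint": c.name, "point": point,
                      "action": action["name"], "held": held}
            self.trace.append(record)
            if not held:
                if c.response == "escalate":
                    self.escalations.append(record)
                if c.klass == "hard":
                    ok = False
        return ok


def run_step(gov: Governor, state: Dict, action: Dict,
             tool: Callable[[Dict], Iterable[str]]) -> Optional[List[str]]:
    # Pre-Action Gate: stop before the tool is ever called.
    if not gov.check("pre", state, action):
        return None
    chunks: List[str] = []
    for chunk in tool(action):
        chunks.append(chunk)
        state["partial_output"] = chunks
        # Action-Time Monitor: abort mid-execution on a hard violation.
        if not gov.check("during", state, action):
            return None
    state["spent"] = state.get("spent", 0) + action.get("cost", 0)
    # Post-Action Auditor: record checks against the completed step.
    gov.check("post", state, action)
    return chunks


# Hypothetical hard constraint: total spend must stay within the budget.
budget_cap = Constraint(
    name="budget-cap", source="internal policy", klass="hard",
    predicate=lambda s, a: s.get("spent", 0) + a.get("cost", 0) <= s["budget"],
    verification_point="pre", response="escalate")


def fake_tool(action: Dict) -> Iterable[str]:
    """Stand-in for an external API that streams its output."""
    yield "quote received"


gov = Governor([budget_cap])
state = {"budget": 100, "spent": 0}
first = run_step(gov, state, {"name": "request_quote", "cost": 80}, fake_tool)
second = run_step(gov, state, {"name": "request_quote", "cost": 80}, fake_tool)
print(first, second, len(gov.escalations))  # ['quote received'] None 1
```

The second request is rejected before `fake_tool` runs, and the failed check is both written to the trace and queued for escalation, which is the behavior the Pre‑Action Gate and Escalation Router descriptions suggest.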

Results & Findings

Metric | SARC | Post‑hoc Audit | Output Filtering | Workflow Rules | Policy‑as‑Code
Hard‑constraint violations (exact predicates) | 0 | 12 % of runs | 8 % | 5 % | 7 %
Soft‑window overages (PAA throttling) | 89.5 % reduction vs. Policy‑as‑Code | 45 % reduction | 60 % reduction | 70 % reduction |
Enforcement‑failure impact | Violations scale with enforcement‑stack error (linear) | Violations scale with environmental opportunity (super‑linear) | | |
Runtime overhead (average per step) | ~12 ms (≈3 % of total latency) | ~5 ms | ~7 ms | ~6 ms | ~4 ms

Key takeaways

  • Zero hard violations when predicates are precise, confirming that the four‑hook architecture reliably blocks illegal actions.
  • Soft‑constraint compliance improves dramatically because the Pre‑Action Gate and Action‑Time Monitor can throttle or reshape behavior before a violation compounds.
  • Error propagation behaves predictably: any missed check is attributable to a specific enforcement layer, simplifying debugging and policy refinement.
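
The attribution idea in the last takeaway can be made concrete with a small audit pass over a trace. The paper's audit‑checker is not described in this summary, so the following is only a simplified stand‑in under assumptions: the record layout (`step`, `constraint`, `point`, `held`), the `LAYER` mapping, and the `audit` function are hypothetical, and a boolean `held` flag replaces whatever proof object the paper attaches to a trace.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from verification point to the hook responsible for it.
LAYER = {
    "pre": "Pre-Action Gate",
    "during": "Action-Time Monitor",
    "post": "Post-Action Auditor",
}


def audit(spec: List[Tuple[str, str]], trace: List[Dict],
          steps: List[int]) -> List[str]:
    """Return one finding per (step, constraint) that lacks a passing check record."""
    findings = []
    for step in steps:
        # Index this step's records by (constraint name, verification point).
        seen = {(r["constraint"], r["point"]): r["held"]
                for r in trace if r["step"] == step}
        for name, point in spec:
            held = seen.get((name, point))
            if held is None:
                findings.append(
                    f"step {step}: {name} not checked ({LAYER[point]} missed it)")
            elif not held:
                findings.append(
                    f"step {step}: {name} violated at {LAYER[point]}")
    return findings


# Hypothetical spec: both constraints must be checked before every action.
spec = [("budget-cap", "pre"), ("vendor-allowlist", "pre")]
trace = [
    {"step": 1, "constraint": "budget-cap", "point": "pre", "held": True},
    {"step": 1, "constraint": "vendor-allowlist", "point": "pre", "held": True},
    {"step": 2, "constraint": "budget-cap", "point": "pre", "held": True},
]
print(audit(spec, trace, steps=[1, 2]))
```

Here the only finding is for step 2, where the allowlist check has no record, and the message names the hook that should have produced it.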

Practical Implications

  • Regulated AI Deployments – Companies building autonomous agents for finance, procurement, or healthcare can embed SARC to demonstrate compliance while the agent is running, rather than relying on after‑the‑fact reports.
  • Tool‑Use Safety Nets – Developers integrating LLM‑driven agents with external APIs (e.g., code execution, web browsing) can define “no‑network‑outside‑whitelist” or “budget‑cap” constraints that are enforced before the request ever leaves the sandbox.
  • Observability & Auditing – The built‑in trace attribution enables automated audit logs that map each decision back to the originating policy, reducing manual forensic effort during investigations.
  • Policy‑as‑Code Evolution – SARC’s declarative spec can be version‑controlled alongside code, allowing CI pipelines to validate that new policies compile without breaking existing enforcement hooks.
  • Multi‑Agent Orchestration – In complex pipelines (e.g., a chain of LLMs, planners, and executors), SARC’s propagation and authority‑intersection mechanisms ensure that a single high‑level compliance rule is respected throughout the entire workflow.
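
One way to read authority intersection in a multi‑agent pipeline is as a set intersection along the delegation path: a downstream agent may only do what every agent upstream of it was also allowed to do. The sketch below takes that reading as an assumption. It does not model the priority lattice the paper uses, and the agent names, permission names, and the `effective_authority` function are hypothetical.

```python
from typing import Dict, FrozenSet, List


def effective_authority(chain: List[str],
                        grants: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """Intersect the permissions of every agent on the delegation path."""
    authority = grants[chain[0]]
    for agent in chain[1:]:
        authority = authority & grants[agent]
    return authority


# Hypothetical per-agent grants in a procurement workflow.
grants = {
    "planner": frozenset({"request_quote", "negotiate", "sign_contract"}),
    "negotiator": frozenset({"request_quote", "negotiate"}),
    "executor": frozenset({"negotiate", "sign_contract", "send_payment"}),
}

allowed = effective_authority(["planner", "negotiator", "executor"], grants)
print(sorted(allowed))  # ['negotiate']
```

Although the executor holds `sign_contract` on its own, it cannot use that permission when the task reaches it through the negotiator, which never had it. This keeps delegation from widening authority.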

Limitations & Future Work

  • Synthetic Evaluation – The experiments use a controlled procurement sandbox; real‑world deployments may expose edge cases (network latency, nondeterministic tool responses) not captured here.
  • Predicate Noise Sensitivity – While the authors explore fuzzy predicates, the framework still depends on well‑specified, decidable conditions; ambiguous legal language could lead to over‑conservative blocking.
  • Scalability of Enforcement Stack – Adding many constraints increases the number of checks per step; future work should explore adaptive batching or hardware‑accelerated verification.
  • Human‑in‑the‑Loop Escalation – The Escalation Router currently forwards to a generic policy engine; integrating nuanced human decision‑making (e.g., risk‑based triage) remains an open challenge.
  • Formal Verification Integration – Extending SARC to interoperate with theorem provers or model‑checkers could provide stronger guarantees for safety‑critical domains.

Authors

  • Gaston Besanson

Paper Information

  • arXiv ID: 2605.07728v1
  • Categories: cs.SE, cs.CY
  • Published: May 8, 2026