[Paper] SARC: A Governance-by-Architecture Framework for Agentic AI Systems

Published: May 8, 2026 at 09:34 AM EDT

Source: arXiv - 2605.07728v1

Overview

The paper presents SARC, a runtime governance framework that embeds regulatory constraints directly into the execution loop of tool‑using, “agentic” AI systems. By treating constraints as first‑class objects—on par with state, actions, and rewards—SARC enables real‑time enforcement, auditing, and escalation, closing the gap between policy intent and actual system behavior.

Key Contributions

Constraint‑as‑Specification Model – Defines a rich schema (source, class, predicate, verification point, response protocol, operating point) that can be compiled into enforceable hooks.
Four Enforcement Hooks – Introduces a Pre‑Action Gate, Action‑Time Monitor, Post‑Action Auditor, and Escalation Router that together guarantee that hard constraints are checked before, during, and after each agent step.
Formal Guarantees – Proves minimal invariants needed for a specification to stay in sync with the execution trace, and shows why simple reward penalties cannot replace hard runtime checks.
Multi‑Agent Extension – Provides mechanisms for constraint propagation, authority intersection, and trace‑tree attribution so that complex workflows remain auditable end‑to‑end.
Prototype & Empirical Evaluation – Implements an audit‑checker and runs a reproducible synthetic benchmark (50 random seeds) on a procurement‑task scenario, demonstrating zero hard‑constraint violations and an 89.5 % reduction in soft‑window overages versus a policy‑as‑code baseline.

Methodology

Specification Language – The authors design a declarative JSON‑like format where each constraint lists:
- Source: who authored the rule (e.g., regulator, internal policy).
- Class: hard vs. soft, safety vs. compliance, etc.
- Predicate: a Boolean condition over the agent’s state or action.
- Verification Point: when the predicate must be evaluated (pre‑action, during action, or post‑action).
- Response Protocol: what to do on violation (reject, throttle, log, escalate).
- Operating Point: the part of the loop the constraint attaches to.
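To make the six fields concrete, here is a minimal sketch of one constraint as a Python data structure. It is an illustration under stated assumptions, not the paper's schema: the class and field names (`Constraint`, `constraint_class`, `spend_cap`), the enum values, and the 10,000 spending cap are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Mapping


class VerificationPoint(Enum):
    PRE_ACTION = "pre_action"
    DURING_ACTION = "during_action"
    POST_ACTION = "post_action"


class Response(Enum):
    REJECT = "reject"
    THROTTLE = "throttle"
    LOG = "log"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Constraint:
    """One constraint record; names are illustrative, not taken from the paper."""
    name: str                                        # hypothetical identifier
    source: str                                      # who authored the rule
    constraint_class: str                            # e.g. "hard" or "soft"
    predicate: Callable[[Mapping[str, Any]], bool]   # True means the constraint holds
    verification_point: VerificationPoint            # when the predicate is evaluated
    response: Response                               # what to do on violation
    operating_point: str                             # part of the loop it attaches to


# Hypothetical hard constraint: a single order may not exceed 10,000.
spend_cap = Constraint(
    name="spend_cap",
    source="internal policy",
    constraint_class="hard",
    predicate=lambda action: action.get("amount", 0) <= 10_000,
    verification_point=VerificationPoint.PRE_ACTION,
    response=Response.REJECT,
    operating_point="tool_call",
)

within_cap = spend_cap.predicate({"tool": "place_order", "amount": 2_500})
over_cap = spend_cap.predicate({"tool": "place_order", "amount": 25_000})
print(within_cap, over_cap)
```

Keeping the predicate as a plain callable keeps the sketch short; a declarative format like the one described above would more likely store the predicate as data and compile it.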
Compilation to Enforcement Hooks – The specification is automatically transformed into code that inserts the four hooks into the agent’s execution cycle:
- Pre‑Action Gate blocks disallowed actions before they are sent to a tool.
- Action‑Time Monitor watches streaming tool outputs for violations that emerge mid‑execution.
- Post‑Action Auditor validates the final result against any lingering constraints.
- Escalation Router forwards violations to a higher‑level policy engine or human reviewer.
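The way the four hooks wrap a single agent step can be sketched as follows. This is a hand-written approximation, not the compiled output the paper describes: `run_step`, `fake_quote_tool`, and the check signatures are assumptions, and each check returns `True` when its constraint holds.

```python
from typing import Any, Callable, Dict, Iterable, List

Action = Dict[str, Any]


def run_step(
    action: Action,
    tool: Callable[[Action], Iterable[str]],
    pre_checks: List[Callable[[Action], bool]],
    stream_checks: List[Callable[[str], bool]],
    post_checks: List[Callable[[List[str]], bool]],
    escalate: Callable[[str, Action], None],
) -> List[str]:
    """Run one agent step through four hooks (illustrative only)."""
    # Pre-Action Gate: refuse before the tool is ever called.
    if not all(check(action) for check in pre_checks):
        escalate("pre_action", action)  # Escalation Router
        return []

    chunks: List[str] = []
    # Action-Time Monitor: inspect streamed output, stop at the first violation.
    for chunk in tool(action):
        if not all(check(chunk) for check in stream_checks):
            escalate("during_action", action)
            return chunks
        chunks.append(chunk)

    # Post-Action Auditor: validate the complete result.
    if not all(check(chunks) for check in post_checks):
        escalate("post_action", action)
    return chunks


def fake_quote_tool(action: Action) -> Iterable[str]:
    """Hypothetical streaming tool that yields two quote lines."""
    yield "quote: 900"
    yield "quote: 1200"


escalations: List[str] = []
result = run_step(
    {"tool": "request_quote", "amount": 900},
    fake_quote_tool,
    pre_checks=[lambda a: a["amount"] <= 1000],
    stream_checks=[lambda chunk: "1200" not in chunk],
    post_checks=[lambda chunks: len(chunks) > 0],
    escalate=lambda stage, a: escalations.append(stage),
)
print(result, escalations)
```

In this run the action passes the gate, the monitor halts the stream at the second chunk, and the router records the stage at which the violation was caught.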
Formal Invariant Checking – Using trace theory, the authors define spec‑trace correspondence: every observed execution trace must contain a proof that all applicable predicates held at their designated verification points.
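One way to picture spec‑trace correspondence is an audit pass that looks, for every step, for a recorded check of each constraint at its designated verification point. The sketch below is a simplification and not the paper's audit‑checker: `CheckRecord` and `audit_trace` are hypothetical names, a boolean stands in for the proof, and every constraint is assumed to apply to every step.

```python
from typing import Dict, List, NamedTuple, Tuple


class CheckRecord(NamedTuple):
    step: int
    constraint: str
    point: str   # "pre_action", "during_action" or "post_action"
    held: bool   # stand-in for the proof that the predicate held


def audit_trace(
    spec: Dict[str, str], trace: List[CheckRecord], steps: int
) -> List[Tuple[int, str, str]]:
    """Return (step, constraint, reason) for every gap between spec and trace."""
    recorded = {(r.step, r.constraint, r.point): r.held for r in trace}
    problems: List[Tuple[int, str, str]] = []
    for step in range(steps):
        for constraint, point in spec.items():
            held = recorded.get((step, constraint, point))
            if held is None:
                problems.append((step, constraint, "missing check"))
            elif not held:
                problems.append((step, constraint, "predicate failed"))
    return problems


# Hypothetical spec: constraint name -> designated verification point.
spec = {"spend_cap": "pre_action", "vendor_approved": "post_action"}
trace = [
    CheckRecord(0, "spend_cap", "pre_action", True),
    CheckRecord(0, "vendor_approved", "post_action", True),
    CheckRecord(1, "spend_cap", "pre_action", True),
    # step 1 has no post-action record for "vendor_approved"
]
problems = audit_trace(spec, trace, steps=2)
print(problems)
```

A trace that yields an empty list satisfies this simplified form of the invariant; any entry names the step and constraint where spec and trace diverge.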
Multi‑Agent Workflow Integration – Constraints are propagated along a directed acyclic graph of agents; intersecting authorities are resolved via a priority lattice, and each step records attribution metadata to preserve auditability.
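As a rough illustration of propagation and authority intersection, the sketch below walks a small agent DAG in topological order: each agent inherits its parents' constraints and keeps only the authority that every parent also holds. It assumes a plain set model and does not reproduce the paper's priority lattice or trace‑tree attribution; `propagate`, the agent names, and the tool names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, Set, Tuple


def propagate(
    parents: Dict[str, Set[str]],
    own_constraints: Dict[str, Set[str]],
    own_authority: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Push constraints down the DAG and intersect authority along each edge."""
    constraints: Dict[str, Set[str]] = {}
    authority: Dict[str, Set[str]] = {}
    # static_order() yields every agent after all of its parents.
    for agent in TopologicalSorter(parents).static_order():
        constraints[agent] = set(own_constraints.get(agent, set()))
        authority[agent] = set(own_authority[agent])
        for parent in parents.get(agent, set()):
            constraints[agent] |= constraints[parent]  # inherit every upstream rule
            authority[agent] &= authority[parent]      # never exceed a parent's authority
    return constraints, authority


# Hypothetical three-agent chain: planner -> negotiator -> executor.
parents = {"planner": set(), "negotiator": {"planner"}, "executor": {"negotiator"}}
own_constraints = {"planner": {"spend_cap"}, "executor": {"approved_vendor"}}
own_authority = {
    "planner": {"quote", "negotiate", "sign"},
    "negotiator": {"quote", "negotiate"},
    "executor": {"negotiate", "sign"},
}
constraints, authority = propagate(parents, own_constraints, own_authority)
print(constraints["executor"], authority["executor"])
```

Here the executor ends up bound by both rules but loses `sign`, because the negotiator that delegated to it never held that authority.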
Experimental Setup – A synthetic procurement environment is built where agents request quotes, negotiate, and finalize contracts using external APIs. The authors run the benchmark under 50 random seeds, inject varying levels of “predicate noise” (e.g., fuzzy predicates) and forced enforcement failures, and compare SARC against four baselines: (i) post‑hoc audit only, (ii) output filtering, (iii) static workflow rules, and (iv) policy‑as‑code without runtime hooks.

Results & Findings

Metric	SARC	Post‑hoc Audit	Output Filtering	Workflow Rules	Policy‑as‑Code
Hard‑constraint violations (exact predicates)	0	12 % of runs	8 %	5 %	7 %
Soft‑window overages (PAA throttling)	89.5 % reduction vs. Policy‑as‑Code	45 % reduction	60 % reduction	70 % reduction	—
Enforcement‑failure impact	Violations scale with enforcement‑stack error (linear)	Violations scale with environmental opportunity (super‑linear)	—	—	—
Runtime overhead (average per step)	~12 ms (≈3 % of total latency)	~5 ms	~7 ms	~6 ms	~4 ms

Key takeaways

Zero hard‑constraint violations when predicates are exact, indicating that, in this benchmark, the four‑hook architecture blocks disallowed actions reliably.
Soft‑constraint compliance improves dramatically because the Pre‑Action Gate and Action‑Time Monitor can throttle or reshape behavior before a violation compounds.
Error propagation behaves predictably: any missed check is attributable to a specific enforcement layer, simplifying debugging and policy refinement.

Practical Implications

Regulated AI Deployments – Companies building autonomous agents for finance, procurement, or healthcare can embed SARC to demonstrate compliance at run time rather than relying on after‑the‑fact reports.
Tool‑Use Safety Nets – Developers integrating LLM‑driven agents with external APIs (e.g., code execution, web browsing) can define “no‑network‑outside‑whitelist” or “budget‑cap” constraints that are enforced before the request ever leaves the sandbox.
Observability & Auditing – The built‑in trace attribution enables automated audit logs that map each decision back to the originating policy, reducing manual forensic effort during investigations.
Policy‑as‑Code Evolution – SARC’s declarative spec can be version‑controlled alongside code, allowing CI pipelines to validate that new policies compile without breaking existing enforcement hooks.
Multi‑Agent Orchestration – In complex pipelines (e.g., a chain of LLMs, planners, and executors), SARC’s propagation and authority‑intersection mechanisms ensure that a single high‑level compliance rule is respected throughout the entire workflow.
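The allowlist and budget‑cap constraints mentioned above could be checked before a request is sent along these lines. This is a minimal sketch, not SARC's API: `ALLOWED_HOSTS`, `BudgetGate`, `pre_action_gate`, the host names, and the cap of 100.0 are assumptions made for illustration.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allowlist


def host_allowed(url: str) -> bool:
    """True when the URL's host is on the allowlist."""
    host = urlparse(url).hostname
    return host is not None and host in ALLOWED_HOSTS


class BudgetGate:
    """Tracks cumulative spend and refuses any call that would exceed the cap."""

    def __init__(self, cap: float) -> None:
        self.cap = cap
        self.spent = 0.0

    def allow(self, cost: float) -> bool:
        if self.spent + cost > self.cap:
            return False
        self.spent += cost
        return True


def pre_action_gate(url: str, cost: float, budget: BudgetGate) -> bool:
    # The host is checked first so a rejected host never consumes budget.
    return host_allowed(url) and budget.allow(cost)


budget = BudgetGate(cap=100.0)  # hypothetical cap
first = pre_action_gate("https://api.example.com/v1/quotes", 60.0, budget)
off_list = pre_action_gate("https://evil.example.net/x", 1.0, budget)
over_cap = pre_action_gate("https://api.example.com/v1/quotes", 50.0, budget)
print(first, off_list, over_cap, budget.spent)
```

Because both checks run before any network call, a rejected request leaves no side effect apart from the decision itself, which is the property the text attributes to pre‑action enforcement.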

Limitations & Future Work

Synthetic Evaluation – The experiments use a controlled procurement sandbox; real‑world deployments may expose edge cases (network latency, nondeterministic tool responses) not captured here.
Predicate Noise Sensitivity – While the authors explore fuzzy predicates, the framework still depends on well‑specified, decidable conditions; ambiguous legal language could lead to over‑conservative blocking.
Scalability of Enforcement Stack – Adding many constraints increases the number of checks per step; future work should explore adaptive batching or hardware‑accelerated verification.
Human‑in‑the‑Loop Escalation – The Escalation Router currently forwards to a generic policy engine; integrating nuanced human decision‑making (e.g., risk‑based triage) remains an open challenge.
Formal Verification Integration – Extending SARC to interoperate with theorem provers or model‑checkers could provide stronger guarantees for safety‑critical domains.

Authors

Gaston Besanson

Paper Information

arXiv ID: 2605.07728v1
Categories: cs.SE, cs.CY
Published: May 8, 2026
