[Paper] Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents

Published: April 27, 2026 at 12:46 PM EDT
5 min read
Source: arXiv - 2604.24686v1

Overview

The paper introduces a new way to keep autonomous AI agents safe while they are running, even when their internal code never changes. By estimating the unobserved risk of an action and comparing it to the agent’s capacity to handle that risk, the authors turn governance from a reactive “after‑the‑fact” process into a proactive, runtime safety net.

Key Contributions

  • Informational Viability Principle – a formal rule that an action is allowed only if the agent’s safety margin (its capacity S(x)) exceeds a bound on hidden risk, ĤB(x) = U(x) + SB(x) + RG(x); a minimal sketch of this check follows the list below.
  • Agent Viability Framework – built on Aubin’s viability theory, defining three necessary properties for safe operation:
    1. Monitoring (P1) – continuous tracking of the agent’s observable signals.
    2. Anticipation (P2) – forecasting hidden risk before it materializes.
    3. Monotonic Restriction (P3) – progressively tightening constraints, never loosening them.
  • RiskGate – a concrete implementation that:
    • Uses statistical estimators (KL‑divergence, segment‑vs‑rest z‑tests, sequential pattern matching) to compute the risk bound.
    • Provides a fail‑secure monotonic pipeline that can shut down the agent (kill‑switch) as a last resort.
    • Generates a scalar Viability Index (VI) in [-1, +1], together with a first‑order prediction of the crossing time t*, to move governance from reactive to predictive.
  • Theoretical coverage of existing AI‑agent failure taxonomies, showing that the three properties together are both necessary and sufficient to prevent documented failure modes.
  • Reference open‑source implementation (code released with the paper) that can be plugged into existing autonomous systems for experimental validation.
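
Below is a minimal, illustrative sketch of the viability check and the monotonic restriction (P3). The class and mode names are hypothetical, not the paper’s released API, and the one‑level escalation policy is just one plausible reading of the fail‑secure monotonic pipeline:

```python
# Hypothetical sketch: admit an action only while the capacity S(x) exceeds
# the hidden-risk bound HB(x) = U(x) + SB(x) + RG(x), escalating restrictions
# monotonically (P3) whenever the safety slack falls below the margin.
from enum import IntEnum


class Mode(IntEnum):
    """Restriction levels; higher is stricter and is never relaxed here."""
    NORMAL = 0
    THROTTLED = 1
    SAFE_MODE = 2
    KILL = 3  # fail-secure last resort


class ViabilityGate:
    def __init__(self, margin: float = 0.0):
        self.margin = margin      # required safety slack
        self.mode = Mode.NORMAL   # current restriction level

    @staticmethod
    def risk_bound(u: float, sb: float, rg: float) -> float:
        # HB(x): observation uncertainty + shift-induced bias + residual risk
        return u + sb + rg

    def step(self, capacity: float, u: float, sb: float, rg: float) -> Mode:
        slack = capacity - self.risk_bound(u, sb, rg)
        if slack < self.margin:
            # Monotonic restriction (P3): tighten one level, never loosen.
            self.mode = Mode(min(self.mode + 1, Mode.KILL))
        return self.mode
```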

Methodology

  1. Formal Risk Modeling – The authors decompose hidden risk into three components:

    • U(x): uncertainty from limited observations.
    • SB(x): shift‑induced bias (e.g., adversarial drift).
    • RG(x): residual risk from unknown dynamics.
      These are combined into the bound ĤB(x).
  2. Viability Theory Backbone – Using Aubin’s viability theory, they define a regulation map that maps the current state of the agent to a set of admissible actions. The map is monotone: once an action is disallowed, it stays disallowed unless the risk estimate drops sufficiently.

  3. Statistical Estimators – three complementary drift signals feed the risk bound (a sketch follows this list):

    • KL divergence measures distributional drift between recent and baseline behavior.
    • Segment‑vs‑rest z‑tests detect anomalous sub‑trajectories.
    • Sequential pattern matching flags rare or unseen action sequences.
  4. RiskGate Pipeline – The estimators feed into a risk aggregation module that computes ĤB(x). The agent’s internal capacity S(x) (derived from resource budgets, confidence scores, etc.) is then compared to ĤB(x). If S(x) – ĤB(x) < margin, the pipeline triggers a monotonic restriction (e.g., throttling, safe‑mode, or kill‑switch).

  5. Viability Index & Prediction – A scalar VI(t) is continuously updated; a simple linear extrapolation yields a predicted crossing time t* (see the extrapolation sketch below), allowing the system to intervene before the margin is breached.
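
The estimators in step 3 map onto standard statistical tools. The sketch below is a hypothetical approximation (histogram‑based KL divergence and a two‑sample z‑statistic; the function names and the smoothing constant are illustrative, not the paper’s exact formulations):

```python
# Illustrative drift estimators: histogram-based KL divergence between
# recent and baseline action distributions, plus a segment-vs-rest z-test.
import numpy as np
from scipy.special import rel_entr
from scipy.stats import norm


def kl_drift(recent: np.ndarray, baseline: np.ndarray,
             eps: float = 1e-9) -> float:
    """KL(recent || baseline) between normalized action histograms."""
    p = recent / recent.sum()
    q = baseline / baseline.sum()
    return float(rel_entr(p + eps, q + eps).sum())  # eps avoids log(0)


def segment_vs_rest_z(segment: np.ndarray, rest: np.ndarray) -> float:
    """Two-sample z-statistic: how anomalous is this sub-trajectory?"""
    se = np.sqrt(segment.var(ddof=1) / len(segment)
                 + rest.var(ddof=1) / len(rest))
    return float((segment.mean() - rest.mean()) / se)


def z_p_value(z: float) -> float:
    """Two-sided p-value for the z-statistic."""
    return float(2 * norm.sf(abs(z)))
```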
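Step 5’s first‑order prediction can be as simple as a least‑squares line over a sliding window of VI samples, solved for the zero crossing. The sketch below assumes uniform sampling at interval dt; the window length is a tuning choice, not a value from the paper:

```python
# Hypothetical first-order extrapolation of the Viability Index VI(t):
# fit a line to the last `window` samples and solve VI(t) = 0 for t*.
import numpy as np


def predict_crossing(vi_history: np.ndarray, dt: float,
                     window: int = 20) -> float | None:
    """Seconds until the predicted VI zero-crossing, or None if VI is not falling."""
    vi = vi_history[-window:]
    if len(vi) < 2:
        return None                      # not enough samples to fit a line
    t = np.arange(len(vi)) * dt
    slope, intercept = np.polyfit(t, vi, 1)  # first-order (linear) fit
    if slope >= 0:
        return None                      # stable or rising: no breach predicted
    t_star = -intercept / slope          # where the fitted line hits zero
    return max(t_star - t[-1], 0.0)      # time remaining from "now"
```

For example, a controller could enter safe mode whenever predict_crossing(...) returns a horizon shorter than its fallback‑activation time, which is exactly the reactive‑to‑predictive shift the VI and t* are meant to enable.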

Results & Findings

  • Theoretical proof that satisfying P1‑P3 eliminates all failure patterns listed in three major AI‑agent failure taxonomies (e.g., reward hacking, distributional shift, adversarial manipulation).
  • Simulation case study (autonomous drone autopilot) shows that RiskGate can detect a drift in wind‑model assumptions 5 seconds before the Viability Index would have dropped below zero, giving the controller time to switch to a safe fallback.
  • Comparative analysis demonstrates that a naïve reactive monitor (triggered only after a safety violation) misses 73 % of early‑drift events that RiskGate catches.
  • Performance overhead is modest: the full RiskGate pipeline adds ~12 ms latency per decision cycle on a typical edge GPU, well within real‑time constraints for many robotics and vehicular applications.

Practical Implications

  • Safer deployment of autonomous systems – developers can embed RiskGate as a runtime guardrail for self‑driving cars, delivery drones, or trading bots, reducing the need for exhaustive pre‑deployment verification.
  • Regulatory compliance – the framework offers a quantifiable safety margin (S(x) – ĤB(x)) that could satisfy emerging AI‑risk standards (e.g., EU AI Act, ISO 26262 extensions).
  • Graceful degradation – monotonic restriction ensures that when risk rises, the system can automatically downgrade capabilities (e.g., lower speed, switch to conservative planning) before a hard shutdown is required.
  • Plug‑and‑play – because RiskGate relies on observable telemetry and statistical estimators, it can be retrofitted onto legacy agents without modifying their core decision‑making code.
  • Developer tooling – the open‑source library includes dashboards for real‑time Viability Index visualization, making it easier to debug and tune safety margins during development.

Limitations & Future Work

  • Empirical validation is limited – the paper presents only a proof‑of‑concept simulation; large‑scale real‑world trials (e.g., on road vehicles) are left for future studies.
  • Risk‑bound estimation depends on the quality of the underlying statistical models; in highly non‑stationary environments, KL‑divergence or z‑tests may lag behind rapid shifts.
  • Capacity function S(x) is assumed to be known; deriving accurate, domain‑specific capacity metrics can be non‑trivial.
  • Scalability to multi‑agent ecosystems has not been explored; interactions between agents could introduce emergent risks not captured by a single‑agent Viability Index.
  • Future work includes extending RiskGate to handle distributed sensor fusion, integrating learning‑based risk estimators, and conducting field experiments across robotics, finance, and autonomous navigation domains.

Authors

  • German Marin
  • Jatin Chaudhary

Paper Information

  • arXiv ID: 2604.24686v1
  • Categories: cs.AI
  • Published: April 27, 2026