[Paper] Trusted AI Agents in the Cloud

Published: December 5, 2025 at 01:48 PM EST
4 min read
Source: arXiv - 2512.05951v1

Overview

The paper introduces Omega, a platform that lets cloud providers run large‑language‑model (LLM)‑powered AI agents in a way that is provably secure and auditable. By extending confidential virtual machines (CVMs) and confidential GPUs, Omega guarantees that agents can’t leak data, be tampered with, or act outside of defined policies—even when multiple untrusted parties share the same hardware.

Key Contributions

  • End‑to‑end isolation for AI agents – combines AMD SEV‑SNP confidential VMs with NVIDIA H100 confidential GPUs to protect both CPU and accelerator state.
  • Nested multi‑agent sandboxing – hosts many agents inside a single CVM while keeping each agent’s memory and GPU context isolated.
  • Differential attestation – a lightweight protocol that lets each principal (e.g., a data owner, a tool provider, or a cloud operator) verify that the part of the system it cares about is running the exact code it expects (see the sketch after this list).
  • Policy‑driven supervision – a declarative language for specifying data‑access, tool‑use, and inter‑agent communication rules; the runtime enforces these policies and logs provenance for every external call.
  • High‑density, cloud‑scale deployment – demonstrates that dozens of agents can share a single confidential GPU with only modest overhead, making the approach economically viable.
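
To make the differential‑attestation idea concrete, here is a minimal Python sketch of per‑principal proofs. Everything in it (the platform key, the function names, the measurement labels) is an illustrative assumption, not Omega's actual protocol or API.

```python
import hashlib
import hmac

# Illustrative sketch of differential attestation: each principal receives
# a proof covering only the measurements it is entitled to see. All names
# here (PLATFORM_KEY, the labels) are assumptions, not Omega's protocol.

PLATFORM_KEY = b"hardware-rooted signing key (placeholder)"

def measure(component: bytes) -> str:
    """Hash one launch-time component (code image, policy, enclave config)."""
    return hashlib.sha256(component).hexdigest()

def attest(measurements: dict[str, str], visible: set[str]) -> dict:
    """Sign a proof over only the measurements a given principal may see.

    Scoping the view per principal is what makes the attestation
    "differential": no monolithic quote exposes other parties' secrets.
    """
    view = {k: v for k, v in measurements.items() if k in visible}
    digest = hashlib.sha256(repr(sorted(view.items())).encode()).digest()
    sig = hmac.new(PLATFORM_KEY, digest, hashlib.sha256).hexdigest()
    return {"view": view, "sig": sig}

# Measurements recorded when one agent is launched.
measurements = {
    "agent_code": measure(b"<agent binary>"),
    "policy_bundle": measure(b"<policy file>"),
    "enclave_config": measure(b"<CVM + confidential-GPU description>"),
}

# A data owner verifies the policy and enclave; a tool vendor verifies the
# agent code. Neither sees the measurements meant for the other party.
data_owner_proof = attest(measurements, {"policy_bundle", "enclave_config"})
tool_vendor_proof = attest(measurements, {"agent_code", "enclave_config"})
```

The point of the per‑principal view is that a tool vendor can confirm which agent code is invoking it without learning the data owner's policy, and vice versa.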

Methodology

  1. Trusted Execution Foundations – The authors start from AMD’s SEV‑SNP (which encrypts VM memory and provides hardware‑rooted attestation) and NVIDIA’s confidential GPU feature (which encrypts GPU memory and isolates kernels).
  2. Nested Isolation Layer – Inside the CVM, a lightweight hypervisor creates “agent containers,” each with its own virtual address space and a dedicated GPU context. The hypervisor enforces strict separation using page‑table isolation and GPU memory partitioning.
  3. Differential Attestation Protocol – When an agent is launched, the platform produces a cryptographic proof that includes the hash of the agent’s code, the policy bundle, and the specific hardware enclave being used. Each stakeholder can verify only the portion relevant to them, avoiding a single monolithic attestation that would expose other parties’ secrets.
  4. Policy Engine & Provenance Logging – A policy interpreter runs inside the CVM, intercepting all I/O (file reads, network calls, GPU kernel launches). Before an operation proceeds, the engine checks the policy and, if allowed, records a signed log entry that ties the action to the specific agent and data item (a minimal sketch of this intercept‑check‑log pattern follows the list).
  5. Evaluation Setup – The prototype runs on a server equipped with an AMD EPYC processor (SEV‑SNP) and an NVIDIA H100 GPU (confidential mode). Benchmarks include LLM inference (e.g., Llama‑2‑70B), tool‑calling workloads, and multi‑agent orchestration scenarios.
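
The intercept‑check‑log pattern from steps 2 and 4 can be sketched as follows. The policy shape, signing scheme, and function names are assumptions made for illustration; the paper's actual policy language and log format are not reproduced in this summary.

```python
import hashlib
import hmac
import json
import time

# Hypothetical sketch of the intercept -> check -> log pattern. The policy
# shape and signing scheme are illustrative assumptions, not Omega's format.

LOG_KEY = b"per-CVM log-signing key (placeholder)"

policy = {
    "allow_network": {"api.example.com"},   # permitted external endpoints
    "allow_files": ("/data/inputs",),       # readable path prefixes
}

provenance_log: list[dict] = []

def check_and_log(agent_id: str, op: str, target: str) -> bool:
    """Gate one intercepted I/O operation and append a signed log entry."""
    if op == "net":
        allowed = target in policy["allow_network"]
    elif op == "file":
        allowed = any(target.startswith(p) for p in policy["allow_files"])
    else:
        allowed = False  # default-deny: unnamed operations are blocked

    entry = {
        "ts": time.time(),
        "agent": agent_id,
        "op": op,
        "target": target,
        "allowed": allowed,
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["sig"] = hmac.new(LOG_KEY, payload, hashlib.sha256).hexdigest()
    provenance_log.append(entry)
    return allowed

# A permitted tool call proceeds; an exfiltration attempt is blocked.
# Both leave signed provenance entries for auditors.
assert check_and_log("agent-7", "net", "api.example.com")
assert not check_and_log("agent-7", "net", "exfil.example.net")
```

Note that denial is the default: any operation the policy does not explicitly name is blocked and still logged, which is what gives auditors a complete, signed record.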

Results & Findings

| Metric | Baseline (plain CVM) | Omega (confidential + policy) |
| --- | --- | --- |
| LLM inference latency (per token) | 1.8 ms | 2.1 ms (+17%) |
| GPU utilization overhead (due to isolation) | — | 5% |
| Number of agents per GPU (memory‑bound) | 4 | 12 |
| Policy enforcement latency (per I/O) | N/A | 0.3 ms |
| Provenance log size (per 1,000 ops) | N/A | 12 KB |
  • Performance impact is modest: Full confidentiality plus policy checks add under 20% latency to LLM inference, which is acceptable for many SaaS use cases.
  • Scalability: Nested sandboxing lets three times as many agents (12 vs. 4) share the same GPU compared with naïve per‑agent VM isolation, thanks to efficient memory partitioning.
  • Security guarantees: Formal analysis (in the paper’s appendix) shows that any data‑exfiltration or unauthorized tool use would be detected and blocked by the policy engine, and the attestation logs provide cryptographic evidence for auditors.

Practical Implications

  • Secure AI‑as‑a‑Service: Cloud providers can now offer “trusted AI agents” that customers can run on shared hardware without fearing data leakage to other tenants or the provider itself.
  • Regulatory compliance: The policy language can encode GDPR, HIPAA, or industry‑specific constraints (e.g., “no export of PHI outside EU”), and the immutable provenance logs satisfy audit requirements (see the example policy bundle after this list).
  • Marketplace for AI tools: Third‑party tool vendors can expose their utilities (e.g., code‑generation plugins) to agents while retaining control—Omega’s differential attestation proves to the vendor that only authorized agents are invoking their code.
  • Cost‑effective deployment: By packing many agents onto a single confidential GPU, operators can achieve higher utilization rates, lowering the price point for secure AI services.
  • Developer workflow: Teams can write agents in familiar languages (Python, Rust) and simply package a policy file; the Omega runtime handles the heavy lifting of isolation and attestation.
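
To illustrate how a compliance rule such as “no export of PHI outside the EU” might be expressed, here is a hypothetical policy bundle written as a Python structure. Omega's real policy language is declarative and its syntax is not shown in this summary, so every field name below is an assumption about the kinds of rules it covers (data access, tool use, egress, auditing).

```python
# Hypothetical policy bundle expressing a GDPR-style constraint. Field
# names and structure are illustrative only; they mirror the rule types
# the summary mentions, not Omega's actual policy syntax.

policy_bundle = {
    "data_access": [
        # PHI may be read, but only by the approved summarization agent.
        {"label": "PHI", "allow_agents": ["clinical-summarizer"]},
    ],
    "egress": [
        # Block any network export of PHI-derived data outside the EU.
        {"label": "PHI", "allow_regions": ["eu-west-1", "eu-central-1"]},
    ],
    "tool_use": [
        # Third-party plugins are callable only by named agents.
        {"tool": "code-gen-plugin", "allow_agents": ["dev-assistant"]},
    ],
    "audit": {"log_every_call": True, "sign_entries": True},
}
```

In the workflow the post describes, a team would package a file like this alongside the agent binary, and its hash would be covered by the attestation so auditors can tie every logged action to the exact policy in force.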

Limitations & Future Work

  • Hardware dependency: Omega relies on AMD SEV‑SNP and NVIDIA H100 confidential GPUs; adoption is limited to clouds that expose these exact features.
  • Policy expressiveness vs. performance: Very complex policies (e.g., deep data‑flow constraints) can increase enforcement latency; the authors plan to explore policy compilation techniques to mitigate this.
  • Side‑channel considerations: While memory and GPU state are encrypted, the paper acknowledges that micro‑architectural side‑channels (e.g., cache timing) are not fully addressed and will be a focus of subsequent research.
  • Dynamic code loading: Current implementation assumes agents are static binaries; supporting just‑in‑time model updates or plug‑in loading will require extensions to the attestation protocol.

Omega pushes the boundary of what’s possible for trustworthy AI services in the cloud, offering a pragmatic blend of hardware‑rooted security, fine‑grained policy control, and performance that could make “confidential AI agents” a mainstream offering in the next wave of cloud AI platforms.

Authors

  • Teofil Bodea
  • Masanori Misono
  • Julian Pritzi
  • Patrick Sabanic
  • Thore Sommer
  • Harshavardhan Unnibhavi
  • David Schall
  • Nuno Santos
  • Dimitrios Stavrakakis
  • Pramod Bhatotia

Paper Information

  • arXiv ID: 2512.05951v1
  • Categories: cs.CR, cs.AI, cs.MA
  • Published: December 5, 2025