AI is a Non-Deterministic Guest in a Deterministic House: Stop Building Chatbots, Start Building Sandboxes
Source: Dev.to

The Signal: The Legally Binding Hallucination
Recently, a major airline’s customer‑support chatbot hallucinated a bereavement‑fare policy. A customer claimed the refund, the airline refused, and a tribunal ruled in favor of the customer. The chatbot was deemed a legal agent of the company.
The failure wasn’t that the LLM hallucinated—it’s that it was allowed to speak directly to the customer and the database without a chaperone. When you give a non‑deterministic guest unregulated access to your deterministic house, you are legally and financially responsible for the fire.
We need to stop treating AI as an open‑ended “chat” interface and start treating it as untrusted, highly volatile code execution.
Phase 1: The Architectural Bet
We are shifting from Open Dialogue to Hardened State‑Machine Confinement.
- **The Vendor Trap – the Chat Completion API.** It encourages you to build open text boxes where users ask for anything and the AI returns anything. It relies on system prompts to enforce behavior—like asking a burglar to please lock the door on their way out.
- **The Ownership Path – the Isolate Sandbox.** We don’t want a conversationalist; we want a function that takes inputs, runs in a memory‑limited, isolated environment, and outputs a strictly typed payload that we validate before it ever touches our main thread. (A sketch of that contract follows this list.)
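To make "state‑machine confinement" concrete, here is a minimal sketch (the action names and transition table are hypothetical): the agent never free‑forms a reply; it proposes a transition, and deterministic code decides whether that transition is legal.

```js
// Hypothetical transition table: which actions an agent may propose from each state.
const TRANSITIONS = {
  triage:   ['refund', 'escalate', 'deny'],
  refund:   ['close'],
  escalate: ['close'],
  deny:     ['close']
};

function applyAgentDecision(currentState, proposedAction) {
  const legal = TRANSITIONS[currentState] ?? [];
  if (!legal.includes(proposedAction)) {
    // The guest asked for something the house does not permit. Refuse.
    throw new Error(`Illegal transition from '${currentState}': '${proposedAction}'`);
  }
  return proposedAction; // Only a validated action ever reaches real systems
}
```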
Phase 2: The Security Audit (Why your current sandbox is a liability)
Last week I proposed using the native Node.js vm module to sandbox agent outputs. Our Lead QA and Security Tester ripped the pull request to shreds. Below is the audit report that forced an architectural rewrite:
Senior Tester Audit Report
| Severity | Issue | Details |
|---|---|---|
| CRITICAL | Sandbox Escape | The native Node.js vm module is not a security boundary. The official docs explicitly state: “Do not use it to run untrusted code.” An LLM can easily generate code that walks the constructor/prototype chain out of the context and achieves Remote Code Execution (RCE) on the host machine. |
| CRITICAL | Event Loop DoS | vm.runInContext runs on the main thread. If the LLM generates a simple while(true){} loop, it will block the Node.js event loop entirely. Your server will instantly drop all active user connections. |
| HIGH | State Corruption | Passing live objects (e.g., a DB connection) into the VM context allows the agent to mutate them globally. |
Verdict: We cannot use native Node.js tools. We must drop down to the C++ V8 engine level.
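To make the first two rows of that table concrete, here is the classic `vm` escape the Node.js docs warn about (a minimal demonstration; run it only in a throwaway environment):

```js
const vm = require('vm');

const sandbox = {};
vm.createContext(sandbox);

// Inside the context, `this` is the contextified sandbox object. Because that
// object was created in the HOST realm, its constructor chain walks straight
// back to the host's Function constructor – and from there to `process`.
const hostProcess = vm.runInContext(
  "this.constructor.constructor('return process')()",
  sandbox
);
console.log(hostProcess.pid); // The real host process. RCE is one require() away.

// The Event Loop DoS from row two: even with a `timeout`, this blocks the
// main thread for the full duration, and async tricks can bypass the timeout.
// vm.runInContext('while (true) {}', sandbox, { timeout: 50 });
```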
Phase 3: The Production Implementation (V8 Isolates)
To build a true “Boss Battle” arena we use isolated‑vm. This creates a completely separate instance of the V8 JavaScript engine with its own memory heap. If the AI triggers an infinite loop or tries to break out, we can snipe the isolate thread without affecting the main Node.js server.
```js
const ivm = require('isolated-vm');
const { trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('ai.hardened_sandbox');

class FortressSandbox {
  constructor(memoryLimitMB = 64, timeoutMs = 1500) {
    this.memoryLimitMB = memoryLimitMB;
    this.timeoutMs = timeoutMs;
  }

  async executeUntrustedAgent(aiGeneratedLogic, safeInputPayload) {
    return tracer.startActiveSpan('v8_isolate_execution', async (span) => {
      // 1. Hard Boundary – create a separate V8 heap with its own memory limit
      const isolate = new ivm.Isolate({ memoryLimit: this.memoryLimitMB });
      const context = await isolate.createContext();
      const jail = context.global;

      try {
        // 2. State Management – pass data as deeply cloned strings, NEVER by reference
        await jail.set('global', jail.derefInto());
        await jail.set('_inputData', JSON.stringify(safeInputPayload));

        // 3. Compile the Agent's logic
        const script = await isolate.compileScript(`
          // Agent must parse input, do its logic, and return a stringified result
          const input = JSON.parse(_inputData);
          let output = {};
          ${aiGeneratedLogic}
          JSON.stringify(output);
        `);

        // 4. Dead-Man's Switch – run asynchronously (off the main event loop)
        // with a strict timeout. If the agent loops forever, the isolate is
        // terminated; the main thread never blocks and survives either way.
        const resultStr = await script.run(context, { timeout: this.timeoutMs });

        span.setAttribute('sandbox.status', 'success');
        return JSON.parse(resultStr);
      } catch (error) {
        span.recordException(error);
        span.setAttribute('sandbox.status', 'terminated');
        // The guest tried to burn the house down. The house won.
        return {
          error: `GUARD INTERVENTION: Agent execution terminated. Reason: ${error.message}`
        };
      } finally {
        // 5. Memory Cleanup – destroy the arena
        isolate.dispose();
        span.end();
      }
    });
  }
}

// Example Usage:
// const fortress = new FortressSandbox();
// const output = await fortress.executeUntrustedAgent(
//   "output.action = 'refund'; output.amount = input.amount;",
//   { amount: 500 }
// );
```
Phase 4: Checklist (What to Build Next)
- **Implement Zod Egress Filtering** – The output of `FortressSandbox` is safe from a code‑execution standpoint, but the data itself is still untrusted. Pipe it directly into a Zod schema validator; if validation fails, drop the request (see the sketch after this list).
- **Tail‑Based OTel Sampling** – Sandboxes will fail often (by design). Configure your OpenTelemetry collector to keep full trace spans only for `sandbox.status === 'terminated'`, to save on Datadog/Honeycomb costs (a sample collector policy follows).
- **Multi‑Agent Firebreaks** – If Agent A passes data to Agent B, it must pass through a schema check in between. Never let two agents share the same V8 isolate memory space.
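A minimal sketch of the egress filter, assuming the `fortress` instance and refund‑style payload from the usage example above (the schema shape is hypothetical; adapt it to your own action space):

```js
const { z } = require('zod');

// Hypothetical schema for the refund example – shape it to your own action space.
const AgentOutputSchema = z.object({
  action: z.enum(['refund', 'escalate', 'deny']),
  amount: z.number().min(0).max(1000)
}).strict(); // .strict() rejects any extra keys the agent tries to smuggle in

const raw = await fortress.executeUntrustedAgent(agentLogic, { amount: 500 });
const parsed = AgentOutputSchema.safeParse(raw);

if (!parsed.success) {
  // Fail closed: unvalidated data never reaches deterministic systems
  throw new Error(`Egress filter rejected agent output: ${parsed.error.message}`);
}
// Only parsed.data crosses the boundary – to the database, or to Agent B.
```

The same `safeParse` gate, dropped between Agent A's output and Agent B's input, is the firebreak from the third item.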
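And for the tail‑based sampling item, a sketch of an OpenTelemetry Collector `tail_sampling` policy (processor from opentelemetry‑collector‑contrib; the policy name and wait time are illustrative):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s              # buffer spans until the whole trace has arrived
    policies:
      - name: keep-guard-interventions
        type: string_attribute
        string_attribute:
          key: sandbox.status
          values: ["terminated"]    # keep failed sandbox runs, drop the happy path
```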
The Bottom Line:
Treat LLM outputs like user input from the public internet in 1999. Sanitize them, isolate them, and assume they are malicious by default. Build the house. Contain the guest.
