[Paper] Web World Models

Published: December 29, 2025 at 01:31 PM EST
4 min read
Source: arXiv - 2512.23676v1

Overview

The paper introduces Web World Models (WWMs) – a hybrid approach that blends the reliability of traditional web back‑ends with the creative flexibility of large language models (LLMs). By encoding the “physics” of a virtual world in ordinary web code (databases, APIs, typed interfaces) and letting LLMs generate narratives and high‑level decisions, the authors demonstrate a scalable way to build persistent, controllable, yet open‑ended environments for language agents.

Key Contributions

  • Middle‑ground architecture: Combines deterministic web‑stack logic with generative LLM output, avoiding the brittleness of pure rule‑based worlds and the chaos of fully generative simulations.
  • Typed latent state: Defines world state as explicit web interfaces (REST endpoints, GraphQL schemas, etc.), enabling type‑safe interaction between code and language models.
  • Deterministic generation pipeline: Uses LLMs to produce structured, repeatable content (e.g., map tiles, story arcs) that can be cached and queried like any other web resource.
  • Diverse prototype suite: Implements four WWMs ranging from an infinite, geography‑grounded travel atlas to a sci‑fi galaxy explorer, an encyclopedic knowledge base, and a game‑like simulation.
  • Design guidelines: Distills practical principles—rule separation, typed state, deterministic generation—that can be adopted by developers building their own agent‑centric worlds.

Methodology

  1. World State as Web Services

    • The authors model every entity (locations, characters, items) as resources exposed via standard web APIs (REST/GraphQL).
    • Business logic (movement rules, inventory constraints, physics) lives in server‑side code (Python/Node.js) and a backing database, guaranteeing consistency.
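The separation described in step 1 can be sketched in plain Python. This is a minimal illustration, not the authors' code: the hypothetical `Location`, `Agent`, and `World` types stand in for resources that the paper exposes over REST/GraphQL, and the `move` method shows how rule enforcement lives entirely in the deterministic code layer, outside the LLM.

```python
from dataclasses import dataclass

# Hypothetical typed world state. In the paper these entities are exposed
# as web resources backed by a database; here the "physics" (movement
# rules) is ordinary server-side code that no LLM output can bypass.

@dataclass(frozen=True)
class Location:
    id: str
    passable: bool
    neighbors: tuple  # ids of adjacent locations

@dataclass
class Agent:
    id: str
    location_id: str

class World:
    def __init__(self, locations):
        self.locations = {loc.id: loc for loc in locations}

    def move(self, agent: Agent, dest_id: str) -> bool:
        """Deterministic rule check: only adjacent, passable tiles."""
        here = self.locations[agent.location_id]
        dest = self.locations.get(dest_id)
        if dest is None or not dest.passable or dest_id not in here.neighbors:
            return False  # illegal move rejected by the code layer
        agent.location_id = dest_id
        return True

world = World([
    Location("plaza", True, ("gate", "wall")),
    Location("gate", True, ("plaza",)),
    Location("wall", False, ()),
])
bot = Agent("bot-1", "plaza")
print(world.move(bot, "wall"))   # False: impassable terrain is blocked
print(world.move(bot, "gate"))   # True: legal move updates the state
```

Because the state transition is validated in code, an agent's narrative layer can propose any action it likes; only rule-consistent ones take effect.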
  2. LLM‑Driven Narrative Layer

    • An LLM receives a prompt that includes the current typed state (e.g., JSON snapshot of the agent’s location) and a high‑level goal.
    • The model returns a structured response (action intent + narrative text). The intent is parsed and routed to the web API, which updates the latent state.
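The parse-and-route step above can be sketched as follows. The JSON schema, the `ALLOWED_INTENTS` set, and `parse_llm_reply` are illustrative assumptions, not the paper's actual interface: the point is that only a validated, typed intent ever reaches the web API, while free-form narrative text stays on the presentation side.

```python
import json

# Hypothetical narrative-layer plumbing: the LLM is prompted to return a
# structured object (action intent + narrative text). The intent is
# validated before it is allowed to touch the latent world state.

ALLOWED_INTENTS = {"move", "inspect", "wait"}  # assumed action schema

def parse_llm_reply(raw: str) -> dict:
    """Validate the model's structured output before routing it."""
    reply = json.loads(raw)
    intent = reply.get("intent", {})
    if intent.get("action") not in ALLOWED_INTENTS:
        raise ValueError(f"unknown action: {intent.get('action')!r}")
    return {"intent": intent, "narrative": reply.get("narrative", "")}

# A stand-in for what the model might return given a typed state snapshot:
raw_reply = (
    '{"intent": {"action": "move", "target": "gate"},'
    ' "narrative": "You slip through the old gate."}'
)
parsed = parse_llm_reply(raw_reply)
print(parsed["intent"]["action"])  # move
```

Rejecting malformed or out-of-schema intents at this boundary is what keeps generative output from corrupting the deterministic state layer.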
  3. Deterministic Generation

    • To keep the world “infinite” yet reproducible, the system seeds the LLM with a deterministic hash derived from the requested location or story node.
    • The same seed always yields the same generated description, allowing caching and offline replay.
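A minimal sketch of this seeding idea, under stated assumptions: `seed_for` and `describe_tile` are hypothetical names, and a seeded `random.Random` stands in for the seeded LLM sampling step. The mechanism is the same, though: a stable hash of the request yields identical content on every call, so results can be cached like static web resources.

```python
import hashlib
import random
from functools import lru_cache

def seed_for(world_id: str, x: int, y: int) -> int:
    """Derive a stable 64-bit seed from the requested location."""
    key = f"{world_id}:{x}:{y}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

BIOMES = ["desert", "forest", "tundra", "steppe", "reef"]  # illustrative

@lru_cache(maxsize=None)  # cached exactly like any other web resource
def describe_tile(world_id: str, x: int, y: int) -> str:
    # In the paper this step is a seeded LLM call; a seeded PRNG is the
    # stand-in here so the sketch stays self-contained.
    rng = random.Random(seed_for(world_id, x, y))
    return f"{rng.choice(BIOMES)}-{rng.randint(0, 9999):04d}"

# Same request, same content — reproducible and replayable offline:
print(describe_tile("atlas", 12, -7) == describe_tile("atlas", 12, -7))  # True
```

Because the seed depends only on the request, the "infinite" world needs no stored map: any tile can be regenerated on demand or served from cache.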
  4. Prototype Construction

    • Four domains were built on a common stack (Dockerized services, PostgreSQL, FastAPI, OpenAI’s GPT‑4).
    • Each prototype showcases a different balance of open‑endedness vs. rule enforcement (e.g., the atlas respects real‑world geography, while the galaxy explorer follows fictional physics).

Results & Findings

  • Consistency: Across 10,000 simulated agent steps, rule violations (e.g., moving through impassable terrain) dropped to <0.1% thanks to the code‑enforced physics layer.
  • Scalability: The deterministic generation approach allowed the infinite atlas to serve over one million unique location requests with low latency (120 ms on average).
  • Agent Performance: Language agents equipped with WWMs completed navigation and information‑retrieval tasks 30‑45% faster than agents using purely generative worlds, owing to the reliable state queries.
  • Developer Feedback: Early adopters reported that the typed API contracts dramatically reduced debugging time when integrating new LLM prompts.

Practical Implications

  • Rapid Prototyping of Virtual Assistants: Companies can spin up “knowledge worlds” (e.g., product catalogs, internal docs) where an LLM can safely query and augment information without risking hallucinations.
  • Game Development: Indie studios can leverage WWMs to create procedurally generated maps that still obey game rules (collision, resource limits), cutting down on hand‑crafted level design.
  • Simulation‑as‑a‑Service: Enterprises needing sandbox environments for training autonomous agents (e.g., logistics bots) can host a web‑based world that guarantees safety constraints while still offering rich, varied scenarios.
  • Interoperability: Because the world state is exposed via standard web APIs, existing tooling (Swagger, Postman, CI pipelines) can be reused, lowering the barrier for integration with CI/CD and monitoring stacks.

Limitations & Future Work

  • LLM Dependence: The quality of narratives and decision suggestions still hinges on the underlying language model; biased or low‑quality outputs can propagate into the world.
  • State Explosion: While deterministic generation mitigates storage costs, extremely large worlds may still require sophisticated caching and sharding strategies.
  • Limited Real‑Time Dynamics: The current prototypes focus on turn‑based updates; extending WWMs to high‑frequency, real‑time simulations (e.g., multiplayer games) remains an open challenge.
  • Future Directions: The authors plan to explore hierarchical world composition (nesting WWMs), integrate reinforcement‑learning agents that can modify the rule layer, and evaluate cross‑modal extensions (e.g., visual rendering tied to the web state).

Authors

  • Jichen Feng
  • Yifan Zhang
  • Chenggong Zhang
  • Yifu Lu
  • Shilong Liu
  • Mengdi Wang

Paper Information

  • arXiv ID: 2512.23676v1
  • Categories: cs.AI, cs.CL, cs.CV
  • Published: December 29, 2025