[Paper] PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement
Source: arXiv - 2602.14968v1
Overview
The paper introduces PhyScensis, a novel framework that lets large‑language‑model (LLM) agents design richly detailed 3‑D scenes while guaranteeing that the resulting arrangements obey real‑world physics. By tightly coupling an LLM‑driven planner with a physics engine, the system can automatically generate complex tabletop, shelf, or packing scenarios that are both visually plausible and physically stable—an essential capability for scaling robot simulation pipelines.
Key Contributions
- Physics‑augmented LLM agent that iteratively proposes objects together with spatial and physical predicates (e.g., “book A rests on shelf S”).
- Solver‑feedback loop: a physics engine validates the predicates, resolves collisions, and returns stability metrics that guide the LLM to refine the layout.
- Probabilistic programming layer for fine‑grained control over numeric parameters (exact positions, contact forces) while preserving stochastic diversity.
- Joint stability‑spatial heuristic that balances physical feasibility with compact, high‑density arrangements, enabling scenes with dozens of interacting items.
- Comprehensive evaluation showing superior scene complexity, visual fidelity, and physical correctness compared with prior 3‑D layout generators.
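The paper does not publish the exact form of the joint stability–spatial heuristic, but a common way to realize such a trade-off is a weighted sum of a physical-stability score and a packing-density term. The sketch below is illustrative only; the function name, weight `w`, and the density definition are assumptions, not the authors' formulation.

```python
def joint_score(stability, occupied_volume, container_volume, w=0.7):
    """Illustrative joint heuristic (not the paper's exact formula).

    stability: fraction of objects that stay put after a short rollout, in [0, 1].
    density:   fraction of the container volume filled by placed objects.
    w:         assumed weight trading physical feasibility against compactness.
    """
    density = occupied_volume / container_volume
    return w * stability + (1 - w) * density

# A very stable but half-full shelf scores ~0.78 under these assumed weights.
print(joint_score(stability=0.9, occupied_volume=0.5, container_volume=1.0))
```

Raising `w` favors conservative, well-supported layouts; lowering it pushes the generator toward the dense, high-object-count arrangements the paper targets.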
Methodology
- Prompt & Goal Specification – The user supplies a high‑level textual description (e.g., “organize a bookshelf with 30 books of varying sizes”).
- LLM Agent Planning – The LLM generates a sequence of asset‑predicate statements, each describing an object and its intended relationship (support, containment, contact).
- Physics‑Enabled Solver – A lightweight physics engine (e.g., PyBullet) takes the predicates, places the objects, and runs a short simulation to check for interpenetrations and stability.
- Feedback & Refinement – The solver returns a stability score and any violation details. The LLM uses this feedback to revise predicates, add missing supports, or adjust positions.
- Probabilistic Programming Wrapper – Numerical attributes (exact coordinates, orientation) are sampled from learned distributions conditioned on the LLM’s textual output, allowing controlled randomness and reproducibility.
- Iterative Convergence – The loop repeats until the scene meets predefined thresholds for stability and spatial compactness, at which point the final 3‑D scene is exported for simulation or rendering.
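The propose–validate–refine loop above can be sketched in a few dozen lines. This is a minimal, self-contained illustration, not the authors' implementation: the predicate schema, the `simulate_stability` stub (which stands in for a short PyBullet rollout measuring drift and interpenetration), and the `refine` stub (which stands in for the LLM revision step) are all assumptions made for clarity.

```python
from dataclasses import dataclass

@dataclass
class Predicate:
    """One asset-predicate statement, e.g. Predicate('book_A', 'on', 'shelf')."""
    obj: str
    relation: str   # e.g. "on", "inside", "touching"
    target: str

def simulate_stability(predicates):
    """Stand-in for the physics-enabled solver: a real system would place the
    objects, step a rigid-body simulator (e.g. PyBullet) for a few seconds,
    and score stability from object displacement. Here, any object whose
    support target is 'air' is treated as a violation."""
    violations = [p for p in predicates if p.target == "air"]
    score = 1.0 - len(violations) / max(len(predicates), 1)
    return score, violations

def refine(predicates, violations):
    """Stand-in for the LLM refinement step: re-anchor unsupported objects.
    The real agent would revise predicates from the solver's feedback."""
    bad = {id(v) for v in violations}
    return [Predicate(p.obj, "on", "shelf") if id(p) in bad else p
            for p in predicates]

def generate_scene(predicates, stability_threshold=0.95, max_iters=10):
    """Iterate until the layout meets the stability threshold or gives up."""
    score = 0.0
    for i in range(max_iters):
        score, violations = simulate_stability(predicates)
        if score >= stability_threshold:
            return predicates, score, i + 1
        predicates = refine(predicates, violations)
    return predicates, score, max_iters

layout = [Predicate("book_A", "on", "shelf"),
          Predicate("mug", "on", "air"),        # floating: will be re-anchored
          Predicate("book_B", "on", "book_A")]
final, score, iters = generate_scene(layout)
print(score, iters)  # 1.0 2 — stable after one refinement pass
```

The probabilistic-programming layer would sit between `refine` and `simulate_stability`, sampling exact coordinates and orientations for each predicate before the rollout; it is omitted here to keep the control flow visible.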
Results & Findings
- Complexity: PhyScensis generated scenes with up to 70 objects (e.g., books, mugs, boxes) on a single shelf, far exceeding the 15‑20 object limit typical of prior methods.
- Physical Accuracy: In a benchmark of 500 generated layouts, 92% remained stable after a 5‑second physics simulation, compared to 68% for the strongest baseline.
- Visual Quality: Human evaluators rated PhyScensis layouts as 4.3/5 on realism, versus 3.1/5 for non‑physics‑aware generators.
- Speed: The iterative loop converged in an average of 3.2 iterations, taking roughly 1.8 s per scene on a single GPU, making it practical for large‑scale data generation.
Practical Implications
- Robotics Simulation: Researchers can automatically spin up thousands of physically plausible manipulation scenarios (e.g., pick‑and‑place, packing) without hand‑crafting each environment, accelerating data collection for reinforcement learning and imitation learning.
- Synthetic Dataset Creation: Vision‑and‑physics datasets (e.g., for affordance detection or stability prediction) can be generated at scale with accurate ground‑truth contact and support labels.
- Game & AR/VR Content: Designers can use natural‑language prompts to populate interiors or puzzle rooms that behave correctly under physics, reducing manual layout time.
- Human‑Robot Interaction: Service robots can be pre‑trained on a wide variety of shelf‑stocking or tabletop‑arrangement tasks, improving transfer to real‑world deployments.
Limitations & Future Work
- Physics Engine Fidelity: The current solver uses simplified rigid‑body dynamics; deformable objects or fluid interactions remain out of scope.
- LLM Hallucination: Occasionally the LLM proposes impossible object dimensions or contradictory predicates, requiring additional validation steps.
- Scalability to Large Rooms: While effective for dense, localized scenes (shelves, tables), extending the approach to whole‑room layouts with navigation constraints is an open challenge.
- Future Directions: The authors plan to integrate more advanced simulators (e.g., soft‑body physics), incorporate vision‑based perception loops for closed‑loop scene generation, and explore few‑shot prompting to reduce the need for extensive prompt engineering.
Authors
- Yian Wang
- Han Yang
- Minghao Guo
- Xiaowen Qiu
- Tsun-Hsuan Wang
- Wojciech Matusik
- Joshua B. Tenenbaum
- Chuang Gan
Paper Information
- arXiv ID: 2602.14968v1
- Categories: cs.RO, cs.AI
- Published: February 16, 2026