[Paper] PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement
Source: arXiv - 2602.14968v1
Overview
The paper introduces PhyScensis, a novel framework that lets large‑language‑model (LLM) agents design richly detailed 3‑D scenes while guaranteeing that the resulting arrangements obey real‑world physics. By tightly coupling an LLM‑driven planner with a physics engine, the system can automatically generate complex tabletop, shelf, or packing scenarios that are both visually plausible and physically stable—an essential capability for scaling robot simulation pipelines.
Key Contributions
- Physics‑augmented LLM agent that iteratively proposes objects together with spatial and physical predicates (e.g., “book A rests on shelf S”).
- Solver‑feedback loop: a physics engine validates the predicates, resolves collisions, and returns stability metrics that guide the LLM to refine the layout.
- Probabilistic programming layer for fine‑grained control over numeric parameters (exact positions, contact forces) while preserving stochastic diversity.
- Joint stability‑spatial heuristic that balances physical feasibility with compact, high‑density arrangements, enabling scenes with dozens of interacting items.
- Comprehensive evaluation showing superior scene complexity, visual fidelity, and physical correctness compared with prior 3‑D layout generators.
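The paper does not publish the exact form of the joint stability–spatial heuristic, but a common way to realize such a trade-off is a weighted sum of a physical-stability score and a packing-density term. The sketch below is illustrative only; the function name, weight `w`, and the density definition are assumptions, not the authors' formulation.

```python
def joint_score(stability, occupied_volume, container_volume, w=0.7):
    """Illustrative joint heuristic (not the paper's exact formula).

    stability: fraction of objects that stay put after a short rollout, in [0, 1].
    density:   fraction of the container volume filled by placed objects.
    w:         assumed weight trading physical feasibility against compactness.
    """
    density = occupied_volume / container_volume
    return w * stability + (1 - w) * density

# A very stable but half-full shelf scores ~0.78 under these assumed weights.
print(joint_score(stability=0.9, occupied_volume=0.5, container_volume=1.0))
```

Raising `w` favors conservative, well-supported layouts; lowering it pushes the generator toward the dense, high-object-count arrangements the paper targets.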
Methodology
- Prompt & Goal Specification – The user supplies a high‑level textual description (e.g., “organize a bookshelf with 30 books of varying sizes”).
- LLM Agent Planning – The LLM generates a sequence of asset‑predicate statements, each describing an object and its intended relationship (support, containment, contact).
- Physics‑Enabled Solver – A lightweight physics engine (e.g., PyBullet) takes the predicates, places the objects, and runs a short simulation to check for interpenetrations and stability.
- Feedback & Refinement – The solver returns a stability score and any violation details. The LLM uses this feedback to revise predicates, add missing supports, or adjust positions.
- Probabilistic Programming Wrapper – Numerical attributes (exact coordinates, orientation) are sampled from learned distributions conditioned on the LLM’s textual output, allowing controlled randomness and reproducibility.
- Iterative Convergence – The loop repeats until the scene meets predefined thresholds for stability and spatial compactness, at which point the final 3‑D scene is exported for simulation or rendering.
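The propose–validate–refine loop above can be sketched in a few dozen lines. This is a minimal, self-contained illustration, not the authors' implementation: the predicate schema, the `simulate_stability` stub (which stands in for a short PyBullet rollout measuring drift and interpenetration), and the `refine` stub (which stands in for the LLM revision step) are all assumptions made for clarity.

```python
from dataclasses import dataclass

@dataclass
class Predicate:
    """One asset-predicate statement, e.g. Predicate('book_A', 'on', 'shelf')."""
    obj: str
    relation: str   # e.g. "on", "inside", "touching"
    target: str

def simulate_stability(predicates):
    """Stand-in for the physics-enabled solver: a real system would place the
    objects, step a rigid-body simulator (e.g. PyBullet) for a few seconds,
    and score stability from object displacement. Here, any object whose
    support target is 'air' is treated as a violation."""
    violations = [p for p in predicates if p.target == "air"]
    score = 1.0 - len(violations) / max(len(predicates), 1)
    return score, violations

def refine(predicates, violations):
    """Stand-in for the LLM refinement step: re-anchor unsupported objects.
    The real agent would revise predicates from the solver's feedback."""
    bad = {id(v) for v in violations}
    return [Predicate(p.obj, "on", "shelf") if id(p) in bad else p
            for p in predicates]

def generate_scene(predicates, stability_threshold=0.95, max_iters=10):
    """Iterate until the layout meets the stability threshold or gives up."""
    score = 0.0
    for i in range(max_iters):
        score, violations = simulate_stability(predicates)
        if score >= stability_threshold:
            return predicates, score, i + 1
        predicates = refine(predicates, violations)
    return predicates, score, max_iters

layout = [Predicate("book_A", "on", "shelf"),
          Predicate("mug", "on", "air"),        # floating: will be re-anchored
          Predicate("book_B", "on", "book_A")]
final, score, iters = generate_scene(layout)
print(score, iters)  # 1.0 2 — stable after one refinement pass
```

The probabilistic-programming layer would sit between `refine` and `simulate_stability`, sampling exact coordinates and orientations for each predicate before the rollout; it is omitted here to keep the control flow visible.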
Results & Findings
- Complexity: PhyScensis generated scenes with up to 70 objects (e.g., books, mugs, boxes) on a single shelf, far exceeding the 15‑20 object limit typical of prior methods.
- Physical Accuracy: In a benchmark of 500 generated layouts, 92% remained stable after a 5‑second physics simulation, compared to 68% for the strongest baseline.
- Visual Quality: Human evaluators rated PhyScensis layouts as 4.3/5 on realism, versus 3.1/5 for non‑physics‑aware generators.
- Speed: The iterative loop converged in an average of 3.2 iterations, taking roughly 1.8 s per scene on a single GPU, making it practical for large‑scale data generation.
Practical Implications
- Robotics Simulation: Researchers can automatically spin up thousands of physically plausible manipulation scenarios (e.g., pick‑and‑place, packing) without hand‑crafting each environment, accelerating data collection for reinforcement learning and imitation learning.
- Synthetic Dataset Creation: Vision‑and‑physics datasets (e.g., for affordance detection or stability prediction) can be generated at scale with accurate ground‑truth contact and support labels.
- Game & AR/VR Content: Designers can use natural‑language prompts to populate interiors or puzzle rooms that behave correctly under physics, reducing manual layout time.
- Human‑Robot Interaction: Service robots can be pre‑trained on a wide variety of shelf‑stocking or tabletop‑arrangement tasks, improving transfer to real‑world deployments.
Limitations & Future Work
- Physics Engine Fidelity: The current solver uses simplified rigid‑body dynamics; deformable objects or fluid interactions remain out of scope.
- LLM Hallucination: Occasionally the LLM proposes impossible object dimensions or contradictory predicates, requiring additional validation steps.
- Scalability to Large Rooms: While effective for dense, localized scenes (shelves, tables), extending the approach to whole‑room layouts with navigation constraints is an open challenge.
- Future Directions: The authors plan to integrate more advanced simulators (e.g., soft‑body physics), incorporate vision‑based perception loops for closed‑loop scene generation, and explore few‑shot prompting to reduce the need for extensive prompt engineering.
Authors
- Yian Wang
- Han Yang
- Minghao Guo
- Xiaowen Qiu
- Tsun-Hsuan Wang
- Wojciech Matusik
- Joshua B. Tenenbaum
- Chuang Gan
Paper Information
- arXiv ID: 2602.14968v1
- Categories: cs.RO, cs.AI
- Published: February 16, 2026