[Paper] SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

Published: June 8, 2026 at 11:51 AM EDT
Source: arXiv - 2606.09669v1

Overview

Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, and therefore fail to assess general interactive spatial understanding.

We introduce SpatialWorld, a unified benchmark designed specifically for evaluating the interactive spatial understanding of multimodal agents in complex real-world tasks. Integrating eight heterogeneous simulation backends under a shared, simulator-agnostic protocol, SpatialWorld features 760 human-annotated tasks across diverse domains (e.g., household routines, travel, social collaboration). Agents must solve tasks under vision-only partial observability, actively gathering egocentric visual evidence and expressing decisions via a unified, text-based action interface native to MLLMs. For reliable evaluation, each task includes a human-validated initial state, a reference trajectory, and a terminal-state verifier.

Evaluating 15 advanced agents reveals that robust spatial task solving remains challenging: the strongest model, GPT-5, achieves an average task success rate (TSR) of only 17.4%, while the leading open-source model, Qwen-3.5, reaches 14.1%. Further analysis exposes a clear mismatch between task success and execution efficiency, alongside substantial domain-specific performance variations. These bottlenecks in active exploration and long-horizon planning position SpatialWorld as a rigorous testbed for future spatial agents.
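The abstract describes each task as a human-validated initial state, a reference trajectory, and a terminal-state verifier, with agents acting through text commands and scored by average task success rate (TSR). The Python sketch below is a hypothetical illustration of how such an evaluation loop could fit together, not SpatialWorld's actual code: the `Backend` interface, the `name(arg)` action syntax, the `stop` action, the step budget, and the efficiency ratio (reference length divided by agent steps) are all assumptions. The abstract also does not say how TSR is averaged (per task, per domain, or per backend), so the sketch reports both an overall rate and a per-domain breakdown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple

Action = Tuple[str, str]  # (action name, argument), e.g. ("pick_up", "mug")


class Backend(Protocol):
    """Hypothetical simulator-agnostic interface: every backend exposes the same calls."""

    def reset(self, initial_state: Dict) -> str: ...
    def step(self, action: Action) -> str: ...
    def state(self) -> Dict: ...


@dataclass
class Task:
    task_id: str
    domain: str
    initial_state: Dict
    reference_trajectory: List[str]   # human-provided action sequence
    verifier: Callable[[Dict], bool]  # judges only the terminal state


def parse_action(text: str) -> Action:
    """Parse a text action such as 'pick_up(mug)' into ('pick_up', 'mug')."""
    text = text.strip()
    if text.endswith(")") and "(" in text:
        name, arg = text[:-1].split("(", 1)
        return name.strip(), arg.strip()
    return text, ""


def run_episode(task: Task, backend: Backend, agent: Callable[[str], str],
                max_steps: int = 50) -> Tuple[bool, int]:
    """Roll out one task and return (success, number of executed actions)."""
    obs = backend.reset(task.initial_state)
    steps = 0
    while steps < max_steps:
        action = parse_action(agent(obs))
        if action[0] == "stop":  # assumed termination action
            break
        obs = backend.step(action)
        steps += 1
    return task.verifier(backend.state()), steps


def evaluate(tasks: List[Task], backend: Backend, agent: Callable[[str], str]):
    """Return overall TSR, a mean efficiency ratio, and per-domain TSR."""
    results = [run_episode(t, backend, agent) for t in tasks]
    tsr = sum(ok for ok, _ in results) / len(tasks)
    # Efficiency (assumption): reference length / agent steps, successful tasks only.
    ratios = [len(t.reference_trajectory) / max(steps, 1)
              for t, (ok, steps) in zip(tasks, results) if ok]
    efficiency = sum(ratios) / len(ratios) if ratios else 0.0
    by_domain: Dict[str, List[bool]] = {}
    for t, (ok, _) in zip(tasks, results):
        by_domain.setdefault(t.domain, []).append(ok)
    per_domain = {d: sum(v) / len(v) for d, v in by_domain.items()}
    return tsr, efficiency, per_domain


# Toy usage: a 1-D corridor where the only effective action is move_forward().
class CorridorBackend:
    def reset(self, initial_state: Dict) -> str:
        self.pos = initial_state["pos"]
        return f"pos={self.pos}"

    def step(self, action: Action) -> str:
        if action[0] == "move_forward":
            self.pos += 1
        return f"pos={self.pos}"

    def state(self) -> Dict:
        return {"pos": self.pos}


tasks = [
    Task("t1", "household", {"pos": 0}, ["move_forward()"] * 3, lambda s: s["pos"] == 3),
    Task("t2", "travel", {"pos": 5}, [], lambda s: s["pos"] == 3),  # unreachable goal
]
agent = lambda obs: "stop" if obs == "pos=3" else "move_forward()"
print(evaluate(tasks, CorridorBackend(), agent))
# → (0.5, 1.0, {'household': 1.0, 'travel': 0.0})
```

Because the verifier inspects only the terminal state, an agent may reach the goal by any route; comparing its step count against the reference trajectory is one plausible way to surface the success-versus-efficiency gap the authors report.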

Key Contributions

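This paper makes the following contributions:

  • SpatialWorld, a benchmark that integrates eight heterogeneous simulation backends under a shared, simulator-agnostic protocol and evaluates agents under vision-only partial observability through a unified, text-based action interface.
  • 760 human-annotated tasks spanning domains such as household routines, travel, and social collaboration, each with a human-validated initial state, a reference trajectory, and a terminal-state verifier.
  • An evaluation of 15 advanced agents in which the strongest model, GPT-5, reaches an average TSR of only 17.4% and the leading open-source model, Qwen-3.5, reaches 14.1%, together with an analysis of the gap between task success and execution efficiency and of domain-specific performance variation.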

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

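The results indicate that active exploration and long-horizon planning remain bottlenecks for current multimodal agents. SpatialWorld offers a simulator-agnostic testbed for measuring progress on interactive spatial reasoning in real-world tasks.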

Authors

  • Hongcheng Gao
  • Hailong Qu
  • Jingyi Tang
  • Jiahao Wang
  • Zihao Huang
  • Hengkang Qiao
  • Shihong Huang
  • Junming Yang
  • Yi Li
  • Hongyixuan Yuan
  • Wenjie Li
  • Bohan Zeng
  • Wenbo Li
  • Bo Wang
  • Jianhui Liu
  • Olive Huang
  • Haoyang Huang
  • Wentao Zhang
  • Guoqing Huang
  • Nan Duan
  • Yinpeng Dong

Paper Information

  • arXiv ID: 2606.09669v1
  • Categories: cs.AI, cs.CL
  • Published: June 8, 2026