[Paper] OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Published: June 8, 2026 at 01:59 PM EDT

Source: arXiv - 2606.09826v1

Overview

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs, open-weight VLMs, and specialized game policies) on the same footing. We address these gaps with OmniGameArena, a real-time benchmark of twelve newly built Unreal Engine 5 games spanning Solo (7), PvP (3), and Coop (2) with unified action interfaces, and the Improvement Dynamics Curve (IDC), an agentic-reflection harness in which a tool-using reflector LLM autonomously refines a bounded skill prompt across multiple rounds. Beyond cold-start leaderboard scores, IDC exposes two additional observables for each (agent, game) pair: how the score evolves across reflection rounds, and how the learned skill behaves on held-out task variants. We report these observables for twelve VLM agents on the cold-start leaderboard and four top agents under IDC.
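
To make the IDC idea concrete, the loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's harness: `run_episode`, `reflect`, `MAX_SKILL_CHARS`, the character-based bound on the skill prompt, and the toy scoring functions are all hypothetical placeholders. The sketch records the two observables the overview names: the score after each reflection round, and the score of the final skill on a held-out task variant.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical bound on the skill prompt; the paper only says the prompt is "bounded".
MAX_SKILL_CHARS = 200


@dataclass
class IDCResult:
    # Score per round; index 0 is the cold-start (no skill) score.
    round_scores: List[float] = field(default_factory=list)
    held_out_score: float = 0.0
    final_skill: str = ""


def improvement_dynamics_curve(
    run_episode: Callable[[str, str], float],  # (skill_prompt, task_variant) -> score
    reflect: Callable[[str, float], str],      # (skill_prompt, last_score) -> revised skill
    train_variant: str,
    held_out_variant: str,
    rounds: int,
) -> IDCResult:
    """Run a cold-start episode, then `rounds` reflect-and-retry rounds."""
    skill = ""
    result = IDCResult()
    result.round_scores.append(run_episode(skill, train_variant))  # cold start
    for _ in range(rounds):
        # The reflector revises the skill; truncation enforces the assumed bound.
        skill = reflect(skill, result.round_scores[-1])[:MAX_SKILL_CHARS]
        result.round_scores.append(run_episode(skill, train_variant))
    # Second observable: how the learned skill behaves on a held-out variant.
    result.held_out_score = run_episode(skill, held_out_variant)
    result.final_skill = skill
    return result


# Toy stand-ins for a game and a reflector LLM (illustrative only).
def toy_run_episode(skill: str, variant: str) -> float:
    base = float(min(skill.count("hint"), 3))
    return base if variant == "train" else base / 2


def toy_reflect(skill: str, last_score: float) -> str:
    return skill + "hint;"


result = improvement_dynamics_curve(
    toy_run_episode, toy_reflect, "train", "held_out", rounds=3
)
print(result.round_scores, result.held_out_score)
```

In a real setup, `run_episode` would drive an agent through a game and `reflect` would call a tool-using reflector LLM; both are reduced to toy functions here so the control flow is runnable on its own.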

Key Contributions

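Based on the abstract, the paper's main contributions are:

OmniGameArena, a real-time benchmark of twelve newly built Unreal Engine 5 games spanning Solo (7), PvP (3), and Coop (2), with unified action interfaces
The Improvement Dynamics Curve (IDC), an agentic-reflection harness in which a tool-using reflector LLM refines a bounded skill prompt across multiple rounds
Cold-start leaderboard results for twelve VLM agents, plus IDC results for four top agents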

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

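The benchmark lets practitioners compare commercial VLMs, open-weight VLMs, and specialized game policies under one protocol. IDC adds two signals beyond a first-attempt score: whether an agent's score improves across reflection rounds, and whether the learned skill carries over to held-out task variants.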

Authors

Mingxian Lin
Shengju Qian
Yuqi Liu
Yi-Hua Huang
Yiyu Wang
Wei Huang
Yitang Li
Fan Zhang
Zeyu Hu
Lingting Zhu
Xin Wang
Xiaojuan Qi

Paper Information

arXiv ID: 2606.09826v1
Categories: cs.CV, cs.AI
Published: June 8, 2026
