[Paper] From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents
Source: arXiv - 2512.13438v1
Overview
The paper introduces UIFormer, a framework that automatically rewrites user‑interface (UI) representations into more compact forms for Large Language Model (LLM) agents. By trimming the number of UI tokens an LLM has to process, UIFormer speeds up tasks such as automated UI testing, AI‑driven assistants, and cross‑platform navigation—without sacrificing accuracy.
Key Contributions
- First automated UI‑representation optimizer for LLM agents, tackling both token efficiency and functional completeness.
- Domain‑specific language (DSL) that encodes common UI transformation primitives (e.g., pruning invisible nodes, merging similar widgets).
- Constraint‑based synthesis + LLM‑guided refinement: a two‑stage pipeline that narrows the program search space and iteratively improves solutions using correctness and efficiency rewards.
- Lightweight plug‑in architecture that can be dropped into existing LLM‑based agents with negligible code changes.
- Extensive evaluation on Android and Web UI navigation benchmarks (3 datasets, 5 LLM back‑ends) showing 48 %–56 % token reduction and equal or better task performance.
- Real‑world validation through deployment in WeChat’s UI automation pipeline, confirming industrial relevance.
Methodology
- Problem Formulation – The authors view UI optimization as a program synthesis task: given a raw UI tree, synthesize a transformation program that outputs a smaller, semantically equivalent representation.
- DSL Design – The DSL contains a small set of UI‑specific operators (e.g., remove_hidden, collapse_group, abstract_text). This restricts the search space and guarantees that generated programs stay within the UI domain.
- Constraint‑Based Decomposition – UIFormer first breaks the large synthesis problem into smaller sub‑problems (e.g., per screen region) and applies static constraints (type safety, hierarchy preservation) to prune invalid programs early.
- LLM‑Driven Iterative Refinement – A chosen LLM (e.g., GPT‑4, Claude) proposes candidate programs. Each candidate is evaluated with two rewards:
  - Correctness reward – checks that the transformed UI still satisfies a set of functional tests (e.g., the agent can still locate target widgets).
  - Efficiency reward – measures token‑count reduction.
  The LLM is prompted to improve the program until both rewards converge.
- Plug‑in Integration – UIFormer runs as a pre‑processing step: the agent receives the optimized UI representation, executes its normal reasoning, and the plug‑in can optionally post‑process results back to the original UI if needed.
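The pipeline above can be sketched in miniature. The code below is an illustrative reconstruction, not the authors' implementation: the operator names follow the paper (remove_hidden, collapse_group, abstract_text), but the dict-based tree schema, the whitespace token proxy, and the reward definition are assumptions made for the sketch.

```python
# Illustrative sketch of UIFormer-style DSL primitives on a dict-based
# UI tree. Operator names follow the paper; the tree schema, token
# metric, and reward definition are assumptions, not the paper's code.
import json

def remove_hidden(node):
    """Prune subtrees rooted at invisible nodes."""
    if not node.get("visible", True):
        return None
    kept = [c for c in (remove_hidden(c) for c in node.get("children", []))
            if c is not None]
    return {**node, "children": kept}

def collapse_group(node):
    """Replace a single-child container group with its child."""
    children = [collapse_group(c) for c in node.get("children", [])]
    if node.get("type") == "group" and len(children) == 1:
        return children[0]
    return {**node, "children": children}

def abstract_text(node, max_len=20):
    """Truncate long text fields to a compact abstract."""
    out = {**node,
           "children": [abstract_text(c, max_len) for c in node.get("children", [])]}
    if len(out.get("text", "")) > max_len:
        out["text"] = out["text"][:max_len] + "…"
    return out

def token_count(node):
    """Crude token proxy: whitespace-split JSON serialization."""
    return len(json.dumps(node, ensure_ascii=False).split())

def efficiency_reward(original, transformed):
    """Fraction of tokens removed (the efficiency signal)."""
    return 1 - token_count(transformed) / token_count(original)

def optimize_ui(ui):
    """Pre-processing pipeline, as the plug-in would apply it."""
    return abstract_text(collapse_group(remove_hidden(ui)))
```

In the real system the composition and parameters of these operators are what the synthesizer searches over; here they are fixed in `optimize_ui` purely to show the shape of a transformation program.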
Results & Findings
| Benchmark | LLM | Token Reduction | Success‑Rate Change |
|---|---|---|---|
| Android UI‑Nav (3k screens) | GPT‑4 | 52.3 % | +1.2 % |
| Web UI‑Nav (2.5k pages) | Claude 2 | 48.7 % | unchanged |
| Mixed‑Platform (1.8k screens) | Llama‑2‑70B | 55.8 % | +0.8 % |
- Runtime overhead stayed under 120 ms per UI, negligible compared with the LLM inference time.
- Robustness: In >95 % of cases the transformed UI passed the same functional test suite as the original, confirming semantic preservation.
- Industry deployment: At WeChat, UIFormer cut average API payload size by ~50 % and reduced end‑to‑end latency of UI‑automation bots by ~30 ms, enabling higher throughput for daily automated testing runs.
Practical Implications
- Faster LLM agents – Smaller UI payloads mean less context for the LLM to embed, directly lowering token‑based cost (e.g., OpenAI API pricing) and inference latency.
- Scalable UI automation – Teams can run more concurrent UI‑testing bots on the same hardware budget, especially valuable for large mobile/web app suites.
- Edge deployment – On devices with limited bandwidth (e.g., IoT dashboards), transmitting a compact UI representation eases real‑time LLM assistance.
- Plug‑and‑play adoption – Since UIFormer is a thin pre‑processor, existing codebases (Selenium, Appium, custom UI agents) can be upgraded without rewriting core logic.
- Cross‑platform consistency – The DSL abstracts away platform‑specific quirks, allowing a single optimization pipeline for Android, iOS, and web UIs.
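To make the token‑cost point concrete, a quick back‑of‑the‑envelope calculation. The per‑token price and token counts below are hypothetical placeholders (not real provider rates or measurements from the paper); only the ~52 % reduction figure comes from the reported Android result.

```python
# Back-of-the-envelope cost arithmetic for the token-savings claim.
# Price and token counts are illustrative placeholders, not real rates.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # hypothetical $/1K input tokens

def input_cost(ui_tokens, prompt_tokens=500):
    """Cost of one agent step's input context (UI payload + prompt)."""
    return (ui_tokens + prompt_tokens) / 1000 * PRICE_PER_1K_INPUT_TOKENS

baseline  = input_cost(8000)          # raw UI tree
optimized = input_cost(8000 * 0.48)   # after ~52% token reduction
savings   = 1 - optimized / baseline  # ~49% cheaper per call
```

Because the fixed prompt overhead is small relative to the UI payload, a ~52 % payload reduction translates almost one‑for‑one into per‑call input cost savings.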
Limitations & Future Work
- Dependence on functional test oracle – Correctness rewards rely on a set of UI‑level tests; in domains lacking comprehensive test suites, guaranteeing semantic preservation may be harder.
- DSL expressiveness – While the current DSL covers common pruning and abstraction patterns, exotic UI widgets (custom canvas elements, AR overlays) may need extensions.
- LLM bias – The iterative refinement step inherits any hallucination tendencies of the underlying LLM; occasional manual inspection may still be required for safety‑critical applications.
- Future directions suggested by the authors include:
- Learning a data‑driven DSL from large UI corpora.
- Integrating reinforcement learning to replace hand‑crafted rewards.
- Extending UIFormer to handle dynamic, event‑driven UI states (e.g., animations, lazy‑loaded content).
Authors
- Dezhi Ran
- Zhi Gong
- Yuzhe Guo
- Mengzhou Wu
- Yuan Cao
- Haochuan Lu
- Hengyu Zhang
- Xia Zeng
- Gang Cao
- Liangchao Yao
- Yuetang Deng
- Wei Yang
- Tao Xie
Paper Information
- arXiv ID: 2512.13438v1
- Categories: cs.SE, cs.AI
- Published: December 15, 2025