[Paper] GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Published: February 25, 2026 at 01:34 PM EST
5 min read
Source: arXiv

Overview

GUI‑Libra tackles a persistent gap between open‑source and proprietary GUI‑automation agents, especially on long‑horizon tasks such as multi‑step web or mobile workflows. By redesigning the data pipeline and the fine‑tuning / reinforcement‑learning stages, the authors show that native agents can achieve dramatically higher success rates without the need for massive online interaction data.

Key Contributions

  • Curated reasoning dataset: 81 K high‑quality “reason‑then‑act” examples for web and mobile GUIs, built with a systematic construction‑and‑filtering pipeline.
  • Action‑aware supervised fine‑tuning (SFT): A mixed‑data strategy that blends pure reasoning traces with direct‑action examples, plus token‑level re‑weighting that forces the model to focus on grounding actions.
  • Stabilized RL under partial verifiability: Introduction of a KL‑regularized trust region for the RL‑with‑verification‑reward (RLVR) loop, plus a success‑adaptive gradient scaling that down‑weights noisy negative updates when the environment is ambiguous.
  • Empirical validation: Consistent gains on several public web‑automation (e.g., MiniWoB) and mobile‑automation benchmarks, improving both step‑wise accuracy and end‑to‑end task completion.
  • Open resources: Release of the 81 K dataset, training code, and pretrained models to the community.

Methodology

  1. Data Construction & Filtering

    • Harvested raw interaction logs from existing GUI agents and human demonstrations.
    • Applied heuristic filters (action‑token consistency, language fluency, duplicate removal) to keep only traces where the natural‑language reasoning aligns tightly with the subsequent UI action.
    • Result: a clean, diverse corpus covering a wide range of UI elements (buttons, dropdowns, gestures, etc.).
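The filtering stage described above can be sketched roughly as below. The trace fields, the `click("…")` action syntax, and the substring-match consistency check are all illustrative assumptions; the paper's actual heuristics are not specified at this level of detail.

```python
import hashlib
import re


def action_token_consistent(reasoning: str, action: str) -> bool:
    """Check that the UI target named in the action also appears in the
    reasoning text (hypothetical click("...") action format)."""
    m = re.search(r'click\("([^"]+)"\)', action)
    # Actions that don't match the pattern pass through unchecked.
    return m is None or m.group(1).lower() in reasoning.lower()


def filter_traces(traces):
    """Keep traces whose reasoning aligns with the action; drop duplicates."""
    seen, kept = set(), []
    for t in traces:
        key = hashlib.md5((t["reasoning"] + t["action"]).encode()).hexdigest()
        if key in seen:
            continue  # duplicate removal
        if not action_token_consistent(t["reasoning"], t["action"]):
            continue  # reasoning never mentions the action's UI target
        seen.add(key)
        kept.append(t)
    return kept
```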
  2. Action‑Aware Supervised Fine‑Tuning

    • Instead of pure chain‑of‑thought (CoT) prompts, the training mix includes:
      • Reason‑then‑action examples (text reasoning followed by the exact UI command).
      • Direct‑action examples (no reasoning, just the correct UI command).
    • Token‑level loss re‑weighting amplifies gradients on action tokens and UI identifiers, encouraging the model to stay grounded while still reasoning.
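Token-level re-weighting of this kind can be written as a weighted negative log-likelihood. A minimal sketch, assuming per-token log-probabilities and a boolean mask over action / UI-identifier tokens (the weight value 2.0 is illustrative, not from the paper):

```python
import numpy as np


def action_weighted_nll(logprobs, targets, action_mask, action_weight=2.0):
    """Weighted negative log-likelihood over a token sequence.

    logprobs:    (seq_len, vocab) log-softmax outputs of the model
    targets:     (seq_len,) gold token ids
    action_mask: (seq_len,) bool, True on action / UI-identifier tokens
    Action tokens get weight `action_weight`; all others get weight 1.
    """
    per_token = -logprobs[np.arange(len(targets)), targets]
    weights = np.where(action_mask, action_weight, 1.0)
    return float((weights * per_token).sum() / weights.sum())
```

Because the weighted sum is normalized by the total weight, scaling the action weight shifts gradient mass toward action tokens without changing the overall loss scale.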
  3. Reinforcement Learning with Partial Verifiability (RLVR)

    • Traditional step‑wise RL treats a single demonstrated action as the only “correct” one, even though many actions could be valid. This creates a partial verifiability problem that hurts offline metrics.
    • GUI‑Libra adds a KL‑regularization term that penalizes the policy for drifting too far from the SFT baseline, effectively forming a trust region.
    • A success‑adaptive scaling factor monitors online episode outcomes; when the agent succeeds, negative gradients from mismatched actions are attenuated, preventing over‑penalization of alternative valid moves.
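A per-step version of this objective might look as follows. This is a sketch under stated assumptions, not the paper's implementation: the coefficient values, the attenuation rule, and the use of a sampled KL estimator (the "k3" form, `r - 1 - log r`) against the frozen SFT policy are all illustrative choices.

```python
import numpy as np


def rlvr_step_loss(logp_new, logp_ref, advantage, episode_success,
                   kl_coef=0.1, neg_scale=0.3):
    """Sketch of a KL-regularized, success-adaptive policy loss for one step.

    logp_new / logp_ref: log-prob of the taken action under the current
    policy and the frozen SFT reference; advantage: scalar reward signal.
    """
    # Success-adaptive scaling: attenuate negative updates when the episode
    # succeeded, so alternative valid actions are not over-penalized.
    if episode_success and advantage < 0:
        advantage *= neg_scale
    policy_loss = -advantage * logp_new
    # Non-negative sampled KL estimate (k3 form) keeping the policy near SFT.
    r = np.exp(logp_ref - logp_new)
    kl_penalty = kl_coef * (r - 1.0 - (logp_ref - logp_new))
    return policy_loss + kl_penalty
```

When the policy matches the SFT reference the KL term vanishes, so the trust region only bites as the policy drifts.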
  4. Training Pipeline

    • Stage 1: Action‑aware SFT on the curated 81 K dataset.
    • Stage 2: KL‑regularized RLVR on a small set of offline trajectories, followed by a brief online fine‑tune (optional) to polish performance.
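The two-stage schedule can be captured in a small config; every field name and hyperparameter value below is hypothetical, included only to make the stage ordering concrete:

```python
# Hypothetical training schedule mirroring the pipeline above;
# all values are illustrative, not taken from the paper.
PIPELINE = [
    {"stage": "sft", "data": "curated_81k", "loss": "action_weighted_nll"},
    {"stage": "rlvr", "data": "offline_trajectories",
     "kl_coef": 0.1, "success_adaptive_scaling": True},
    {"stage": "online_finetune", "optional": True},
]
```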

Results & Findings

| Benchmark | Baseline (SFT‑only) | GUI‑Libra (SFT + RLVR) | Improvement |
|---|---|---|---|
| MiniWoB (web) | 48 % | 66 % | +18 pp |
| Mobile‑Env (Android) | 42 % | 61 % | +19 pp |
| Step‑wise accuracy (average) | 71 % | 84 % | +13 pp |
  • Offline metrics become predictive: The KL‑regularized RLVR correlates strongly (ρ ≈ 0.78) with online success, fixing the “partial verifiability” disconnect observed in prior work.
  • Ablation studies show that removing either the action‑aware token re‑weighting or the KL trust region drops performance by ~7‑9 pp, confirming each component’s necessity.
  • Data efficiency: With only ~10 K additional fine‑tuning steps, the model matches or exceeds closed‑source baselines that required millions of online interactions.
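The reported ρ is a Spearman rank correlation; checking the same statistic on one's own offline/online logs is straightforward. A minimal implementation, assuming no tied scores (with ties, a library routine such as `scipy.stats.spearmanr` should be used instead):

```python
import numpy as np


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes no tied values in x or y."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))
```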

Practical Implications

  • Faster prototyping of UI bots: Developers can now fine‑tune a pre‑trained language model on the released 81 K dataset and obtain a competent GUI agent in a few hours, rather than weeks of costly data collection.
  • More reliable automation scripts: Action‑aware SFT reduces “hallucinated clicks” where the model reasons correctly but issues an out‑of‑scope UI command, a common pain point in current open‑source agents.
  • Safer RL deployment: The KL trust region acts as a built‑in safeguard, preventing the policy from taking wildly exploratory (and potentially destructive) actions during online learning—critical for production environments that cannot afford UI crashes.
  • Cross‑platform applicability: Because the dataset spans both web and mobile interactions, the same fine‑tuning pipeline can be reused for desktop, web, or mobile automation tools, lowering the barrier for multi‑platform bots.

Limitations & Future Work

  • Partial verifiability still relies on a single demonstrated action; while KL regularization mitigates the issue, truly multi‑modal verification (e.g., using UI state equivalence classes) remains unexplored.
  • Dataset bias: The curated 81 K examples are drawn from a limited set of popular apps and websites; performance may degrade on niche or highly dynamic UIs.
  • Scalability of RLVR: The current RL loop is offline‑heavy; extending it to large‑scale, on‑device learning (e.g., edge mobile agents) will require more efficient credit‑assignment methods.
  • User intent handling: The work assumes well‑specified natural‑language goals; integrating ambiguous or multi‑intent queries is an open research direction.

GUI‑Libra demonstrates that thoughtful data curation and training recipes can bridge the performance gap for open‑source GUI agents, offering a practical roadmap for developers eager to build reliable, reasoning‑capable automation tools.

Authors

  • Rui Yang
  • Qianhui Wu
  • Zhaoyang Wang
  • Hanyang Chen
  • Ke Yang
  • Hao Cheng
  • Huaxiu Yao
  • Baoling Peng
  • Huan Zhang
  • Jianfeng Gao
  • Tong Zhang

Paper Information

  • arXiv ID: 2602.22190v1
  • Categories: cs.LG, cs.AI, cs.CL
  • Published: February 25, 2026