[Paper] PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

Published: January 14, 2026 at 12:12 PM EST

Source: arXiv - 2601.09636v1

Overview

The paper introduces PersonalAlign, a new paradigm for GUI agents that must infer implicit user intents from long‑term, user‑specific interaction histories. By building a hierarchical memory of preferences and routines, the proposed system can fill in missing details in vague commands and anticipate actions before the user asks. This moves GUI assistants closer to personalized, proactive helpers.

Key Contributions

  • PersonalAlign task definition – formalizes the challenge of aligning GUI agents with implicit user intents using persistent, long‑term records.
  • AndroidIntent benchmark – a large‑scale dataset (20 k interaction logs, 775 annotated preferences, 215 routines) for evaluating vague‑instruction resolution and proactive assistance.
  • Hierarchical Intent Memory Agent (HIM‑Agent) – a novel architecture that continuously updates a personal memory and organizes preferences/routines hierarchically for efficient retrieval.
  • Comprehensive evaluation – compares state‑of‑the‑art models (GPT‑5, Qwen‑3‑VL, UI‑TARS) on AndroidIntent, showing HIM‑Agent lifts execution accuracy by 15.7 % and proactive suggestion quality by 7.3 %.

Methodology

  1. Data collection & annotation – The authors mined 20 k Android UI interaction logs from multiple users. Human annotators labeled recurring user‑specific preferences (e.g., “always open links in Chrome”) and routine sequences (e.g., “morning news → email → calendar”).
  2. Task formulation – Each test episode supplies a vague instruction (e.g., “check my messages”) plus the user’s long‑term record. The agent must (a) infer the missing intent, (b) execute the correct UI actions, and (c) optionally suggest proactive next steps.
  3. HIM‑Agent architecture
    • Personal Memory Buffer: a continuously refreshed store of the user’s past UI events.
    • Hierarchical Intent Graph: top‑level nodes capture high‑level preferences (e.g., “default browser”), while lower‑level nodes encode routine chains.
    • Retrieval & Reasoning Module: given a new instruction, the agent queries the graph, ranks candidate intents with a lightweight transformer, and generates a UI action plan.
  4. Evaluation protocol – Metrics include Execution Success Rate (did the agent finish the task correctly?) and Proactive Suggestion Score (how useful were the anticipatory actions?). Baselines are run with the same prompts but without the hierarchical memory.
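
The two-level memory and the retrieval step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `resolve` helper, and the example entries (including the "WhatsApp" value) are hypothetical, and plain keyword overlap stands in for the lightweight transformer ranker mentioned in the summary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Routine:
    """Lower-level node: an ordered chain of actions the user tends to repeat."""
    name: str
    steps: list


@dataclass
class Preference:
    """Top-level node: a stable preference with routines attached beneath it."""
    name: str
    value: str
    keywords: set
    routines: list = field(default_factory=list)


def rank_preferences(instruction: str, memory: list) -> list:
    """Rank top-level nodes by keyword overlap with the instruction.

    Assumption: keyword overlap replaces the learned ranker for illustration.
    """
    tokens = set(instruction.lower().split())
    scored = [(len(tokens & pref.keywords), pref) for pref in memory]
    # Keep only nodes sharing at least one keyword, best match first.
    matches = [(score, pref) for score, pref in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [pref for _, pref in matches]


def resolve(instruction: str, memory: list) -> Optional[str]:
    """Fill in a vague instruction with the best-matching stored preference."""
    ranked = rank_preferences(instruction, memory)
    if not ranked:
        return None  # nothing in memory applies; the agent would need to ask
    best = ranked[0]
    return f"{instruction} (using {best.name} = {best.value})"


# Hypothetical memory contents, loosely based on the examples in the text.
EXAMPLE_MEMORY = [
    Preference(
        name="default browser",
        value="Chrome",
        keywords={"link", "links", "browser"},
        routines=[Routine("morning", ["news", "email", "calendar"])],
    ),
    Preference(
        name="messaging app",
        value="WhatsApp",
        keywords={"message", "messages", "chat"},
    ),
]
```

With this example memory, `resolve("check my messages", EXAMPLE_MEMORY)` returns `"check my messages (using messaging app = WhatsApp)"`, while an instruction with no matching keywords returns `None`. A real system would presumably rank over embeddings or model scores rather than literal tokens.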

Results & Findings

Model | Execution Success ↑ | Proactive Suggestion ↑
--- | --- | ---
GPT‑5 (no memory) | 68.2 % | 42.1 %
Qwen‑3‑VL (no memory) | 70.5 % | 44.3 %
UI‑TARS (no memory) | 65.9 % | 40.7 %
HIM‑Agent (with hierarchical memory) | 84.9 % (+15.7 %) | 51.4 % (+7.3 %)

Key takeaways

  • Access to a structured personal memory dramatically reduces failure cases caused by ambiguous commands.
  • Hierarchical organization (preference vs. routine) yields more accurate proactive suggestions than flat memory look‑ups.
  • Even large LLMs benefit from an external, domain‑specific memory rather than relying solely on their internal knowledge.

Practical Implications

  • Developer toolkits: The hierarchical intent memory can be packaged as a lightweight SDK for Android/iOS apps, enabling third‑party assistants to personalize without retraining massive models.
  • Enterprise automation: Business workflows (e.g., ticket triage, CRM updates) often involve repetitive, user‑specific steps; integrating HIM‑Agent‑style memory can cut down on clarification dialogs and speed up task completion.
  • Privacy‑preserving personalization: Because the memory resides on‑device and only the retrieval scores are sent to the LLM, user preferences stay local, aligning with emerging privacy regulations.
  • Proactive UX: Mobile OS vendors could embed this approach to surface context‑aware shortcuts (“You usually check the weather after opening the calendar at 8 am”) without hard‑coding rules.
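
To make the proactive-shortcut idea concrete, here is a small Python sketch that learns "what usually comes next" from logged sessions instead of hard-coded rules. It is a simple frequency-count stand-in, not the method from the paper; the function names, the `min_support` threshold, and the sample sessions are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each action is directly followed by another action."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions


def suggest_next(transitions, action, min_support=2):
    """Return the most frequent follow-up action, or None if evidence is thin."""
    followers = transitions.get(action)
    if not followers:
        return None
    candidate, count = followers.most_common(1)[0]
    # Only suggest when the pattern was seen at least `min_support` times.
    return candidate if count >= min_support else None


# Hypothetical logged sessions (each is an ordered list of opened apps).
EXAMPLE_SESSIONS = [
    ["calendar", "weather", "email"],
    ["calendar", "weather"],
    ["calendar", "email"],
    ["news", "email", "calendar"],
]
```

On the sample sessions, `suggest_next(build_transitions(EXAMPLE_SESSIONS), "calendar")` returns `"weather"` because that transition occurs twice, whereas `"news"` yields `None` since its only follow-up was seen once. The support threshold is one way to avoid surfacing suggestions from one-off behavior.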

Limitations & Future Work

  • Scalability of the hierarchical graph – As the number of recorded interactions grows, maintaining low‑latency retrieval may require more sophisticated indexing or pruning strategies.
  • Cross‑device continuity – The current setup assumes a single device’s logs; extending memory across phones, tablets, and desktops remains an open challenge.
  • Generalization to new users – Cold‑start scenarios where little history exists were not deeply explored; hybrid approaches combining demographic priors with early interaction signals could help.
  • Evaluation breadth – AndroidIntent focuses on Android UI; applying the framework to web browsers, desktop GUIs, or voice‑first assistants would test its universality.
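
As one illustration of the pruning strategies mentioned under scalability, the sketch below keeps only the memory entries with the highest recency-weighted usage. The paper does not prescribe this scheme: the exponential half-life decay, the `(name, use_count, last_used_day)` entry layout, and the parameter values are assumptions.

```python
import heapq


def prune_memory(entries, now, capacity, half_life_days=30.0):
    """Keep the `capacity` entries with the highest decayed usage score.

    Each entry is a tuple (name, use_count, last_used_day). The score halves
    for every `half_life_days` that have passed since the entry was last used.
    """
    def score(entry):
        _, use_count, last_used_day = entry
        age = max(0.0, now - last_used_day)
        return use_count * 0.5 ** (age / half_life_days)

    return heapq.nlargest(capacity, entries, key=score)


# Hypothetical entries: (name, use_count, last_used_day).
EXAMPLE_ENTRIES = [
    ("default browser", 40, 100),
    ("old game shortcut", 50, 0),
    ("morning routine", 10, 99),
]
```

Calling `prune_memory(EXAMPLE_ENTRIES, now=100, capacity=2)` keeps "default browser" and "morning routine" and drops "old game shortcut": despite its high raw count, it was last used 100 days ago, so its decayed score falls below the others.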

PersonalAlign demonstrates that a well‑structured, continuously updated personal memory can turn generic GUI agents into truly personalized assistants, opening a path toward more intuitive, proactive human‑computer interaction.

Authors

  • Yibo Lyu
  • Gongwei Chen
  • Rui Shao
  • Weili Guan
  • Liqiang Nie

Paper Information

  • arXiv ID: 2601.09636v1
  • Categories: cs.AI, cs.CV, cs.HC, cs.LG
  • Published: January 14, 2026