[Paper] PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

Published: January 14, 2026 at 12:12 PM EST

Source: arXiv - 2601.09636v1

Overview

The paper introduces PersonalAlign, a new paradigm for GUI agents that must understand implicit user intents by tapping into long‑term, user‑specific interaction histories. By building a hierarchical memory of preferences and routines, the proposed system can fill in missing details in vague commands and even anticipate actions before the user asks, moving GUI assistants closer to truly personalized, proactive helpers.

Key Contributions

PersonalAlign task definition – formalizes the challenge of aligning GUI agents with implicit user intents using persistent, long‑term records.
AndroidIntent benchmark – a large‑scale dataset (20 k interaction logs, 775 annotated preferences, 215 routines) for evaluating vague‑instruction resolution and proactive assistance.
Hierarchical Intent Memory Agent (HIM‑Agent) – a novel architecture that continuously updates a personal memory and organizes preferences/routines hierarchically for efficient retrieval.
Comprehensive evaluation – compares state‑of‑the‑art models (GPT‑5, Qwen‑3‑VL, UI‑TARS) on AndroidIntent, showing HIM‑Agent lifts execution accuracy by 15.7 % and proactive suggestion quality by 7.3 %.

Methodology

Data collection & annotation – The authors mined 20 k Android UI interaction logs from multiple users. Human annotators labeled recurring user‑specific preferences (e.g., “always open links in Chrome”) and routine sequences (e.g., “morning news → email → calendar”).
Task formulation – Each test episode supplies a vague instruction (e.g., “check my messages”) plus the user’s long‑term record. The agent must (a) infer the missing intent, (b) execute the correct UI actions, and (c) optionally suggest proactive next steps.
HIM‑Agent architecture
- Personal Memory Buffer: a continuously refreshed store of the user’s past UI events.
- Hierarchical Intent Graph: top‑level nodes capture high‑level preferences (e.g., “default browser”), while lower‑level nodes encode routine chains.
- Retrieval & Reasoning Module: given a new instruction, the agent queries the graph, ranks candidate intents with a lightweight transformer, and generates a UI action plan.
Evaluation protocol – Metrics include Execution Success Rate (did the agent finish the task correctly?) and Proactive Suggestion Score (how useful were the anticipatory actions?). Baselines are run with the same prompts but without the hierarchical memory.
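
The two-level memory described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `IntentMemory`, its methods, and the `(app, slot, value)` event format are hypothetical, and plain frequency counting stands in for the lightweight transformer that the paper uses to rank candidate intents.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class IntentMemory:
    """Toy two-level memory: preferences on top, routine chains below."""
    # Preference slot -> counts of observed values, e.g. "browser" -> {"Chrome": 2}.
    preferences: dict = field(default_factory=lambda: defaultdict(Counter))
    # App -> counts of the app opened right after it (a crude routine chain).
    routines: dict = field(default_factory=lambda: defaultdict(Counter))

    def observe(self, events):
        """Update the memory from one session of (app, slot, value) events."""
        for _, slot, value in events:
            if slot is not None:
                self.preferences[slot][value] += 1
        for (app, _, _), (next_app, _, _) in zip(events, events[1:]):
            self.routines[app][next_app] += 1

    def resolve(self, slot):
        """Fill a missing detail with the most frequently observed value."""
        counts = self.preferences.get(slot)
        return counts.most_common(1)[0][0] if counts else None

    def suggest_next(self, app):
        """Propose the app that most often followed `app` in past sessions."""
        counts = self.routines.get(app)
        return counts.most_common(1)[0][0] if counts else None


memory = IntentMemory()
memory.observe([("News", "browser", "Chrome"), ("Email", None, None), ("Calendar", None, None)])
memory.observe([("News", "browser", "Chrome"), ("Email", None, None), ("Calendar", None, None)])
memory.observe([("News", "browser", "Firefox"), ("Email", None, None)])

print(memory.resolve("browser"))     # Chrome (seen twice, Firefox once)
print(memory.suggest_next("Email"))  # Calendar
```

Keeping preferences and routines in separate levels means a vague instruction only needs a slot lookup, while a proactive suggestion only needs the transition counts for the current app.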

Results & Findings

Model	Execution Success ↑	Proactive Suggestion ↑
GPT‑5 (no memory)	68.2 %	42.1 %
Qwen‑3‑VL (no memory)	70.5 %	44.3 %
UI‑TARS (no memory)	65.9 %	40.7 %
HIM‑Agent (with hierarchical memory)	84.9 % (+15.7 %)	51.4 % (+7.3 %)
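
For readers who want to reproduce numbers of this kind, the two columns can be computed from per-episode outcomes roughly as follows. This is a sketch under assumptions: the `EpisodeResult` record is hypothetical, and the Proactive Suggestion Score is assumed here to be a mean of per-episode usefulness ratings in [0, 1], which the summary does not spell out.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    task_completed: bool     # did the agent finish the task correctly?
    suggestion_score: float  # assumed usefulness rating in [0, 1]


def execution_success_rate(results):
    """Percentage of episodes in which the task was completed correctly."""
    if not results:
        raise ValueError("no episodes to score")
    return 100.0 * sum(r.task_completed for r in results) / len(results)


def proactive_suggestion_score(results):
    """Mean suggestion rating, reported as a percentage (assumed definition)."""
    if not results:
        raise ValueError("no episodes to score")
    return 100.0 * sum(r.suggestion_score for r in results) / len(results)


results = [
    EpisodeResult(True, 0.5),
    EpisodeResult(False, 0.0),
    EpisodeResult(True, 1.0),
    EpisodeResult(True, 0.5),
]
print(execution_success_rate(results))      # 75.0
print(proactive_suggestion_score(results))  # 50.0
```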

Key takeaways

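Access to a structured personal memory reduces failures caused by ambiguous commands: in the table above, HIM‑Agent reaches 84.9 % execution success, against 65.9 % to 70.5 % for the memory‑free baselines.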
Hierarchical organization (preference vs. routine) yields more accurate proactive suggestions than flat memory look‑ups.
Even state‑of‑the‑art LLMs benefit from an external, domain‑specific memory rather than relying solely on their internal knowledge.

Practical Implications

Developer toolkits: The hierarchical intent memory can be packaged as a lightweight SDK for Android/iOS apps, enabling third‑party assistants to personalize without retraining massive models.
Enterprise automation: Business workflows (e.g., ticket triage, CRM updates) often involve repetitive, user‑specific steps; integrating HIM‑Agent‑style memory can cut down on clarification dialogs and speed up task completion.
Privacy‑preserving personalization: If the memory resides on‑device and only the retrieval scores are sent to the LLM, user preferences can stay local, which would make it easier to comply with emerging privacy regulations.
Proactive UX: Mobile OS vendors could embed this approach to surface context‑aware shortcuts (“You usually check the weather after opening the calendar at 8 am”) without hard‑coding rules.
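
A shortcut like the weather example could, in the simplest case, be derived from time-conditioned transition counts rather than hand-written rules. The sketch below is an illustration only: the function names, the `(hour, app)` session format, and the `min_support` and `min_confidence` thresholds are assumptions, not part of the paper.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count which app follows (hour, app) within each session.

    `sessions` is a list of sessions; each session is a chronological
    list of (hour, app) pairs. Pairs are never formed across sessions.
    """
    table = defaultdict(Counter)
    for session in sessions:
        for (hour, app), (_, next_app) in zip(session, session[1:]):
            table[(hour, app)][next_app] += 1
    return table


def suggest_shortcut(table, hour, app, min_support=2, min_confidence=0.6):
    """Return the usual next app, or None if the habit is too weak."""
    counts = table.get((hour, app))
    if not counts:
        return None
    next_app, hits = counts.most_common(1)[0]
    confidence = hits / sum(counts.values())
    if hits >= min_support and confidence >= min_confidence:
        return next_app
    return None


sessions = [
    [(8, "Calendar"), (8, "Weather")],
    [(8, "Calendar"), (8, "Weather")],
    [(8, "Calendar"), (8, "Email")],
    [(20, "Calendar"), (20, "Notes")],
]
table = build_transitions(sessions)
print(suggest_shortcut(table, 8, "Calendar"))   # Weather (2 of 3 sessions)
print(suggest_shortcut(table, 20, "Calendar"))  # None (seen only once)
```

The support and confidence thresholds keep one-off behavior from turning into a suggestion, at the cost of reacting more slowly to new habits.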

Limitations & Future Work

Scalability of the hierarchical graph – As the number of recorded interactions grows, maintaining low‑latency retrieval may require more sophisticated indexing or pruning strategies.
Cross‑device continuity – The current setup assumes a single device’s logs; extending memory across phones, tablets, and desktops remains an open challenge.
Generalization to new users – Cold‑start scenarios where little history exists were not deeply explored; hybrid approaches combining demographic priors with early interaction signals could help.
Evaluation breadth – AndroidIntent focuses on Android UI; applying the framework to web browsers, desktop GUIs, or voice‑first assistants would test its universality.
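
One way to picture the hybrid cold-start idea mentioned above is additive smoothing: a population-level prior dominates while a user has little history and is gradually outweighed by observed counts. The sketch below is an assumption for illustration, not something evaluated in the paper; the function name, the `prior_strength` parameter, and the prior values are all hypothetical.

```python
def blended_preference(prior, counts, prior_strength=5.0):
    """Blend a prior distribution with observed counts (additive smoothing).

    `prior` maps option -> probability (should sum to 1).
    `counts` maps option -> number of times this user chose it.
    `prior_strength` acts like a number of pseudo-observations.
    """
    options = set(prior) | set(counts)
    total = sum(counts.values()) + prior_strength
    return {
        option: (counts.get(option, 0) + prior_strength * prior.get(option, 0.0)) / total
        for option in options
    }


# Hypothetical population prior over default browsers.
prior = {"Chrome": 0.6, "Firefox": 0.3, "Other": 0.1}

new_user = blended_preference(prior, counts={})
heavy_user = blended_preference(prior, counts={"Firefox": 20})

print(max(new_user, key=new_user.get))      # Chrome (prior only)
print(max(heavy_user, key=heavy_user.get))  # Firefox (history outweighs prior)
```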

PersonalAlign demonstrates that a well‑structured, continuously updated personal memory can turn generic GUI agents into truly personalized assistants, opening a path toward more intuitive, proactive human‑computer interaction.

Authors

Yibo Lyu
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie

Paper Information

arXiv ID: 2601.09636v1
Categories: cs.AI, cs.CV, cs.HC, cs.LG
Published: January 14, 2026