[Paper] Modeling Distinct Human Interaction in Web Agents

Published: February 19, 2026 at 01:11 PM EST
4 min read
Source: arXiv - 2602.17588v1

Overview

The paper tackles a surprisingly practical problem: when should a web‑automation agent ask a human for help, and when should it just keep going on its own? By analyzing real‑world web‑navigation sessions, the authors show that users interact with agents in four distinct ways and that a language‑model‑based predictor can learn these patterns. The result is a more collaborative, “human‑in‑the‑loop” agent that feels noticeably more useful in live deployments.

Key Contributions

  • CowCorpus – a new, publicly released dataset of 400 real‑user web‑navigation sessions (≈4,200 interleaved human ↔ agent actions).
  • Interaction taxonomy – identification of four recurring user‑agent interaction styles:
    1. Hands‑off supervision (agent runs autonomously, user watches)
    2. Hands‑on oversight (user intervenes to correct or confirm)
    3. Collaborative task‑solving (user and agent share the workload)
    4. Full user takeover (agent steps back completely).
  • Intervention predictor – fine‑tuned language models that forecast the next user intervention with 61–63 % accuracy, well above un‑adapted baseline LMs (~45 %).
  • Live user study – embedding the predictor into a web‑navigation assistant yields a 26.5 % boost in user‑rated usefulness, confirming the practical value of “intervention‑aware” behavior.
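The four interaction styles and the interleaved action logs can be captured with a couple of small data structures. This is an illustrative sketch, not the paper's actual schema; the class and field names (`InteractionStyle`, `SessionStep`, `actor`, `action`) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto

class InteractionStyle(Enum):
    HANDS_OFF = auto()       # agent runs autonomously, user watches
    HANDS_ON = auto()        # user intervenes to correct or confirm
    COLLABORATIVE = auto()   # user and agent share the workload
    USER_TAKEOVER = auto()   # agent steps back completely

@dataclass
class SessionStep:
    actor: str   # "human" or "agent"
    action: str  # e.g. "click(#search)" or "type(#dest, 'Paris')"

@dataclass
class Session:
    steps: list[SessionStep]
    style: InteractionStyle

# A toy session: the agent acts twice, then the user corrects a field,
# which matches the hands-on oversight style.
session = Session(
    steps=[
        SessionStep("agent", "click(#search)"),
        SessionStep("agent", "type(#dest, 'Paris')"),
        SessionStep("human", "type(#dest, 'Paris, TX')"),
    ],
    style=InteractionStyle.HANDS_ON,
)
```

A dataset like CowCorpus would then be a list of such sessions, each annotated with one of the four styles.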

Methodology

  1. Data collection – Participants performed realistic web tasks (e.g., booking travel, shopping) while a semi‑autonomous agent suggested actions. Every click, form fill, or navigation step was logged, producing a sequence of alternating human and agent actions.
  2. Pattern discovery – The authors manually inspected the logs and clustered interaction sequences, arriving at the four‑style taxonomy.
  3. Model training – They took off‑the‑shelf language models (e.g., T5, GPT‑2) and fine‑tuned them on the CowCorpus to predict a binary “intervention” label for the next step, conditioning on the recent action history and the identified interaction style.
  4. Evaluation
    • Offline: standard classification metrics (accuracy, F1) compared against un‑adapted LMs.
    • Online: the predictor was plugged into a live web‑agent; 30+ participants completed tasks while rating the agent’s usefulness, responsiveness, and trustworthiness.
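The training step above hinges on turning an interleaved log into supervised examples: condition on recent action history, predict whether the *next* action is a human intervention. A minimal sketch of that preprocessing, with an illustrative log format and window size not taken from the paper:

```python
def make_examples(steps, window=4):
    """Turn an interleaved (actor, action) log into (context, label) pairs.

    Each example conditions on up to `window` preceding actions, serialized
    as text; the binary label is 1 when the next action is taken by the
    human (an intervention), else 0. Field names are illustrative only.
    """
    examples = []
    for i in range(1, len(steps)):
        context = " | ".join(
            f"{actor}:{action}" for actor, action in steps[max(0, i - window):i]
        )
        label = 1 if steps[i][0] == "human" else 0
        examples.append((context, label))
    return examples

log = [
    ("agent", "click(#search)"),
    ("agent", "type(#dest, 'Paris')"),
    ("human", "type(#dest, 'Paris, TX')"),
    ("agent", "click(#go)"),
]
pairs = make_examples(log)
# pairs[1] conditions on the first two agent actions and is labeled 1,
# because the third step in the log is a human intervention.
```

Pairs in this form can be fed to any sequence classifier head on top of an off‑the‑shelf LM for fine‑tuning.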

Results & Findings

Metric                                     Baseline LM   Intervention‑aware LM
Accuracy (intervention prediction)         ~45 %         61.4 % – 63.4 %
F1 score                                   0.48          0.66
User‑rated usefulness (5‑point Likert)     3.2           4.0 (↑ 26.5 %)
Unnecessary confirmations (avg. per task)  7.8           4.2 (↓ 46 %)

The predictor not only reduced superfluous prompts but also caught critical moments where users would otherwise have to step in manually, leading to smoother task flows and higher trust.

Practical Implications

  • Smarter assistants – Developers building browser extensions, RPA bots, or AI‑driven help desks can integrate an “intervention model” to decide when to ask for clarification versus proceeding autonomously.
  • Reduced cognitive load – By avoiding unnecessary confirmations, agents keep users focused on high‑value decisions, a win for productivity tools and enterprise workflows.
  • Personalized interaction styles – The taxonomy enables agents to adapt to a user’s preferred collaboration mode (e.g., a power user may favor hands‑off supervision, while a novice may need more hands‑on oversight).
  • Data‑driven UX design – CowCorpus provides a benchmark for testing new prompting strategies, making it easier to iterate on UI/UX for mixed‑initiative systems.
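For developers, the "ask versus proceed" decision described above reduces to thresholding the predictor's output inside the agent loop. A minimal sketch; `p_intervention` would come from an intervention model, and the threshold value here is a deployment knob, not a number from the paper:

```python
def next_move(p_intervention: float, ask_threshold: float = 0.6) -> str:
    """Decide whether the agent should pause for the user.

    `p_intervention` is the predicted probability that the user would
    step in at this point. Above the threshold the agent asks for
    clarification; below it, it proceeds autonomously.
    """
    return "ask_user" if p_intervention >= ask_threshold else "act_autonomously"

assert next_move(0.85) == "ask_user"
assert next_move(0.10) == "act_autonomously"
```

Tuning `ask_threshold` trades off unnecessary confirmations against missed interventions, which is exactly the balance the live study measured.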

Limitations & Future Work

  • Domain scope – The study focuses on general web navigation; specialized domains (e.g., medical portals, finance dashboards) may exhibit different intervention patterns.
  • Model granularity – The predictor works at the level of “intervene vs. not,” but does not yet suggest how to intervene (e.g., which UI element to highlight).
  • Scalability of data collection – Gathering high‑quality, interleaved human‑agent logs is labor‑intensive; broader crowdsourced pipelines could expand the dataset.
  • Long‑term adaptation – Future work could explore continual learning so agents refine their intervention predictions as individual users evolve their interaction style over weeks or months.

Bottom line: By treating human interruptions as a first‑class signal rather than a nuisance, this research shows how web agents can become genuinely collaborative partners—something that developers of next‑generation automation tools should start building into their products today.

Authors

  • Faria Huq
  • Zora Zhiruo Wang
  • Zhanqiu Guo
  • Venu Arvind Arangarajan
  • Tianyue Ou
  • Frank Xu
  • Shuyan Zhou
  • Graham Neubig
  • Jeffrey P. Bigham

Paper Information

  • arXiv ID: 2602.17588v1
  • Categories: cs.CL, cs.HC
  • Published: February 19, 2026