[Paper] Modeling Distinct Human Interaction in Web Agents

Published: February 19, 2026 at 01:11 PM EST
4 min read
Source: arXiv - 2602.17588v1

Overview

The paper tackles a surprisingly practical problem: when should a web‑automation agent ask a human for help, and when should it just keep going on its own? By analyzing real‑world web‑navigation sessions, the authors show that users interact with agents in four distinct ways and that a language‑model‑based predictor can learn these patterns. The result is a more collaborative, “human‑in‑the‑loop” agent that feels noticeably more useful in live deployments.

Key Contributions

  • CowCorpus – a new, publicly released dataset of 400 real‑user web‑navigation sessions (≈4,200 interleaved human ↔ agent actions).
  • Interaction taxonomy – identification of four recurring user‑agent interaction styles:
    1. Hands‑off supervision (agent runs autonomously, user watches)
    2. Hands‑on oversight (user intervenes to correct or confirm)
    3. Collaborative task‑solving (user and agent share the workload)
    4. Full user takeover (agent steps back completely).
  • Intervention predictor – fine‑tuned language models that forecast the next user intervention with 61–63 % accuracy, well above un‑adapted baseline LMs (~45 %).
  • Live user study – embedding the predictor into a web‑navigation assistant yields a 26.5 % boost in user‑rated usefulness, confirming the practical value of “intervention‑aware” behavior.
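The four interaction styles and the interleaved action logs can be captured with a couple of small data structures. This is an illustrative sketch, not the paper's actual schema; the class and field names (`InteractionStyle`, `SessionStep`, `actor`, `action`) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto

class InteractionStyle(Enum):
    HANDS_OFF = auto()       # agent runs autonomously, user watches
    HANDS_ON = auto()        # user intervenes to correct or confirm
    COLLABORATIVE = auto()   # user and agent share the workload
    USER_TAKEOVER = auto()   # agent steps back completely

@dataclass
class SessionStep:
    actor: str   # "human" or "agent"
    action: str  # e.g. "click(#search)" or "type(#dest, 'Paris')"

@dataclass
class Session:
    steps: list[SessionStep]
    style: InteractionStyle

# A toy session: the agent acts twice, then the user corrects a field,
# which matches the hands-on oversight style.
session = Session(
    steps=[
        SessionStep("agent", "click(#search)"),
        SessionStep("agent", "type(#dest, 'Paris')"),
        SessionStep("human", "type(#dest, 'Paris, TX')"),
    ],
    style=InteractionStyle.HANDS_ON,
)
```

A dataset like CowCorpus would then be a list of such sessions, each annotated with one of the four styles.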

Methodology

  1. Data collection – Participants performed realistic web tasks (e.g., booking travel, shopping) while a semi‑autonomous agent suggested actions. Every click, form fill, or navigation step was logged, producing a sequence of alternating human and agent actions.
  2. Pattern discovery – The authors manually inspected the logs and clustered interaction sequences, arriving at the four‑style taxonomy.
  3. Model training – They took off‑the‑shelf language models (e.g., T5, GPT‑2) and fine‑tuned them on the CowCorpus to predict a binary “intervention” label for the next step, conditioning on the recent action history and the identified interaction style.
  4. Evaluation
    • Offline: standard classification metrics (accuracy, F1) compared against un‑adapted LMs.
    • Online: the predictor was plugged into a live web‑agent; 30+ participants completed tasks while rating the agent’s usefulness, responsiveness, and trustworthiness.
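The training step above hinges on turning an interleaved log into supervised examples: condition on recent action history, predict whether the *next* action is a human intervention. A minimal sketch of that preprocessing, with an illustrative log format and window size not taken from the paper:

```python
def make_examples(steps, window=4):
    """Turn an interleaved (actor, action) log into (context, label) pairs.

    Each example conditions on up to `window` preceding actions, serialized
    as text; the binary label is 1 when the next action is taken by the
    human (an intervention), else 0. Field names are illustrative only.
    """
    examples = []
    for i in range(1, len(steps)):
        context = " | ".join(
            f"{actor}:{action}" for actor, action in steps[max(0, i - window):i]
        )
        label = 1 if steps[i][0] == "human" else 0
        examples.append((context, label))
    return examples

log = [
    ("agent", "click(#search)"),
    ("agent", "type(#dest, 'Paris')"),
    ("human", "type(#dest, 'Paris, TX')"),
    ("agent", "click(#go)"),
]
pairs = make_examples(log)
# pairs[1] conditions on the first two agent actions and is labeled 1,
# because the third step in the log is a human intervention.
```

Pairs in this form can be fed to any sequence classifier head on top of an off‑the‑shelf LM for fine‑tuning.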

Results & Findings

Metric                                     Baseline LM   Intervention‑aware LM
Accuracy (intervention prediction)         ~45 %         61.4 % – 63.4 %
F1 score                                   0.48          0.66
User‑rated usefulness (5‑point Likert)     3.2           4.0 (↑ 26.5 %)
Unnecessary confirmations (avg. per task)  7.8           4.2 (↓ 46 %)

The predictor not only reduced superfluous prompts but also caught critical moments where users would otherwise have to step in manually, leading to smoother task flows and higher trust.

Practical Implications

  • Smarter assistants – Developers building browser extensions, RPA bots, or AI‑driven help desks can integrate an “intervention model” to decide when to ask for clarification versus proceeding autonomously.
  • Reduced cognitive load – By avoiding unnecessary confirmations, agents keep users focused on high‑value decisions, a win for productivity tools and enterprise workflows.
  • Personalized interaction styles – The taxonomy enables agents to adapt to a user’s preferred collaboration mode (e.g., a power user may favor hands‑off supervision, while a novice may need more hands‑on oversight).
  • Data‑driven UX design – CowCorpus provides a benchmark for testing new prompting strategies, making it easier to iterate on UI/UX for mixed‑initiative systems.
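For developers, the "ask versus proceed" decision described above reduces to thresholding the predictor's output inside the agent loop. A minimal sketch; `p_intervention` would come from an intervention model, and the threshold value here is a deployment knob, not a number from the paper:

```python
def next_move(p_intervention: float, ask_threshold: float = 0.6) -> str:
    """Decide whether the agent should pause for the user.

    `p_intervention` is the predicted probability that the user would
    step in at this point. Above the threshold the agent asks for
    clarification; below it, it proceeds autonomously.
    """
    return "ask_user" if p_intervention >= ask_threshold else "act_autonomously"

assert next_move(0.85) == "ask_user"
assert next_move(0.10) == "act_autonomously"
```

Tuning `ask_threshold` trades off unnecessary confirmations against missed interventions, which is exactly the balance the live study measured.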

Limitations & Future Work

  • Domain scope – The study focuses on general web navigation; specialized domains (e.g., medical portals, finance dashboards) may exhibit different intervention patterns.
  • Model granularity – The predictor works at the level of “intervene vs. not,” but does not yet suggest how to intervene (e.g., which UI element to highlight).
  • Scalability of data collection – Gathering high‑quality, interleaved human‑agent logs is labor‑intensive; broader crowdsourced pipelines could expand the dataset.
  • Long‑term adaptation – Future work could explore continual learning so agents refine their intervention predictions as individual users evolve their interaction style over weeks or months.

Bottom line: By treating human interruptions as a first‑class signal rather than a nuisance, this research shows how web agents can become genuinely collaborative partners—something that developers of next‑generation automation tools should start building into their products today.

Authors

  • Faria Huq
  • Zora Zhiruo Wang
  • Zhanqiu Guo
  • Venu Arvind Arangarajan
  • Tianyue Ou
  • Frank Xu
  • Shuyan Zhou
  • Graham Neubig
  • Jeffrey P. Bigham

Paper Information

  • arXiv ID: 2602.17588v1
  • Categories: cs.CL, cs.HC
  • Published: February 19, 2026