[Paper] BAMI: Training-Free Bias Mitigation in GUI Grounding

Published: May 7, 2026 at 01:59 PM EDT

Source: arXiv - 2605.06664v1

Overview

The paper introduces BAMI, a training‑free technique that reduces two hidden sources of error in GUI grounding (locating the on‑screen element that a natural‑language instruction refers to): precision bias from high‑resolution screenshots and ambiguity bias from crowded UI elements. By plugging BAMI into existing GUI‑grounding models, developers can improve accuracy on challenging benchmarks such as ScreenSpot‑Pro without retraining any models.

Key Contributions

Bias diagnosis with Masked Prediction Distribution (MPD): a novel attribution tool that pinpoints precision and ambiguity biases in GUI grounding pipelines.
Bias‑Aware Manipulation Inference (BAMI): a lightweight, inference‑only framework that applies two manipulations—coarse‑to‑fine focus and candidate selection—to counteract the identified biases.
Training‑free performance gains: demonstrated across multiple state‑of‑the‑art models (e.g., TianXi‑Action‑7B), with absolute accuracy gains of up to about 6 percentage points on the ScreenSpot‑Pro benchmark.
Robustness validated by extensive ablations: improvements remain stable across a wide range of hyper‑parameter settings.
Open‑source implementation: the authors release code and scripts, making it easy for practitioners to adopt BAMI in their pipelines.

Methodology

Detecting bias with MPD – The authors mask random patches of a GUI screenshot and observe how the model’s prediction distribution changes. Large shifts reveal where the model is overly sensitive (precision bias) or confused (ambiguity bias).
Coarse‑to‑fine focus – Instead of feeding the full‑resolution image directly, BAMI first runs the model on a down‑sampled (coarse) version to locate the general region of interest, then refines the prediction on a high‑resolution crop of that region. This reduces the precision bias caused by unnecessary pixel‑level detail.
Candidate selection – For UI elements that look similar (e.g., multiple buttons with the same icon), BAMI generates a short list of plausible candidates from the coarse pass and re‑ranks them using a lightweight similarity score that incorporates textual cues (labels, tooltips). This mitigates ambiguity bias without any extra training.
Inference‑only pipeline – All steps are performed at test time; no gradients are computed, and no model weights are altered. The approach can be added to any existing GUI‑grounding model as a drop‑in wrapper around its inference calls.
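The MPD probe described above can be approximated with a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: `predict_point` is a hypothetical callable that returns the model's (x, y) click prediction for an image, and the patch size, masking ratio, zero fill, and displacement-based attribution are illustrative choices rather than the paper's exact definition of MPD. Each sample blanks a random subset of patches, records where the prediction lands, and credits the masked patches with how far the prediction moved.

```python
from typing import Callable, Tuple

import numpy as np

Point = Tuple[float, float]


def masked_prediction_distribution(
    image: np.ndarray,
    predict_point: Callable[[np.ndarray], Point],  # hypothetical model wrapper
    patch: int = 32,           # illustrative patch size in pixels
    num_samples: int = 64,     # number of random masks to draw
    mask_ratio: float = 0.25,  # probability that each patch is blanked
    seed: int = 0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (predictions under random masks, per-patch attribution map).

    Sketch only: each sample zeroes a random subset of patches, records the
    predicted click point, and credits every masked patch with the distance
    the prediction moved away from the unmasked prediction.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    gh, gw = -(-h // patch), -(-w // patch)  # ceil division: patch grid size
    base = np.asarray(predict_point(image), dtype=float)

    points = np.empty((num_samples, 2))
    shift_sum = np.zeros((gh, gw))
    mask_count = np.zeros((gh, gw))

    for i in range(num_samples):
        cell_mask = rng.random((gh, gw)) < mask_ratio
        masked = image.copy()
        for r, c in zip(*np.nonzero(cell_mask)):
            masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
        pred = np.asarray(predict_point(masked), dtype=float)
        points[i] = pred
        shift = float(np.linalg.norm(pred - base))
        shift_sum[cell_mask] += shift
        mask_count[cell_mask] += 1

    # Average displacement observed while each patch was masked (0 if never masked).
    attribution = np.divide(
        shift_sum, mask_count, out=np.zeros_like(shift_sum), where=mask_count > 0
    )
    return points, attribution
```

In use, a wide scatter in `points`, or attribution mass on patches far from the target element, would hint at the kind of sensitivity the paper attributes to precision or ambiguity bias; how the authors actually quantify those shifts may differ.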
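The coarse‑to‑fine focus step mostly comes down to coordinate bookkeeping between the down‑sampled image, the crop, and the full‑resolution screenshot. The sketch below assumes the same hypothetical `predict_point` callable (returning pixel coordinates in whatever image it receives), uses block‑mean pooling as the down‑sampling method, and picks a 512‑pixel crop; the pooling method and crop size are illustrative assumptions, not values from the paper, which reports down‑sampling factors between 2× and 8×.

```python
from typing import Callable, Tuple

import numpy as np

Point = Tuple[float, float]


def downsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Block-mean pooling by an integer factor (trailing rows/cols are dropped)."""
    h, w = image.shape[:2]
    h2, w2 = h // factor, w // factor
    trimmed = image[: h2 * factor, : w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor, *image.shape[2:]).mean(axis=(1, 3))


def crop_box(center: Point, crop: int, width: int, height: int) -> Tuple[int, int, int, int]:
    """(x0, y0, x1, y1) of a crop-sized window around center, clamped inside the image."""
    cw, ch = min(crop, width), min(crop, height)
    x0 = max(0, min(int(round(center[0] - cw / 2)), width - cw))
    y0 = max(0, min(int(round(center[1] - ch / 2)), height - ch))
    return x0, y0, x0 + cw, y0 + ch


def coarse_to_fine(
    image: np.ndarray,
    predict_point: Callable[[np.ndarray], Point],  # hypothetical model wrapper
    factor: int = 4,  # down-sampling factor (the summary reports 2x-8x)
    crop: int = 512,  # illustrative crop size in full-resolution pixels
) -> Point:
    h, w = image.shape[:2]
    # 1) Coarse pass: locate the rough region on the down-sampled screenshot.
    cx, cy = predict_point(downsample(image, factor))
    # Map the coarse pixel back to the centre of its pooled block at full resolution.
    center = ((cx + 0.5) * factor, (cy + 0.5) * factor)
    # 2) Fine pass: re-run the model on a full-resolution crop around that region.
    x0, y0, x1, y1 = crop_box(center, crop, w, h)
    fx, fy = predict_point(image[y0:y1, x0:x1])
    # 3) Translate the crop-local prediction back to screenshot coordinates.
    return (x0 + fx, y0 + fy)
```

Clamping the crop to the image bounds keeps targets near the screen edge reachable, and block‑mean pooling (rather than plain pixel striding) avoids silently dropping small elements that fall between sampled pixels.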
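Candidate selection can likewise be sketched as a shortlist‑then‑re‑rank step. Here the `Candidate` type, the weight `alpha`, and the use of Python's `difflib.SequenceMatcher` ratio as the textual similarity score are all hypothetical stand‑ins; the summary only says that the score is lightweight and incorporates textual cues such as labels and tooltips.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in screenshot pixels
    visual_score: float             # coarse-pass confidence, assumed to lie in [0, 1]
    label: str = ""                 # textual cue such as an accessible label or tooltip


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; 0 when either side is empty."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_candidate(
    instruction: str,
    candidates: List[Candidate],
    k: int = 5,          # shortlist size (the summary reports sizes 3-7 behave similarly)
    alpha: float = 0.5,  # hypothetical weight on the textual cue
) -> Optional[Candidate]:
    """Shortlist the top-k candidates by visual score, then re-rank with text cues."""
    if not candidates:
        return None
    shortlist = sorted(candidates, key=lambda c: c.visual_score, reverse=True)[:k]

    def combined(c: Candidate) -> float:
        return (1 - alpha) * c.visual_score + alpha * text_similarity(instruction, c.label)

    return max(shortlist, key=combined)
```

When a candidate has no label, its text similarity is zero and the ranking falls back to the visual score, which mirrors the limitation that GUIs lacking accessible labels may see smaller gains.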

Results & Findings

Model	Baseline accuracy on ScreenSpot‑Pro	Accuracy with BAMI	Δ (percentage points)
TianXi‑Action‑7B	51.9 %	57.8 %	+5.9
Other SOTA models	48–53 %	53–58 %	+4 to +6
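The Δ column is an absolute difference in percentage points. A quick calculation on the TianXi‑Action‑7B row, using only the numbers in the table, shows how that compares with the relative improvement over the baseline:

```python
from typing import Tuple


def gains(baseline: float, with_bami: float) -> Tuple[float, float]:
    """Absolute gain in percentage points and relative gain in percent."""
    absolute_pp = with_bami - baseline
    relative_pct = 100.0 * absolute_pp / baseline
    return absolute_pp, relative_pct


abs_pp, rel = gains(51.9, 57.8)  # TianXi-Action-7B row
print(f"{abs_pp:.1f} pp absolute, {rel:.1f}% relative")  # 5.9 pp absolute, 11.4% relative
```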

Consistent gains across all tested models, suggesting that the biases are not specific to any single model.
Ablation studies show that removing either the coarse‑to‑fine focus or the candidate selection drops performance back to near‑baseline, indicating that both components are needed.
Parameter stability: varying the down‑sampling factor (2×–8×) or the candidate list size (3–7) changes results by at most 0.5 percentage points, indicating that BAMI works out of the box with minimal tuning.

Practical Implications

Faster deployment: Teams can improve existing GUI‑automation agents (e.g., test‑automation bots, accessibility tools) without costly retraining cycles.
Higher reliability in production: Reducing precision bias means fewer missed clicks on high‑DPI screens; mitigating ambiguity bias cuts down on wrong‑element selections in dense dashboards.
Plug‑and‑play for heterogeneous UIs: Because BAMI operates purely at inference, it can be added to pipelines that already support multiple device form‑factors (mobile, desktop, web).
Cost‑effective scaling: Organizations can roll out upgraded agents across thousands of machines by simply updating the inference wrapper, avoiding GPU‑intensive fine‑tuning.
Open‑source integration: The provided GitHub repo includes ready‑made wrappers for popular frameworks (PyTorch, TensorFlow), making it straightforward to embed BAMI into CI/CD testing suites or RPA platforms.

Limitations & Future Work

Dependence on visual quality: Extremely low‑resolution screenshots may still hinder the coarse‑to‑fine step, because the initial region proposal becomes noisy.
Limited to static GUIs: The current design assumes a single static frame; extending BAMI to video‑based interactions (e.g., drag‑and‑drop animations) remains an open challenge.
Candidate selection heuristics: While effective, the similarity scoring relies on textual metadata; GUIs lacking accessible labels may see reduced gains.
Future directions include integrating lightweight OCR to enrich textual cues, exploring adaptive down‑sampling strategies based on UI complexity, and applying BAMI to multimodal agents that combine speech commands with visual grounding.

Authors

Borui Zhang
Bo Zhang
Bo Wang
Wenzhao Zheng
Yuhao Cheng
Liang Tang
Yiqiang Yan
Jie Zhou
Jiwen Lu

Paper Information

arXiv ID: 2605.06664v1
Categories: cs.CV, cs.AI
Published: May 7, 2026
