[Paper] Localising Shortcut Learning in Pixel Space via Ordinal Scoring Correlations for Attribution Representations (OSCAR)
Source: arXiv - 2512.18888v1
Overview
The paper presents OSCAR, a model‑agnostic toolkit that turns pixel‑level attribution maps into statistical “rank profiles” and then uses correlation analysis to pinpoint where a deep network is relying on spurious shortcuts. By comparing a test model against a balanced baseline and a sensitive‑attribute predictor, OSCAR quantifies shortcut reliance and highlights the exact image regions responsible, a task that previously relied largely on subjective visual inspection.
Key Contributions
- Dataset‑level shortcut scoring: Converts per‑image attribution maps into rank‑ordered region profiles, enabling statistical comparison across models.
- Three‑way correlation framework: Pairwise, partial, and deviation‑based correlations between the test model (TS), a balanced baseline (BA), and a sensitive‑attribute predictor (SA) expose shortcut dependence.
- Model‑agnostic and lightweight: Works with any pretrained network and only requires pixel‑space attribution maps (e.g., Grad‑CAM, Integrated Gradients).
- Robustness validation: Demonstrates stability across random seeds, data splits, and varying shortcut strength on CelebA (faces), CheXpert (chest X‑rays), and ADNI (MRI).
- Practical mitigation recipe: Shows that attenuating the identified shortcut regions at test time reduces worst‑group performance gaps.
- Open‑source implementation: Full code released, encouraging reproducibility and rapid adoption.
Methodology
- Generate attribution maps – For each image, any standard attribution technique (e.g., Grad‑CAM) produces a heatmap that scores how much each pixel contributed to the model’s prediction.
- Create rank profiles – Pixels are sorted from most to least important, yielding a rank vector that captures the spatial importance ordering for that image.
- Build three model sets:
  - BA (Balanced baseline): Trained on a version of the data where the shortcut feature is decorrelated from the label.
  - TS (Test model): The model under scrutiny, potentially exploiting shortcuts.
  - SA (Sensitive‑attribute predictor): A model trained to predict the known sensitive attribute (e.g., gender, disease severity).
- Correlation analysis – For each image region (e.g., super‑pixel or patch), compute:
  - Pairwise correlation between TS and SA rank scores (how similarly they weight the region).
  - Partial correlation controlling for BA (isolates the shortcut effect beyond what a balanced model would exhibit).
  - Deviation‑based correlation measuring how far TS’s ranking deviates from BA’s while aligning with SA’s.
- Aggregate metrics – Summarize per‑region correlations across the dataset to produce a heatmap of “shortcut intensity” and a scalar shortcut‑reliance score.
- Mitigation (optional) – At inference, down‑weight or mask the top‑scoring shortcut regions before feeding the image to TS, reducing bias.
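The rank-profile and correlation steps above can be sketched roughly as follows. This is an illustrative reimplementation, not the authors’ released code: the patch pooling, the first-order partial-correlation formula, and the deviation proxy (the TS-minus-BA rank difference correlated against SA) are all assumptions about how such a pipeline could be built.

```python
import numpy as np
from scipy.stats import spearmanr

def rank_profile(attr_map, patch=8):
    """Pool attributions over non-overlapping patches, then rank the regions.

    Rank 0 marks the most important region for this image.
    """
    h, w = attr_map.shape
    cropped = attr_map[: h - h % patch, : w - w % patch]
    pooled = cropped.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    flat = pooled.ravel()
    order = np.argsort(-flat)          # most important region first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(flat.size)
    return ranks

def shortcut_correlations(ts, ba, sa):
    """Pairwise, partial, and deviation-based rank correlations for one image."""
    rho = lambda a, b: spearmanr(a, b)[0]
    r_ts_sa = rho(ts, sa)              # pairwise: does TS weight regions like SA?
    r_ts_ba = rho(ts, ba)
    r_sa_ba = rho(sa, ba)
    # Standard first-order partial correlation of TS and SA, controlling for BA.
    partial = (r_ts_sa - r_ts_ba * r_sa_ba) / np.sqrt(
        (1 - r_ts_ba**2) * (1 - r_sa_ba**2)
    )
    # Deviation proxy (an assumption): how TS's departure from BA aligns with SA.
    deviation = rho(ts - ba, sa)
    return {"pairwise": r_ts_sa, "partial": partial, "deviation": deviation}
```

In this sketch the per-image dictionaries would then be averaged per region across the dataset to yield the shortcut-intensity heatmap and the scalar reliance score.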
Results & Findings
| Dataset | Shortcut type | Correlation metric behavior | Key takeaway |
|---|---|---|---|
| CelebA (hair‑color vs. gender) | Visible, localized | High pairwise TS‑SA correlation; low partial correlation when BA is included | OSCAR correctly flags hair region as shortcut. |
| CheXpert (presence of a tube vs. disease label) | Diffuse, subtle | Deviation‑based correlation rises with stronger shortcut injection; stable across seeds | Shows OSCAR can detect non‑obvious, spread‑out cues. |
| ADNI (scanner site vs. Alzheimer’s diagnosis) | Non‑visual, domain shift | Correlation metrics remain significant even when attribution maps look uniform to the eye | Demonstrates utility in medical imaging where shortcuts are invisible. |
Additional observations
- Stability: Correlation scores vary by less than 2% across 10 random seeds and 5‑fold splits.
- Sensitivity: When the training data’s shortcut‑label association is reduced from 0.9 to 0.5 (Pearson), the shortcut score drops proportionally, confirming the metric tracks true shortcut strength.
- Mitigation impact: Simple test‑time attenuation of the top‑10 % shortcut regions improves worst‑group accuracy by 4–7 % on CelebA and CheXpert without hurting overall performance.
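The test‑time attenuation reported above can be sketched as a simple masking pass; the quantile cutoff and the multiplicative `strength` factor are illustrative choices, not the paper’s exact recipe.

```python
import numpy as np

def attenuate_shortcut_regions(image, shortcut_map, top_frac=0.10, strength=0.5):
    """Scale down pixels in the top `top_frac` of shortcut intensity."""
    cutoff = np.quantile(shortcut_map, 1.0 - top_frac)
    mask = shortcut_map >= cutoff      # pixels flagged as shortcut-heavy
    out = image.astype(float).copy()
    out[mask] *= strength              # soft attenuation rather than hard zeroing
    return out
```

A soft multiplicative scale is used here instead of hard masking so that flagged regions still contribute weakly, which tends to be gentler on overall accuracy.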
Practical Implications
- Bias audits for regulated domains: Developers building AI for healthcare, finance, or hiring can run OSCAR on existing models to surface hidden bias cues that are not visually apparent.
- Model selection & debugging: Instead of relying on a single validation set, teams can compare multiple candidate architectures by their shortcut scores, preferring models that score low even if overall accuracy is similar.
- Lightweight deployment: Because OSCAR works post‑hoc on attribution maps, it can be integrated into CI pipelines without retraining models.
- Targeted data collection: The spatial maps reveal where the data is leaking shortcuts, guiding curators to collect more balanced samples or augmentations for those regions.
- Test‑time safety nets: The attenuation step can be packaged as a “bias‑filter” layer that automatically suppresses suspicious regions before the final prediction, offering a quick mitigation while a more thorough retraining is planned.
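The model‑selection idea above can be expressed as a small audit helper; the dictionary schema and the accuracy tolerance are hypothetical, and the shortcut score is assumed to come from an OSCAR‑style analysis.

```python
def select_low_shortcut_model(candidates, acc_tol=0.01):
    """Among candidates within `acc_tol` of the best accuracy, pick the one
    with the lowest dataset-level shortcut score.

    candidates: list of dicts with keys 'name', 'accuracy', 'shortcut_score'
    (a hypothetical schema for illustration).
    """
    best_acc = max(c["accuracy"] for c in candidates)
    viable = [c for c in candidates if best_acc - c["accuracy"] <= acc_tol]
    return min(viable, key=lambda c: c["shortcut_score"])
```

Run against a candidate pool, this prefers a marginally less accurate model with a much lower shortcut score, matching the selection policy described above.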
Limitations & Future Work
- Dependence on attribution quality: Noisy or biased attribution methods can propagate errors into the rank profiles; the authors recommend robust explainability tools but acknowledge this as a bottleneck.
- Patch granularity trade‑off: Very fine‑grained patches increase computational cost and may over‑fit to noise, while coarse patches can miss subtle shortcuts. Adaptive patch sizing is left for future research.
- Only pixel‑space cues: OSCAR currently cannot capture shortcuts that live in feature‑space (e.g., frequency patterns) without a visual proxy. Extending the framework to other modalities (audio, text embeddings) is an open direction.
- Mitigation simplicity: The test‑time attenuation is a proof‑of‑concept; more sophisticated debiasing (e.g., adversarial training guided by OSCAR scores) could yield larger gains.
Overall, OSCAR offers a practical, statistically grounded lens for developers to detect, quantify, and begin to mitigate shortcut learning—an increasingly critical step toward trustworthy AI systems.
Authors
- Akshit Achara
- Peter Triantafillou
- Esther Puyol‑Antón
- Alexander Hammers
- Andrew P. King
Paper Information
- arXiv ID: 2512.18888v1
- Categories: cs.CV
- Published: December 21, 2025