[Paper] Localising Shortcut Learning in Pixel Space via Ordinal Scoring Correlations for Attribution Representations (OSCAR)
Source: arXiv - 2512.18888v1
Overview
The paper presents OSCAR, a model‑agnostic toolkit that turns pixel‑level attribution maps into statistical “rank profiles” and then uses correlation analysis to pinpoint where a deep network is relying on spurious shortcuts. By comparing a test model against a balanced baseline and a sensitive‑attribute predictor, OSCAR quantifies shortcut reliance and highlights the exact image regions responsible, a task that previously relied largely on subjective visual inspection.
Key Contributions
- Dataset‑level shortcut scoring: Converts per‑image attribution maps into rank‑ordered region profiles, enabling statistical comparison across models.
- Three‑way correlation framework: Pairwise, partial, and deviation‑based correlations between the test model (TS), a balanced baseline (BA), and a sensitive‑attribute predictor (SA) expose shortcut dependence.
- Model‑agnostic and lightweight: Works with any pretrained network and only requires pixel‑space attribution maps (e.g., Grad‑CAM, Integrated Gradients).
- Robustness validation: Demonstrates stability across random seeds, data splits, and varying shortcut strength on CelebA (faces), CheXpert (chest X‑rays), and ADNI (MRI).
- Practical mitigation recipe: Shows that attenuating the identified shortcut regions at test time reduces worst‑group performance gaps.
- Open‑source implementation: Full code released, encouraging reproducibility and rapid adoption.
Methodology
- Generate attribution maps – For each image, any standard attribution technique (e.g., Grad‑CAM) produces a heatmap that scores how much each pixel contributed to the model’s prediction.
- Create rank profiles – Pixels are sorted from most to least important, yielding a rank vector that captures the spatial importance ordering for that image.
- Build three model sets:
  - BA (Balanced baseline): Trained on a version of the data where the shortcut feature is decorrelated from the label.
  - TS (Test model): The model under scrutiny, potentially exploiting shortcuts.
  - SA (Sensitive‑attribute predictor): A model trained to predict the known sensitive attribute (e.g., gender, disease severity).
- Correlation analysis – For each image region (e.g., super‑pixel or patch), compute:
  - Pairwise correlation between TS and SA rank scores (how similarly they weight the region).
  - Partial correlation controlling for BA (isolates the shortcut effect beyond what a balanced model would exhibit).
  - Deviation‑based correlation measuring how far TS’s ranking deviates from BA’s while aligning with SA’s.
- Aggregate metrics – Summarize per‑region correlations across the dataset to produce a heatmap of “shortcut intensity” and a scalar shortcut‑reliance score.
- Mitigation (optional) – At inference, down‑weight or mask the top‑scoring shortcut regions before feeding the image to TS, reducing bias.
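The rank-profile and correlation steps above can be sketched roughly as follows. This is an illustrative reimplementation, not the authors’ released code: the patch pooling, the first-order partial-correlation formula, and the deviation proxy (the TS-minus-BA rank difference correlated against SA) are all assumptions about how such a pipeline could be built.

```python
import numpy as np
from scipy.stats import spearmanr

def rank_profile(attr_map, patch=8):
    """Pool attributions over non-overlapping patches, then rank the regions.

    Rank 0 marks the most important region for this image.
    """
    h, w = attr_map.shape
    cropped = attr_map[: h - h % patch, : w - w % patch]
    pooled = cropped.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    flat = pooled.ravel()
    order = np.argsort(-flat)          # most important region first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(flat.size)
    return ranks

def shortcut_correlations(ts, ba, sa):
    """Pairwise, partial, and deviation-based rank correlations for one image."""
    rho = lambda a, b: spearmanr(a, b)[0]
    r_ts_sa = rho(ts, sa)              # pairwise: does TS weight regions like SA?
    r_ts_ba = rho(ts, ba)
    r_sa_ba = rho(sa, ba)
    # Standard first-order partial correlation of TS and SA, controlling for BA.
    partial = (r_ts_sa - r_ts_ba * r_sa_ba) / np.sqrt(
        (1 - r_ts_ba**2) * (1 - r_sa_ba**2)
    )
    # Deviation proxy (an assumption): how TS's departure from BA aligns with SA.
    deviation = rho(ts - ba, sa)
    return {"pairwise": r_ts_sa, "partial": partial, "deviation": deviation}
```

In this sketch the per-image dictionaries would then be averaged per region across the dataset to yield the shortcut-intensity heatmap and the scalar reliance score.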
Results & Findings
| Dataset | Shortcut type | Correlation metric behavior | Key takeaway |
|---|---|---|---|
| CelebA (hair‑color vs. gender) | Visible, localized | High pairwise TS‑SA correlation; low partial correlation when BA is included | OSCAR correctly flags hair region as shortcut. |
| CheXpert (presence of a tube vs. disease label) | Diffuse, subtle | Deviation‑based correlation rises with stronger shortcut injection; stable across seeds | Shows OSCAR can detect non‑obvious, spread‑out cues. |
| ADNI (scanner site vs. Alzheimer’s diagnosis) | Non‑visual, domain shift | Correlation metrics remain significant even when attribution maps look uniform to the eye | Demonstrates utility in medical imaging where shortcuts are invisible. |
Additional observations
- Stability: Correlation scores vary by less than 2% across 10 random seeds and 5‑fold splits.
- Sensitivity: When the training data’s shortcut‑label association is reduced from 0.9 to 0.5 (Pearson), the shortcut score drops proportionally, confirming the metric tracks true shortcut strength.
- Mitigation impact: Simple test‑time attenuation of the top‑10 % shortcut regions improves worst‑group accuracy by 4–7 % on CelebA and CheXpert without hurting overall performance.
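The test‑time attenuation reported above can be sketched as a simple masking pass; the quantile cutoff and the multiplicative `strength` factor are illustrative choices, not the paper’s exact recipe.

```python
import numpy as np

def attenuate_shortcut_regions(image, shortcut_map, top_frac=0.10, strength=0.5):
    """Scale down pixels in the top `top_frac` of shortcut intensity."""
    cutoff = np.quantile(shortcut_map, 1.0 - top_frac)
    mask = shortcut_map >= cutoff      # pixels flagged as shortcut-heavy
    out = image.astype(float).copy()
    out[mask] *= strength              # soft attenuation rather than hard zeroing
    return out
```

A soft multiplicative scale is used here instead of hard masking so that flagged regions still contribute weakly, which tends to be gentler on overall accuracy.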
Practical Implications
- Bias audits for regulated domains: Developers building AI for healthcare, finance, or hiring can run OSCAR on existing models to surface hidden bias cues that are not visually apparent.
- Model selection & debugging: Instead of relying on a single validation set, teams can compare multiple candidate architectures by their shortcut scores, preferring models that score low even if overall accuracy is similar.
- Lightweight deployment: Because OSCAR works post‑hoc on attribution maps, it can be integrated into CI pipelines without retraining models.
- Targeted data collection: The spatial maps reveal where the data is leaking shortcuts, guiding curators to collect more balanced samples or augmentations for those regions.
- Test‑time safety nets: The attenuation step can be packaged as a “bias‑filter” layer that automatically suppresses suspicious regions before the final prediction, offering a quick mitigation while a more thorough retraining is planned.
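The model‑selection idea above can be expressed as a small audit helper; the dictionary schema and the accuracy tolerance are hypothetical, and the shortcut score is assumed to come from an OSCAR‑style analysis.

```python
def select_low_shortcut_model(candidates, acc_tol=0.01):
    """Among candidates within `acc_tol` of the best accuracy, pick the one
    with the lowest dataset-level shortcut score.

    candidates: list of dicts with keys 'name', 'accuracy', 'shortcut_score'
    (a hypothetical schema for illustration).
    """
    best_acc = max(c["accuracy"] for c in candidates)
    viable = [c for c in candidates if best_acc - c["accuracy"] <= acc_tol]
    return min(viable, key=lambda c: c["shortcut_score"])
```

Run against a candidate pool, this prefers a marginally less accurate model with a much lower shortcut score, matching the selection policy described above.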
Limitations & Future Work
- Dependence on attribution quality: Noisy or biased attribution methods can propagate errors into the rank profiles; the authors recommend robust explainability tools but acknowledge this as a bottleneck.
- Patch granularity trade‑off: Very fine‑grained patches increase computational cost and may over‑fit to noise, while coarse patches can miss subtle shortcuts. Adaptive patch sizing is left for future research.
- Only pixel‑space cues: OSCAR currently cannot capture shortcuts that live in feature‑space (e.g., frequency patterns) without a visual proxy. Extending the framework to other modalities (audio, text embeddings) is an open direction.
- Mitigation simplicity: The test‑time attenuation is a proof‑of‑concept; more sophisticated debiasing (e.g., adversarial training guided by OSCAR scores) could yield larger gains.
Overall, OSCAR offers a practical, statistically grounded lens for developers to detect, quantify, and begin to mitigate shortcut learning—an increasingly critical step toward trustworthy AI systems.
Authors
- Akshit Achara
- Peter Triantafillou
- Esther Puyol‑Antón
- Alexander Hammers
- Andrew P. King
Paper Information
- arXiv ID: 2512.18888v1
- Categories: cs.CV
- Published: December 21, 2025