[Paper] Uncertainty-Aware Pedestrian Attribute Recognition via Evidential Deep Learning
Source: arXiv - 2604.26873v1
Overview
The paper introduces UAPAR, a novel framework that brings uncertainty awareness to pedestrian attribute recognition (PAR). By integrating Evidential Deep Learning (EDL) with a CLIP‑style vision‑language backbone, the system can flag predictions it isn’t confident about—an ability that traditional deterministic models lack, especially when dealing with low‑quality or noisy data.
Key Contributions
- First EDL‑based PAR system that quantifies epistemic uncertainty for each attribute.
- Region‑Aware Evidence Reasoning (RAER) module: uses cross‑attention and spatial priors to harvest fine‑grained local cues before feeding them to an evidential head.
- Uncertainty‑guided dual‑stage curriculum learning: dynamically adjusts the training curriculum to mitigate the impact of noisy labels.
- Extensive validation on four large‑scale datasets (PA100K, PETA, RAPv1, RAPv2) showing competitive or state‑of‑the‑art accuracy while also delivering reliable uncertainty estimates.
- Qualitative analysis demonstrating that high uncertainty scores correlate with challenging or mis‑predicted samples.
Methodology
- Backbone: the model builds on a CLIP‑style architecture (image encoder + text encoder) to obtain a rich joint representation of pedestrian images and attribute semantics.
- Region‑Aware Evidence Reasoning (RAER):
  - A cross‑attention block aligns image patches with attribute tokens, allowing the network to focus on the most informative regions (e.g., a backpack, shoes).
  - Spatial prior masks (derived from human pose or segmentation cues) guide attention toward plausible body parts, improving local feature extraction.
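The query-over-patches attention described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the function name, the additive log-mask trick for the spatial prior, and all shapes are assumptions.

```python
import numpy as np

def cross_attention(attr_tokens, patch_feats, prior_mask=None):
    """Attribute tokens (queries) attend over image-patch features
    (keys/values); an optional spatial prior mask biases attention
    toward plausible body regions."""
    d = attr_tokens.shape[-1]
    scores = attr_tokens @ patch_feats.T / np.sqrt(d)   # (A, P) similarities
    if prior_mask is not None:
        scores = scores + np.log(prior_mask + 1e-9)     # soft bias, not hard gating
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)            # softmax over patches
    return attn @ patch_feats                           # (A, d) attended features

rng = np.random.default_rng(0)
attrs = rng.normal(size=(5, 16))      # 5 attribute tokens
patches = rng.normal(size=(49, 16))   # 7x7 grid of patch features
local_feats = cross_attention(attrs, patches)
```

Each attribute token ends up with its own pooled feature vector, which is what a per-attribute evidential head would then consume.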
- Evidential Head:
  - Instead of outputting a single softmax probability, the head predicts the evidence parameters of a Dirichlet distribution for each attribute.
  - From the Dirichlet, both the expected class probability and the epistemic uncertainty (variance) are derived.
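The standard subjective-logic formulation of EDL makes this concrete: non-negative evidence is mapped to Dirichlet concentration parameters, from which expected probabilities and an uncertainty mass fall out in closed form. The sketch below follows that common formulation; the function name and example evidence values are illustrative, not taken from the paper.

```python
import numpy as np

def dirichlet_belief(evidence):
    """Convert non-negative per-class evidence into expected class
    probabilities and an epistemic-uncertainty mass."""
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.shape[-1]                 # classes (2 for a binary attribute)
    alpha = evidence + 1.0                 # Dirichlet concentration parameters
    S = alpha.sum(axis=-1, keepdims=True)  # total evidence strength
    prob = alpha / S                       # expected class probabilities
    uncertainty = K / S.squeeze(-1)        # mass assigned to "I don't know"
    return prob, uncertainty

# Strong evidence for "backpack present" -> low uncertainty
p_hi, u_hi = dirichlet_belief([9.0, 1.0])
# Almost no evidence either way -> high uncertainty
p_lo, u_lo = dirichlet_belief([0.1, 0.1])
```

Note how total evidence, not just the class ratio, drives the uncertainty: a confident-looking probability backed by little evidence still yields a high uncertainty score.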
- Uncertainty‑Guided Curriculum Learning:
  - Stage 1: train on “easy” samples (low uncertainty) to establish a solid base.
  - Stage 2: gradually introduce harder, noisier samples, weighting their loss by the model’s current uncertainty estimate. This prevents noisy labels from overwhelming the learning signal.
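One simple way to realize the two stages is as per-sample loss weights derived from the model's uncertainty estimates. The sketch below is an assumption about the mechanism, not the paper's exact schedule: the hard threshold in stage 1 and the linear down-weighting in stage 2 are placeholder choices.

```python
import numpy as np

def curriculum_weights(uncertainty, stage, threshold=0.5):
    """Per-sample loss weights for a two-stage curriculum.
    Stage 1: keep only low-uncertainty ("easy") samples.
    Stage 2: include every sample, down-weighted by its uncertainty."""
    u = np.asarray(uncertainty, dtype=float)
    if stage == 1:
        return (u < threshold).astype(float)   # hard gate on easy samples
    return 1.0 - u                             # soft down-weighting of noisy ones

u = np.array([0.1, 0.4, 0.9])
w1 = curriculum_weights(u, stage=1)   # the most uncertain sample is dropped
w2 = curriculum_weights(u, stage=2)   # all samples kept, weighted down
```

Multiplying these weights into the per-sample loss keeps noisy, high-uncertainty labels from dominating the gradient while never permanently discarding them.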
The overall pipeline remains end‑to‑end trainable, requiring only standard image‑attribute annotations.
Results & Findings
| Dataset | mA (Mean Accuracy) | Uncertainty‑aware mA ↑ | Comments |
|---|---|---|---|
| PA100K | 85.2% | 86.1% | Better handling of occluded or low‑resolution pedestrians |
| PETA | 84.7% | 85.5% | Uncertainty scores correctly flag mislabeled attributes |
| RAPv1 | 88.3% | 89.0% | Gains most pronounced on attributes with high intra‑class variance (e.g., “carrying backpack”) |
| RAPv2 | 87.9% | 88.6% | Qualitative visualizations show high uncertainty on blurred or heavily occluded images |
Key takeaways
- Accuracy boost: modest but consistent improvements over strong baselines.
- Reliability: the epistemic uncertainty correlates strongly (Pearson ≈ 0.73) with prediction errors, enabling downstream systems to discard or re‑process doubtful outputs.
- Robustness to label noise: the curriculum learning scheme reduces performance degradation when up to 30% of training labels are corrupted.
Practical Implications
- Surveillance & Smart Cities: Operators can prioritize human review for high‑uncertainty detections (e.g., a person wearing a mask that obscures facial features), reducing false alarms.
- Autonomous Vehicles: Pedestrian attribute cues (e.g., “carrying a stroller”) influence motion planning; knowing when the attribute estimate is unreliable can trigger fallback strategies.
- Retail & Indoor Analytics: Attribute‑based customer profiling (age, gender, accessories) can be made privacy‑aware by refusing to act on uncertain predictions.
- Model Debugging: Developers get a built‑in diagnostic tool—high uncertainty highlights data collection gaps (poor lighting, unusual poses) that can be addressed in future dataset curation.
- Active Learning: Uncertainty scores can drive sample selection for human annotation, making data‑labeling pipelines more efficient.
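The active-learning use case above reduces to a one-line selection rule: label the samples the model is least sure about. The helper name and budget value below are illustrative.

```python
import numpy as np

def select_for_annotation(uncertainties, budget):
    """Return indices of the `budget` most uncertain samples,
    to be sent for human labeling."""
    u = np.asarray(uncertainties, dtype=float)
    return np.argsort(-u)[:budget]   # sort descending by uncertainty

picked = select_for_annotation([0.2, 0.9, 0.1, 0.7], budget=2)
```

In practice this acquisition step would run after each training round, with newly labeled samples folded back into the training set.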
Limitations & Future Work
- Computational overhead: The cross‑attention and evidential head add ~15% inference latency compared with a vanilla CLIP classifier, which may be a bottleneck for real‑time edge deployments.
- Scope of attributes: Experiments focus on binary attributes; extending to multi‑class or continuous traits (e.g., “height”) remains unexplored.
- Uncertainty calibration: While epistemic uncertainty is informative, the paper notes occasional over‑confidence on severely corrupted images; better calibration techniques (e.g., temperature scaling) could improve trustworthiness.
- Broader modalities: Incorporating temporal cues from video streams or depth sensors could further reduce uncertainty in challenging scenarios.
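For context on the temperature-scaling fix mentioned above: it rescales logits by a scalar T before the softmax, softening over-confident outputs. The sketch shows the transform only; fitting T (normally by minimizing NLL on a held-out validation set) is omitted, and the logit values are made up for illustration.

```python
import numpy as np

def apply_temperature(logits, T):
    """Soften (T > 1) or sharpen (T < 1) predicted probabilities."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

p_raw = apply_temperature([4.0, 0.0], T=1.0)   # over-confident
p_cal = apply_temperature([4.0, 0.0], T=2.0)   # softened
```

Because T is a single parameter fit post hoc, it leaves the model's rankings untouched and only adjusts confidence, which is why it is a common first choice for calibration.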
Overall, UAPAR opens a promising path toward trustworthy pedestrian attribute systems, giving developers the tools to not only predict “what” but also to gauge “how sure” they are about each prediction.
Authors
- Zhuofan Lou
- Shihang Zhang
- Fangle Zhu
- Shengjie Ye
- Pingyu Wang
Paper Information
- arXiv ID: 2604.26873v1
- Categories: cs.CV
- Published: April 29, 2026