[Paper] Vendi Novelty Scores for Out-of-Distribution Detection
Source: arXiv - 2602.10062v1
Overview
Out‑of‑distribution (OOD) detection flags inputs that differ from the data a model was trained on—a safety net for any production‑grade AI system. This paper introduces Vendi Novelty Scores (VNS), a new OOD detector that treats novelty as a diversity problem rather than relying on confidence or likelihood estimates. By measuring how much a test sample would increase the diversity of the in‑distribution feature set, VNS delivers state‑of‑the‑art detection while remaining simple, fast, and memory‑light.
Key Contributions
- Diversity‑based OOD detection: Formulates OOD detection as a question of how a new sample changes the Vendi Score (a similarity‑based diversity metric).
- Vendi Novelty Score (VNS): A non‑parametric, linear‑time algorithm that combines local (class‑conditional) and global (dataset‑wide) novelty cues without any density modeling.
- Scalable to tiny reference sets: Shows that VNS retains top performance even when built from just 1 % of the training data, enabling use on edge devices or in privacy‑restricted environments.
- Broad empirical validation: Beats existing post‑hoc OOD detectors across several image classification benchmarks (CIFAR‑10/100, ImageNet, etc.) and multiple network architectures (ResNet, DenseNet, Vision Transformers).
Methodology
- Feature extraction: Pass all in‑distribution training samples through the frozen backbone of the target model and collect their latent representations (e.g., the penultimate layer).
- Compute Vendi Score (VS): VS is a kernel‑based diversity measure that aggregates pairwise similarities among a set of vectors; concretely, it is the exponential of the Shannon entropy of the eigenvalues of the normalized similarity matrix. Intuitively, a set of many near‑duplicate points gets a low VS, while a set that spreads out in feature space gets a high VS.
- Novelty estimation: For a test sample $x$, compute the VS of the in‑distribution feature set augmented with the feature of $x$. The Vendi Novelty Score is the increase in VS caused by adding $x$:

$$\mathrm{VNS}(x) = \mathrm{VS}\bigl(\mathcal{F} \cup \{f(x)\}\bigr) - \mathrm{VS}(\mathcal{F}),$$

where $\mathcal{F}$ denotes the stored feature set and $f(\cdot)$ the feature extractor.
- Local vs. global signals: VNS can be computed on the whole feature set (global) or separately per class (local). The final score is a weighted blend, allowing the detector to capture both “this sample looks unlike any class” and “this sample is far from the overall data manifold.”
- Decision rule: A simple threshold on VNS separates in‑distribution from OOD inputs; the threshold can be set using a small validation split.
Because VS is computed with a kernel that can be evaluated in constant time per pair, the overall cost scales linearly with the size of $\mathcal{F}$. No extra training, density estimation, or gradient‑based scoring is required.
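The scoring step can be sketched in a few lines of NumPy. This is a naive reference implementation, not the paper's code: it recomputes an eigendecomposition per query (the paper reports a linear‑time algorithm, which is not reproduced here), and `rbf_kernel` with a fixed `gamma` is an assumed kernel choice.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian similarity; gamma is an assumed bandwidth, not the paper's choice.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def vendi_score(X, kernel):
    # VS = exp(Shannon entropy of the eigenvalues of K / n).
    K = kernel(X, X) / X.shape[0]       # normalize so eigenvalues sum to 1
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > 1e-12]              # drop numerical zeros
    return float(np.exp(-np.sum(lam * np.log(lam))))

def vns(x_feat, F, kernel):
    # Novelty = increase in diversity from adding x's feature to the bank.
    augmented = np.vstack([F, x_feat[None, :]])
    return vendi_score(augmented, kernel) - vendi_score(F, kernel)
```

A far‑away sample contributes an almost‑orthogonal row to the similarity matrix, so it raises the VS (and hence its VNS) much more than a sample that sits inside the reference cluster.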
Results & Findings
| Dataset / Backbone | AUROC (previous best) | AUROC (VNS) | Memory used for reference set |
|---|---|---|---|
| CIFAR‑10 / ResNet‑34 | 96.2 % | 98.1 % | 1 % of training images |
| CIFAR‑100 / DenseNet | 93.5 % | 96.8 % | 1 % of training images |
| ImageNet‑O / ViT‑B/16 | 89.4 % | 92.3 % | 1 % of training images |
Key Takeaways
- State‑of‑the‑art detection: VNS consistently outperforms confidence‑based (e.g., Maximum Softmax Probability) and likelihood‑based (e.g., Mahalanobis) detectors.
- Robust to reduced reference data: Even with a 99 % reduction in stored features, VNS loses less than 1 % AUROC, demonstrating that diversity can be captured with a tiny sketch of the training distribution.
- Fast inference: Linear‑time scoring translates to sub‑millisecond latency per image on a CPU, making it practical for real‑time systems.
Practical Implications
- Edge and IoT deployments: Because VNS needs only a handful of stored feature vectors, it fits on devices with strict memory budgets (e.g., smartphones, drones, embedded cameras).
- Zero‑training OOD guardrails: Teams can add an OOD detector to an existing model without retraining or fine‑tuning—just extract a small feature bank once and plug in the VNS routine.
- Safety‑critical pipelines: In autonomous driving, medical imaging, or fraud detection, VNS can act as a lightweight “novelty alarm” that flags out‑of‑distribution inputs before downstream decisions are made.
- Privacy‑preserving scenarios: Since the reference set can be a tiny, possibly anonymized subset of the original data, VNS aligns with regulations that limit data retention.
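As a concrete guardrail recipe, the deployment pattern above can be wired together as follows. `score_fn` is a stand‑in for any novelty scorer (VNS in the paper); the 1 % subsampling fraction and the 5 % false‑positive budget are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fit_ood_guardrail(train_feats, val_feats, score_fn, fpr=0.05, frac=0.01, seed=0):
    # Build the tiny reference bank (frac mirrors the paper's 1% setting).
    rng = np.random.default_rng(seed)
    n = max(1, int(frac * len(train_feats)))
    bank = train_feats[rng.choice(len(train_feats), size=n, replace=False)]
    # Threshold so that roughly `fpr` of in-distribution inputs are flagged.
    val_scores = np.array([score_fn(x, bank) for x in val_feats])
    threshold = float(np.quantile(val_scores, 1.0 - fpr))
    return bank, threshold

def flag_ood(x_feat, bank, threshold, score_fn):
    # Flag the input when its novelty score exceeds the calibrated cutoff.
    return bool(score_fn(x_feat, bank) > threshold)
```

No retraining touches the backbone: the only artifacts to ship are the feature bank and one scalar threshold.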
Limitations & Future Work
- Dependence on feature quality: VNS inherits the representational power of the underlying model; if the backbone poorly separates classes, VNS may struggle.
- Kernel choice sensitivity: While the authors use a Gaussian kernel, selecting bandwidths for high‑dimensional features can affect performance and may need domain‑specific tuning.
- Extension beyond vision: The paper focuses on image classification; applying VNS to text, speech, or multimodal data will require investigating appropriate feature extractors and similarity measures.
- Theoretical guarantees: Future work could formalize bounds on detection error as a function of reference set size and kernel parameters, providing stronger assurances for safety‑critical deployments.
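On the bandwidth‑sensitivity point, one common hedged default for a Gaussian kernel is the median heuristic: derive the bandwidth from the median pairwise squared distance of the reference features. This is a standard trick, not something the paper prescribes.

```python
import numpy as np

def median_heuristic_gamma(X):
    # gamma = 1 / (2 * median pairwise squared distance).
    diffs = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diffs**2, axis=-1)
    med = np.median(d2[np.triu_indices_from(d2, k=1)])  # off-diagonal pairs only
    return 1.0 / (2.0 * med)
```

Because the heuristic rescales with the data, it gives a sane starting point across feature spaces; domain‑specific tuning on a validation split can still improve on it.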
Authors
- Amey P. Pasarkar
- Adji Bousso Dieng
Paper Information
- arXiv ID: 2602.10062v1
- Categories: cs.LG, cs.CV
- Published: February 10, 2026