[Paper] Vendi Novelty Scores for Out-of-Distribution Detection
Source: arXiv - 2602.10062v1
Overview
Out‑of‑distribution (OOD) detection flags inputs that differ from the data a model was trained on—a safety net for any production‑grade AI system. This paper introduces Vendi Novelty Scores (VNS), a new OOD detector that treats novelty as a diversity problem rather than relying on confidence or likelihood estimates. By measuring how much a test sample would increase the diversity of the in‑distribution feature set, VNS delivers state‑of‑the‑art detection while remaining simple, fast, and memory‑light.
Key Contributions
- Diversity‑based OOD detection: Formulates OOD detection as a question of how a new sample changes the Vendi Score (a similarity‑based diversity metric).
- Vendi Novelty Score (VNS): A non‑parametric, linear‑time algorithm that combines local (class‑conditional) and global (dataset‑wide) novelty cues without any density modeling.
- Scalable to tiny reference sets: Shows that VNS retains top performance even when built from just 1 % of the training data, enabling use on edge devices or in privacy‑restricted environments.
- Broad empirical validation: Beats existing post‑hoc OOD detectors across several image classification benchmarks (CIFAR‑10/100, ImageNet, etc.) and multiple network architectures (ResNet, DenseNet, Vision Transformers).
Methodology
- Feature extraction: Pass all in‑distribution training samples through the frozen backbone of the target model and collect their latent representations (e.g., the penultimate layer).
- Compute Vendi Score (VS): VS is a kernel‑based diversity measure that aggregates pairwise similarities among a set of vectors; concretely, it is the exponential of the Shannon entropy of the eigenvalues of the normalized similarity matrix. Intuitively, a set of many near‑duplicate points gets a low VS, while a set that spreads out in feature space gets a high VS.
- Novelty estimation: For a test sample $x$, compute the VS of the in‑distribution feature set augmented with the feature of $x$. The Vendi Novelty Score is the increase in VS caused by adding $x$:

$$\mathrm{VNS}(x) = \mathrm{VS}\bigl(\mathcal{F} \cup \{f(x)\}\bigr) - \mathrm{VS}(\mathcal{F}),$$

where $\mathcal{F}$ denotes the stored feature set and $f(\cdot)$ the feature extractor.
- Local vs. global signals: VNS can be computed on the whole feature set (global) or separately per class (local). The final score is a weighted blend, allowing the detector to capture both “this sample looks unlike any class” and “this sample is far from the overall data manifold.”
- Decision rule: A simple threshold on VNS separates in‑distribution from OOD inputs; the threshold can be set using a small validation split.
Because VS is computed with a kernel that can be evaluated in constant time per pair, the overall cost scales linearly with the size of $\mathcal{F}$. No extra training, density estimation, or gradient‑based scoring is required.
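The scoring step can be sketched in a few lines of NumPy. This is a naive reference implementation, not the paper's code: it recomputes an eigendecomposition per query (the paper reports a linear‑time algorithm, which is not reproduced here), and `rbf_kernel` with a fixed `gamma` is an assumed kernel choice.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian similarity; gamma is an assumed bandwidth, not the paper's choice.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def vendi_score(X, kernel):
    # VS = exp(Shannon entropy of the eigenvalues of K / n).
    K = kernel(X, X) / X.shape[0]       # normalize so eigenvalues sum to 1
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > 1e-12]              # drop numerical zeros
    return float(np.exp(-np.sum(lam * np.log(lam))))

def vns(x_feat, F, kernel):
    # Novelty = increase in diversity from adding x's feature to the bank.
    augmented = np.vstack([F, x_feat[None, :]])
    return vendi_score(augmented, kernel) - vendi_score(F, kernel)
```

A far‑away sample contributes an almost‑orthogonal row to the similarity matrix, so it raises the VS (and hence its VNS) much more than a sample that sits inside the reference cluster.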
Results & Findings
| Dataset / Backbone | AUROC (previous best) | AUROC (VNS) | Memory used for reference set |
|---|---|---|---|
| CIFAR‑10 / ResNet‑34 | 96.2 % | 98.1 % | 1 % of training images |
| CIFAR‑100 / DenseNet | 93.5 % | 96.8 % | 1 % of training images |
| ImageNet‑O / ViT‑B/16 | 89.4 % | 92.3 % | 1 % of training images |
Key Takeaways
- State‑of‑the‑art detection: VNS consistently outperforms confidence‑based (e.g., Maximum Softmax Probability) and likelihood‑based (e.g., Mahalanobis) detectors.
- Robust to reduced reference data: Even with a 99 % reduction in stored features, VNS loses less than 1 % AUROC, demonstrating that diversity can be captured with a tiny sketch of the training distribution.
- Fast inference: Linear‑time scoring translates to sub‑millisecond latency per image on a CPU, making it practical for real‑time systems.
Practical Implications
- Edge and IoT deployments: Because VNS needs only a handful of stored feature vectors, it fits on devices with strict memory budgets (e.g., smartphones, drones, embedded cameras).
- Zero‑training OOD guardrails: Teams can add an OOD detector to an existing model without retraining or fine‑tuning—just extract a small feature bank once and plug in the VNS routine.
- Safety‑critical pipelines: In autonomous driving, medical imaging, or fraud detection, VNS can act as a lightweight “novelty alarm” that flags out‑of‑distribution inputs before downstream decisions are made.
- Privacy‑preserving scenarios: Since the reference set can be a tiny, possibly anonymized subset of the original data, VNS aligns with regulations that limit data retention.
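As a concrete guardrail recipe, the deployment pattern above can be wired together as follows. `score_fn` is a stand‑in for any novelty scorer (VNS in the paper); the 1 % subsampling fraction and the 5 % false‑positive budget are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fit_ood_guardrail(train_feats, val_feats, score_fn, fpr=0.05, frac=0.01, seed=0):
    # Build the tiny reference bank (frac mirrors the paper's 1% setting).
    rng = np.random.default_rng(seed)
    n = max(1, int(frac * len(train_feats)))
    bank = train_feats[rng.choice(len(train_feats), size=n, replace=False)]
    # Threshold so that roughly `fpr` of in-distribution inputs are flagged.
    val_scores = np.array([score_fn(x, bank) for x in val_feats])
    threshold = float(np.quantile(val_scores, 1.0 - fpr))
    return bank, threshold

def flag_ood(x_feat, bank, threshold, score_fn):
    # Flag the input when its novelty score exceeds the calibrated cutoff.
    return bool(score_fn(x_feat, bank) > threshold)
```

No retraining touches the backbone: the only artifacts to ship are the feature bank and one scalar threshold.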
Limitations & Future Work
- Dependence on feature quality: VNS inherits the representational power of the underlying model; if the backbone poorly separates classes, VNS may struggle.
- Kernel choice sensitivity: While the authors use a Gaussian kernel, selecting bandwidths for high‑dimensional features can affect performance and may need domain‑specific tuning.
- Extension beyond vision: The paper focuses on image classification; applying VNS to text, speech, or multimodal data will require investigating appropriate feature extractors and similarity measures.
- Theoretical guarantees: Future work could formalize bounds on detection error as a function of reference set size and kernel parameters, providing stronger assurances for safety‑critical deployments.
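On the bandwidth‑sensitivity point, one common hedged default for a Gaussian kernel is the median heuristic: derive the bandwidth from the median pairwise squared distance of the reference features. This is a standard trick, not something the paper prescribes.

```python
import numpy as np

def median_heuristic_gamma(X):
    # gamma = 1 / (2 * median pairwise squared distance).
    diffs = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diffs**2, axis=-1)
    med = np.median(d2[np.triu_indices_from(d2, k=1)])  # off-diagonal pairs only
    return 1.0 / (2.0 * med)
```

Because the heuristic rescales with the data, it gives a sane starting point across feature spaces; domain‑specific tuning on a validation split can still improve on it.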
Authors
- Amey P. Pasarkar
- Adji Bousso Dieng
Paper Information
- arXiv ID: 2602.10062v1
- Categories: cs.LG, cs.CV
- Published: February 10, 2026