[Paper] Vendi Novelty Scores for Out-of-Distribution Detection

Published: February 10, 2026 at 01:30 PM EST
5 min read
Source: arXiv - 2602.10062v1

Overview

Out‑of‑distribution (OOD) detection flags inputs that differ from the data a model was trained on—a safety net for any production‑grade AI system. This paper introduces Vendi Novelty Scores (VNS), a new OOD detector that treats novelty as a diversity problem rather than relying on confidence or likelihood estimates. By measuring how much a test sample would increase the diversity of the in‑distribution feature set, VNS delivers state‑of‑the‑art detection while remaining simple, fast, and memory‑light.

Key Contributions

  • Diversity‑based OOD detection: Formulates OOD detection as a question of how a new sample changes the Vendi Score (a similarity‑based diversity metric).
  • Vendi Novelty Score (VNS): A non‑parametric, linear‑time algorithm that combines local (class‑conditional) and global (dataset‑wide) novelty cues without any density modeling.
  • Scalable to tiny reference sets: Shows that VNS retains top performance even when built from just 1 % of the training data, enabling use on edge devices or in privacy‑restricted environments.
  • Broad empirical validation: Beats existing post‑hoc OOD detectors across several image classification benchmarks (CIFAR‑10/100, ImageNet, etc.) and multiple network architectures (ResNet, DenseNet, Vision Transformers).

Methodology

  1. Feature extraction: Pass all in‑distribution training samples through the frozen backbone of the target model and collect their latent representations (e.g., the penultimate layer).
  2. Compute Vendi Score (VS): VS is a kernel‑based diversity measure that aggregates pairwise similarities among a set of vectors. Intuitively, a set with many similar points gets a low VS, while a set that spreads out in feature space gets a high VS.
  3. Novelty estimation: For a test sample (x), compute the VS of the original in‑distribution set plus the feature of (x). The Vendi Novelty Score is the increase in VS caused by adding (x).

\[ \text{VNS}(x) = \text{VS}\left(\mathcal{F} \cup \{f(x)\}\right) - \text{VS}(\mathcal{F}) \]

where \(\mathcal{F}\) denotes the stored feature set and \(f(\cdot)\) the feature extractor.

  4. Local vs. global signals: VNS can be computed on the whole feature set (global) or separately per class (local). The final score is a weighted blend, allowing the detector to capture both “this sample looks unlike any class” and “this sample is far from the overall data manifold.”
  5. Decision rule: A simple threshold on VNS separates in‑distribution from OOD inputs; the threshold can be set using a small validation split.

Because VS is computed with a kernel that can be evaluated in constant time per pair, the overall cost scales linearly with the size of \(\mathcal{F}\). No extra training, density estimation, or gradient‑based scoring is required.
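To make the scoring pipeline concrete, here is a minimal NumPy sketch of the two steps above. The Gaussian kernel, the `bandwidth` parameter, and the function names are illustrative assumptions, not the paper's reference implementation; in particular, this naive eigendecomposition runs in cubic time, whereas the authors' algorithm achieves linear-time scoring.

```python
import numpy as np

def vendi_score(features, bandwidth=1.0):
    """Vendi Score: exponential of the Shannon entropy of the eigenvalues
    of the normalized kernel similarity matrix K / n."""
    n = len(features)
    # Gaussian (RBF) kernel over pairwise squared Euclidean distances.
    sq = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * bandwidth ** 2))
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]  # drop numerical zeros before taking logs
    return float(np.exp(-(lam * np.log(lam)).sum()))

def vendi_novelty_score(ref_features, x_feature, bandwidth=1.0):
    """VNS(x) = VS(F ∪ {f(x)}) − VS(F): the diversity gain from adding x."""
    aug = np.vstack([ref_features, x_feature[None, :]])
    return vendi_score(aug, bandwidth) - vendi_score(ref_features, bandwidth)
```

A sample that sits far from the reference cluster increases the set's diversity, and hence its VNS, much more than a sample drawn from inside the cluster.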

Results & Findings

| Dataset / Backbone | AUROC (previous best) | AUROC (VNS) | Memory used for reference set |
|---|---|---|---|
| CIFAR‑10 / ResNet‑34 | 96.2 % | 98.1 % | 1 % of training images |
| CIFAR‑100 / DenseNet | 93.5 % | 96.8 % | 1 % of training images |
| ImageNet‑O / ViT‑B/16 | 89.4 % | 92.3 % | 1 % of training images |

Key takeaways

  • State‑of‑the‑art detection: VNS consistently outperforms confidence‑based (e.g., Maximum Softmax Probability) and likelihood‑based (e.g., Mahalanobis) detectors.
  • Robust to reduced reference data: Even with a 99 % reduction in stored features, VNS loses less than 1 % AUROC, demonstrating that diversity can be captured with a tiny sketch of the training distribution.
  • Fast inference: Linear‑time scoring translates to sub‑millisecond latency per image on a CPU, making it practical for real‑time systems.

Practical Implications

  • Edge and IoT deployments: Because VNS needs only a handful of stored feature vectors, it fits on devices with strict memory budgets (e.g., smartphones, drones, embedded cameras).
  • Zero‑training OOD guardrails: Teams can add an OOD detector to an existing model without retraining or fine‑tuning—just extract a small feature bank once and plug in the VNS routine.
  • Safety‑critical pipelines: In autonomous driving, medical imaging, or fraud detection, VNS can act as a lightweight “novelty alarm” that flags out‑of‑distribution inputs before downstream decisions are made.
  • Privacy‑preserving scenarios: Since the reference set can be a tiny, possibly anonymized subset of the original data, VNS aligns with regulations that limit data retention.
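The "extract a small feature bank once" workflow can be sketched as a simple random subsample of the in‑distribution features. The `fraction` and `seed` parameters are arbitrary illustrative choices; the paper reports that roughly 1 % of the training data suffices.

```python
import numpy as np

def build_reference_set(features, fraction=0.01, seed=0):
    """Keep a small random fraction of in-distribution features to serve
    as the stored reference set for novelty scoring."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(len(features) * fraction)))
    idx = rng.choice(len(features), size=n_keep, replace=False)
    return features[idx]
```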

Limitations & Future Work

  • Dependence on feature quality: VNS inherits the representational power of the underlying model; if the backbone poorly separates classes, VNS may struggle.
  • Kernel choice sensitivity: While the authors use a Gaussian kernel, selecting bandwidths for high‑dimensional features can affect performance and may need domain‑specific tuning.
  • Extension beyond vision: The paper focuses on image classification; applying VNS to text, speech, or multimodal data will require investigating appropriate feature extractors and similarity measures.
  • Theoretical guarantees: Future work could formalize bounds on detection error as a function of reference set size and kernel parameters, providing stronger assurances for safety‑critical deployments.
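On the bandwidth-sensitivity point: a common default when no tuning data is available, though not necessarily what the authors use, is the median heuristic, which sets the Gaussian-kernel bandwidth to the median pairwise distance among the reference features:

```python
import numpy as np

def median_heuristic_bandwidth(features):
    """Median heuristic: use the median pairwise Euclidean distance
    among the reference features as the Gaussian-kernel bandwidth."""
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    iu = np.triu_indices(len(features), k=1)  # upper triangle, no diagonal
    return float(np.median(dists[iu]))
```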

Authors

  • Amey P. Pasarkar
  • Adji Bousso Dieng

Paper Information

  • arXiv ID: 2602.10062v1
  • Categories: cs.LG, cs.CV
  • Published: February 10, 2026