[Paper] CAOS: Conformal Aggregation of One-Shot Predictors
Source: arXiv - 2601.05219v1
Overview
One‑shot prediction lets you fine‑tune a massive pre‑trained model to a brand‑new task with just a single labeled example. While this is a huge win for rapid prototyping, it leaves developers without reliable uncertainty estimates—something that’s crucial when decisions have downstream costs. The paper CAOS: Conformal Aggregation of One‑Shot Predictors introduces a new conformal inference framework that fills this gap, delivering statistically sound prediction sets even when you only have that one labeled datum.
Key Contributions
- CAOS framework: A novel conformal method that aggregates multiple one‑shot predictors instead of relying on a single model.
- Leave‑one‑out calibration: A clever calibration scheme that makes the most of the single labeled example, avoiding the data‑waste of traditional split‑conformal approaches.
- Theoretical guarantee: Proven marginal coverage via a monotonicity argument, even though the usual exchangeability assumptions are broken (the guarantee is stated just after this list).
- Empirical validation: Demonstrated on one‑shot facial landmark detection and RAFT text classification, showing tighter (smaller) prediction sets than standard baselines while preserving the promised coverage level.
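For reference, the marginal coverage property in the third bullet is the usual conformal guarantee; in standard notation (our phrasing, not a formula quoted from the paper), with $\alpha$ the target miscoverage level and $\hat{C}_\alpha$ the CAOS prediction set:

$$\Pr\big(Y_{\text{test}} \in \hat{C}_{\alpha}(X_{\text{test}})\big) \;\ge\; 1 - \alpha .$$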
Methodology
- Generate a pool of one‑shot predictors – Starting from a frozen foundation model, the authors train several lightweight adapters, each using the same single labeled example but with different random seeds, data augmentations, or hyper‑parameter tweaks.
- Aggregate predictions – For a new input, each adapter produces a point prediction (e.g., a set of facial landmarks). CAOS combines these predictions into a score that reflects how far a candidate output deviates from the ensemble.
- Leave‑one‑out calibration – The single labeled example is temporarily treated as a “test” point while the remaining adapters are used to compute calibration scores. This process is repeated for each adapter, yielding a full set of calibration scores without discarding any data.
- Construct prediction sets – Using the calibrated quantile, CAOS builds a set of outputs that, with high probability (e.g., 90 %), contains the true answer. The construction respects the monotonicity of the aggregation score, which is the key to the coverage proof; a code sketch of the full pipeline follows this list.
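Taken together, these steps form a short pipeline. The sketch below is a minimal illustration under stated assumptions rather than the authors' implementation: outputs are treated as vectors, the aggregation score is taken to be the Euclidean distance from a candidate output to the ensemble mean, `adapters` is a list of callables returning point predictions, and every name (`loo_calibration_scores`, `caos_prediction_set`, the toy adapters) is hypothetical.

```python
import numpy as np

def aggregation_score(candidate, adapter_preds):
    """One possible monotone aggregation score (an assumption, not necessarily the
    paper's exact choice): distance of a candidate output from the ensemble mean."""
    return float(np.linalg.norm(np.asarray(candidate) - adapter_preds.mean(axis=0)))

def loo_calibration_scores(adapters, x_labeled, y_labeled):
    """Leave-one-out calibration on the single labeled example: for each adapter k,
    the remaining K-1 adapters form the ensemble and the true label plays the role
    of a test point, yielding one calibration score per adapter."""
    preds = np.stack([f(x_labeled) for f in adapters])          # shape (K, d)
    return np.array([
        aggregation_score(y_labeled, np.delete(preds, k, axis=0))
        for k in range(len(adapters))
    ])

def caos_prediction_set(adapters, x_new, candidates, cal_scores, alpha=0.1):
    """Keep every candidate output whose aggregation score at x_new is at most the
    conformal quantile of the leave-one-out calibration scores."""
    K = len(cal_scores)
    rank = int(np.ceil((K + 1) * (1 - alpha)))
    # With very few calibration scores the quantile can be vacuous (rank > K),
    # in which case every candidate is kept.
    q_hat = np.inf if rank > K else np.sort(cal_scores)[rank - 1]
    preds = np.stack([f(x_new) for f in adapters])              # shape (K, d)
    return [c for c in candidates if aggregation_score(c, preds) <= q_hat]

# Toy usage: five "adapters" that each add a fixed random offset to the input.
rng = np.random.default_rng(0)
adapters = [(lambda x, b=rng.normal(0.0, 0.1, 2): x + b) for _ in range(5)]
x1, y1 = np.zeros(2), np.array([0.05, -0.02])                   # the single labeled example
cal = loo_calibration_scores(adapters, x1, y1)
grid = [np.array([i, j]) for i in np.linspace(0.2, 0.8, 7) for j in np.linspace(0.2, 0.8, 7)]
print(caos_prediction_set(adapters, np.array([0.5, 0.5]), grid, cal, alpha=0.2))
```

Because the score increases monotonically as a candidate moves away from the ensemble, thresholding it at a calibrated quantile produces nested sets, matching the monotonicity the construction above relies on. Note also that with only K leave-one-out scores the quantile is coarse: for a 90 % target, K of roughly ten or more is needed before the threshold in this sketch stops being vacuous.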
Results & Findings
| Task | Baseline (Split‑Conformal) | CAOS | Reduction in Set Size |
|---|---|---|---|
| One‑shot facial landmarking (5‑point) | 95 % coverage, avg. set radius 4.2 px | 95 % coverage, avg. set radius 2.8 px | ≈33 % smaller |
| RAFT text classification (sentiment) | 90 % coverage, avg. set cardinality 3.1 | 90 % coverage, avg. set cardinality 2.2 | ≈29 % smaller |
- Coverage stays at the nominal level (90–95 %) across all experiments, confirming the theoretical guarantee.
- Prediction sets are consistently tighter, meaning developers get more informative uncertainty bounds without sacrificing reliability (the metrics behind these numbers are sketched below).
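For context, the coverage and set-size columns are the standard conformal evaluation metrics. The snippet below is a minimal, illustrative way to compute them on held-out labeled data, assuming prediction sets are returned as candidate lists as in the earlier sketch (the function name `evaluate_prediction_sets` and the exact-match tolerance are assumptions, not the paper's evaluation code):

```python
import numpy as np

def evaluate_prediction_sets(prediction_sets, true_labels, tol=1e-9):
    """Empirical marginal coverage (fraction of test points whose true label lies
    inside its prediction set) and the average set size."""
    covered = [
        any(np.linalg.norm(np.asarray(c) - np.asarray(y)) <= tol for c in pred_set)
        for pred_set, y in zip(prediction_sets, true_labels)
    ]
    sizes = [len(pred_set) for pred_set in prediction_sets]
    return float(np.mean(covered)), float(np.mean(sizes))
```

The ≈33 % and ≈29 % figures in the table are then simply the relative drop in the second returned value at matched empirical coverage.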
Practical Implications
- Faster product iteration – Teams can deploy one‑shot fine‑tuned models with statistically valid prediction sets built in, reducing the need for costly data collection before launch.
- Safety‑critical systems – In domains like medical imaging or autonomous driving, CAOS‑derived sets can flag when a one‑shot model’s prediction is too ambiguous, prompting human review.
- Model‑agnostic tooling – Because CAOS works with any foundation model that can be adapted in a one‑shot fashion, it can be packaged as a plug‑in for popular ML libraries (e.g., Hugging Face Transformers, PyTorch Lightning).
- Resource efficiency – The leave‑one‑out calibration eliminates the need to reserve a validation split, saving precious labeled data and compute time.
Limitations & Future Work
- Scalability of the predictor pool – Generating many one‑shot adapters incurs extra compute; the paper explores modest pool sizes (5–10), but larger ensembles may be needed for very complex tasks.
- Assumption of monotonicity – The coverage proof hinges on a monotonic aggregation score, which may not hold for all types of predictors (e.g., highly non‑linear output spaces).
- Domain‑specific calibration – While the leave‑one‑out scheme works well for the studied tasks, extending CAOS to structured outputs (e.g., full segmentation maps) may require custom score functions.
- Future directions include adaptive pool sizing, integration with active learning loops to acquire additional labels when uncertainty remains high, and broader benchmarks across vision, speech, and reinforcement‑learning settings.
Authors
- Maja Waldron
Paper Information
- arXiv ID: 2601.05219v1
- Categories: stat.ML, cs.AI, cs.LG
- Published: January 8, 2026