[Paper] CAOS: Conformal Aggregation of One-Shot Predictors

Published: January 8, 2026 at 01:44 PM EST
3 min read
Source: arXiv - 2601.05219v1

Overview

One‑shot prediction lets you fine‑tune a massive pre‑trained model to a brand‑new task with just a single labeled example. While this is a huge win for rapid prototyping, it leaves developers without reliable uncertainty estimates—something that’s crucial when decisions have downstream costs. The paper CAOS: Conformal Aggregation of One‑Shot Predictors introduces a new conformal inference framework that fills this gap, delivering statistically sound prediction sets even when you only have that one labeled datum.

Key Contributions

  • CAOS framework: A novel conformal method that aggregates multiple one‑shot predictors instead of relying on a single model.
  • Leave‑one‑out calibration: A clever calibration scheme that makes the most of the single labeled example, avoiding the data‑waste of traditional split‑conformal approaches.
  • Theoretical guarantee: Marginal coverage is proven via a monotonicity argument, even though the usual exchangeability assumptions are broken (the guarantee is stated formally after this list).
  • Empirical validation: Demonstrated on one‑shot facial landmark detection and RAFT text classification, showing tighter (smaller) prediction sets than standard baselines while preserving the promised coverage level.
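
For readers who want the third contribution in symbols, the guarantee is the standard marginal-coverage statement of conformal prediction. The notation below is a generic restatement for context, not taken from the paper.

```latex
% Marginal coverage at miscoverage level \alpha: the CAOS prediction set
% C(X_new), built from the aggregated one-shot predictors, satisfies
\[
  \mathbb{P}\bigl( Y_{\mathrm{new}} \in C(X_{\mathrm{new}}) \bigr) \;\ge\; 1 - \alpha ,
\]
% which the paper establishes through a monotonicity argument rather than
% the usual exchangeability assumption.
```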

Methodology

  1. Generate a pool of one‑shot predictors – Starting from a frozen foundation model, the authors train several lightweight adapters, each using the same single labeled example but with different random seeds, data augmentations, or hyper‑parameter tweaks.
  2. Aggregate predictions – For a new input, each adapter produces a point prediction (e.g., a set of facial landmarks). CAOS combines these predictions into a score that reflects how far a candidate output deviates from the ensemble.
  3. Leave-one-out calibration – Each adapter is held out in turn: the remaining adapters form the ensemble, the single labeled example plays the role of a “test” point, and the resulting score is recorded. Repeating this for every adapter yields a full set of calibration scores without discarding any data.
  4. Construct prediction sets – Using the calibrated quantile, CAOS builds a set of outputs that, with high probability (e.g., 90%), contains the true answer. The construction respects the monotonicity of the aggregation score, which is the key to the coverage proof; a minimal sketch of all four steps follows this list.
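
The sketch below turns those four steps into a runnable toy example. Everything in it is an illustrative assumption rather than the paper's implementation: the “adapters” are noisy one-example predictors built by a made-up train_adapter helper, the nonconformity score is a simple distance to the ensemble mean, and the leave-one-out loop and quantile follow the recipe above in spirit only.

```python
# Toy CAOS-style pipeline: pool of one-shot predictors, leave-one-out
# calibration, and conformal prediction sets. Illustrative sketch only.
import numpy as np


def train_adapter(x_labeled, y_labeled, seed):
    """Stand-in for fine-tuning a lightweight adapter on one example.

    Each "adapter" is a slightly perturbed predictor anchored at the single
    label; the perturbation mimics different seeds / augmentations."""
    offset = np.random.default_rng(seed).normal(scale=0.5)
    return lambda x: y_labeled + offset + 0.1 * (x - x_labeled)


def nonconformity(candidate_y, ensemble_preds):
    """Aggregation score: how far a candidate output sits from the ensemble."""
    return abs(candidate_y - float(np.mean(ensemble_preds)))


# 1. Pool of one-shot predictors trained on the lone labeled datum.
x1, y1 = 0.3, 1.7
adapters = [train_adapter(x1, y1, seed=s) for s in range(10)]

# 2-3. Leave-one-out calibration: hold out each adapter in turn, use the
# rest as the ensemble, and score the labeled example as a pseudo test point.
cal_scores = [
    nonconformity(y1, [a(x1) for j, a in enumerate(adapters) if j != i])
    for i in range(len(adapters))
]

alpha = 0.10  # target 90% coverage
rank = int(np.ceil((1 - alpha) * (len(cal_scores) + 1)))
q_hat = np.inf if rank > len(cal_scores) else np.sort(cal_scores)[rank - 1]


# 4. Prediction set: every candidate output whose score stays below q_hat.
def prediction_set(x_new, candidate_grid):
    preds = [a(x_new) for a in adapters]
    return [y for y in candidate_grid if nonconformity(y, preds) <= q_hat]


candidates = np.linspace(-2.0, 5.0, 701)
covered = prediction_set(1.0, candidates)
print(f"{len(covered)} candidate outputs kept, spanning "
      f"[{covered[0]:.2f}, {covered[-1]:.2f}]")
```

Note the payoff of the leave-one-out loop: with n adapters the single label yields n calibration scores, so the quantile can be estimated without reserving a separate calibration split.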

Results & Findings

| Task | Baseline (Split-Conformal) | CAOS | Reduction in Set Size |
| --- | --- | --- | --- |
| One-shot facial landmarking (5-point) | 95% coverage, avg. set radius 4.2 px | 95% coverage, avg. radius 2.8 px | ≈33% smaller |
| RAFT text classification (sentiment) | 90% coverage, avg. set cardinality 3.1 | 90% coverage, avg. cardinality 2.2 | ≈29% smaller |
  • Coverage stays at the nominal level (90–95 %) across all experiments, confirming the theoretical guarantee.
  • Prediction sets are consistently tighter, meaning developers get more informative uncertainty bounds without sacrificing reliability.

Practical Implications

  • Faster product iteration – Teams can deploy one‑shot fine‑tuned models with built‑in confidence intervals, reducing the need for costly data collection before launch.
  • Safety‑critical systems – In domains like medical imaging or autonomous driving, CAOS‑derived sets can flag when a one‑shot model’s prediction is too ambiguous, prompting human review.
  • Model‑agnostic tooling – Because CAOS works with any foundation model that can be adapted in a one‑shot fashion, it can be packaged as a plug‑in for popular ML libraries (e.g., Hugging Face Transformers, PyTorch Lightning).
  • Resource efficiency – The leave‑one‑out calibration eliminates the need to reserve a validation split, saving precious labeled data and compute time.

Limitations & Future Work

  • Scalability of the predictor pool – Generating many one-shot adapters incurs extra compute; the paper explores modest pool sizes (5–10), but larger ensembles may be needed for very complex tasks.
  • Assumption of monotonicity – The coverage proof hinges on a monotonic aggregation score, which may not hold for all types of predictors (e.g., highly non‑linear output spaces).
  • Domain‑specific calibration – While the leave‑one‑out scheme works well for the studied tasks, extending CAOS to structured outputs (e.g., full segmentation maps) may require custom score functions.
  • Future directions include adaptive pool sizing, integration with active learning loops to acquire additional labels when uncertainty remains high, and broader benchmarks across vision, speech, and reinforcement‑learning settings.

Authors

  • Maja Waldron

Paper Information

  • arXiv ID: 2601.05219v1
  • Categories: stat.ML, cs.AI, cs.LG
  • Published: January 8, 2026
