[Paper] Online Pandora's Box for Contextual LLM Cascading
Source: arXiv - 2606.07392v1
Overview
Motivated by Large Language Model (LLM) cascading, we propose an online contextual Pandora’s Box model for adaptively querying and selecting LLM APIs. In each period, a decision-maker observes a request context and faces a two-phase decision problem. In the query phase, the decision-maker sequentially queries APIs, where each query reveals a generated output and the decision-maker incurs an (output-dependent) cost. In the selection phase, the decision-maker selects one of the generated outputs to deploy and observes only the downstream reward of the deployed output. This output-mediated feedback structure differs from classical online contextual Pandora’s Box models, in which opening a box directly reveals its reward. Rather than estimating the full conditional output and cost distributions of each API, we directly model the reservation index and develop a learning approach for the query phase. Specifically, we impose a parametric structure on the contextual reservation index functions induced by the classical Weitzman’s policy. Our policy combines generalized method of moments (GMM) type estimation of these reservation indices with UCB-style confidence bounds for both these indices and the shared output-level reward evaluator. Under regularity conditions, we prove that the resulting policy achieves dimension-dependent $\widetilde O(\sqrt T)$ cumulative regret over a horizon of $T$ periods.
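For readers unfamiliar with the reservation index mentioned above, the following is a minimal sketch of Weitzman's classical Pandora's Box policy. It is not the paper's learning algorithm. It assumes each box has a known discrete reward distribution and a fixed opening cost, whereas the paper treats contextual, output-dependent costs and learns the indices online. The names `reservation_index`, `weitzman_policy`, and `draw` are hypothetical and chosen for illustration. The index sigma of a box solves E[max(V - sigma, 0)] = cost. The policy opens boxes in decreasing index order and stops once the best observed value is at least the next index.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[Sequence[float], Sequence[float], float]  # (values, probs, cost)


def reservation_index(values: Sequence[float], probs: Sequence[float],
                      cost: float, tol: float = 1e-9) -> float:
    """Solve E[max(V - sigma, 0)] = cost for sigma by bisection (cost > 0)."""
    def expected_gain(sigma: float) -> float:
        return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))

    lo = min(values) - cost  # expected gain is at least cost here
    hi = max(values)         # expected gain is zero here
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_gain(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def weitzman_policy(boxes: List[Box], draw: Callable[[int], float]):
    """Open boxes by decreasing index; stop when the best value beats the next index.

    draw(i) returns the realized value of box i (an assumption of this sketch).
    Returns (selected box, its value, total opening cost, list of opened boxes).
    """
    indices = [reservation_index(v, p, c) for v, p, c in boxes]
    order = sorted(range(len(boxes)), key=lambda i: indices[i], reverse=True)
    best_value, best_box = float("-inf"), None  # type: float, Optional[int]
    total_cost, opened = 0.0, []
    for i in order:
        if best_value >= indices[i]:
            break  # no remaining box is worth its opening cost
        value = draw(i)
        total_cost += boxes[i][2]
        opened.append(i)
        if value > best_value:
            best_value, best_box = value, i
    return best_box, best_value, total_cost, opened


# Two hypothetical boxes: a risky one (index 8) and a safe one (index 5).
boxes = [([0.0, 10.0], [0.5, 0.5], 1.0), ([4.0, 6.0], [0.5, 0.5], 0.5)]
lucky = weitzman_policy(boxes, lambda i: {0: 10.0, 1: 6.0}[i])
unlucky = weitzman_policy(boxes, lambda i: {0: 0.0, 1: 6.0}[i])
```

In the first run the risky box pays off, so the policy stops after one query. In the second run it does not, so the policy also opens the safe box and selects it.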
Key Contributions
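According to the abstract, the paper contributes the following:
- An online contextual Pandora's Box model for adaptively querying and selecting LLM APIs, in which only the downstream reward of the deployed output is observed (output-mediated feedback).
- A parametric model of the contextual reservation index functions induced by Weitzman's policy, used in place of estimating each API's full conditional output and cost distributions.
- A policy that combines GMM-type estimation of the reservation indices with UCB-style confidence bounds for both the indices and the shared output-level reward evaluator.
- A proof that, under regularity conditions, the policy achieves dimension-dependent $\widetilde O(\sqrt T)$ cumulative regret over $T$ periods.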
Methodology
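Each period has two phases. In the query phase, the decision-maker queries APIs sequentially, paying an output-dependent cost for each generated output. In the selection phase, it deploys one of those outputs and observes only that output's downstream reward. Please refer to the full paper for the estimator, the regularity conditions, and the regret analysis.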
Practical Implications
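The model targets LLM cascading: deciding, for each incoming request, which LLM APIs to query and which generated output to deploy, while accounting for the cost of every query.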
Authors
- Alexandre Belloni
- Yan Chen
- Yehua Wei
Paper Information
- arXiv ID: 2606.07392v1
- Categories: cs.AI, cs.LG, econ.EM, stat.ML
- Published: June 5, 2026