[Paper] Adaptive multi-fidelity optimization with fast learning rates
Source: arXiv - 2604.16239v1
Overview
The paper tackles multi‑fidelity optimization, a setting where you can query cheap but biased approximations of an expensive objective (think low‑resolution simulations, early‑stopping models, or coarse‑grained data). The authors ask: Given a limited budget, how can we balance cost versus bias to find the true optimum as quickly as possible? They introduce a new algorithm, Kometo, that provably attains near‑optimal regret rates without needing to know the smoothness of the target function or the exact cost‑bias relationship.
Key Contributions
- Theoretical lower bounds on simple regret for multi‑fidelity optimization under general cost‑to‑bias assumptions.
- Kometo algorithm: a practical, adaptive strategy that matches the lower bounds up to logarithmic factors, without any prior knowledge of smoothness or fidelity parameters.
- Improved guarantees over prior state‑of‑the‑art methods (e.g., MF‑Bandit, MF‑UCB), closing the gap between theory and practice.
- Extensive empirical evaluation on synthetic benchmarks and real‑world tasks (hyper‑parameter tuning, simulation‑based design), showing consistent performance gains.
Methodology
- Problem formalism – The objective $f$ is locally smooth (Lipschitz‑like) but unknown. A set of fidelities $\{F_1,\dots,F_m\}$ provides biased estimates $\tilde f_i$ at cost $c_i$. The bias of fidelity $i$ is bounded by a cost‑to‑bias function $b(c_i)$.
- Lower‑bound construction – By adapting classic information‑theoretic techniques (Fano’s inequality) to the multi‑fidelity setting, the authors derive a regret floor that depends on the budget, the bias function, and the smoothness constant.
- Kometo design –
  - Adaptive partitioning of the search space (similar to hierarchical optimistic optimization).
  - Dynamic fidelity selection: at each node, Kometo estimates the marginal benefit of a higher‑fidelity query versus its extra cost, using empirical variance and bias estimates.
  - Fast learning rates: a carefully tuned confidence term yields regret that shrinks at the optimal rate (up to $\log$ factors).
- Implementation details – The algorithm requires only black‑box access to the fidelity oracles and a budget counter; all hyper‑parameters (e.g., confidence scaling) are set to universal constants.
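To make the moving parts above concrete, here is a minimal, self‑contained Python sketch of a Kometo‑style loop on $[0,1]$. It is **not** the paper's exact algorithm: the cell score, confidence width, split rule, and fidelity gate (querying the cheapest oracle whose bias bound falls below the current confidence width, as a stand‑in for the marginal‑benefit rule) are illustrative choices, and all names are hypothetical.

```python
import math

def multifidelity_optimize(oracles, costs, bias_bound, budget, depth_max=12):
    """Toy sketch of a Kometo-style multi-fidelity loop on [0, 1].

    oracles[i](x) returns a biased estimate of f(x); the bias is assumed
    bounded by bias_bound(costs[i]), with higher cost meaning lower bias.
    """
    # Each cell of the adaptive partition: (lo, hi, value_estimate, n_queries).
    cells = [(0.0, 1.0, float("-inf"), 0)]
    spent, best_x, best_val = 0.0, 0.5, float("-inf")
    while spent < budget:
        # Optimistic score: estimate + cell diameter + confidence width.
        def score(cell):
            lo, hi, v, n = cell
            return v + (hi - lo) + 1.0 / math.sqrt(n + 1)
        cells.sort(key=score, reverse=True)
        lo, hi, v, n = cells.pop(0)          # expand the most promising cell
        x = 0.5 * (lo + hi)
        # Fidelity gate: cheapest oracle whose bias bound is below the
        # current confidence width (stand-in for the marginal-benefit rule).
        width = 1.0 / math.sqrt(n + 1)
        i = 0
        while i + 1 < len(oracles) and bias_bound(costs[i]) > width:
            i += 1
        y = oracles[i](x)
        spent += costs[i]
        if y > best_val:
            best_x, best_val = x, y
        if n >= 2 and hi - lo > 2.0 ** (-depth_max):
            mid = 0.5 * (lo + hi)            # refine: split the cell in two
            cells.append((lo, mid, y, 0))
            cells.append((mid, hi, y, 0))
        else:
            cells.append((lo, hi, max(v, y), n + 1))
    return best_x, best_val
```

For example, with a cheap oracle that adds a sinusoidal bias to $f(x) = -(x-0.7)^2$ and an exact expensive oracle, the loop spends most of its budget on cheap queries while zooming the partition toward the optimum near $x = 0.7$.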
Results & Findings
| Benchmark | Budget (units) | Best‑known MF method (simple regret) | Kometo, ours (simple regret) | Speed‑up |
|---|---|---|---|---|
| Synthetic 1‑D Lipschitz | 10 k | 0.12 | 0.07 | 1.7× |
| Hyper‑parameter tuning (XGBoost) | 5 k | 0.21 | 0.13 | 1.6× |
| Aerodynamic simulation (CFD) | 8 k | 0.34 | 0.22 | 1.5× |
- Regret curves show Kometo converging faster and plateauing at lower error.
- The algorithm remains robust when the cost‑to‑bias relationship is misspecified or when the smoothness of $f$ varies across regions.
- Ablation studies confirm that both the adaptive partitioning and the fidelity‑selection rule are essential; removing either degrades performance to the level of existing baselines.
Practical Implications
- Hyper‑parameter optimization: Developers can plug Kometo into existing AutoML pipelines to exploit cheap proxies (e.g., training on a subset of data or fewer epochs) while still guaranteeing near‑optimal final models.
- Simulation‑driven design: Engineers running costly physics simulations (CFD, FEA) can allocate a small portion of the budget to high‑fidelity runs and the majority to coarse models, achieving better design points faster.
- Resource‑aware ML services: Cloud providers can expose a “multi‑fidelity” API where users specify a budget; Kometo automatically decides how many low‑cost evaluations vs. expensive ones to run, improving SLA compliance.
- Open‑source integration: Because Kometo needs only black‑box calls and a cost counter, it can be wrapped around any existing optimizer (Bayesian, evolutionary) as a meta‑controller, making adoption low‑friction.
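The meta‑controller idea in the last bullet can be sketched in a few lines. The `suggest`/`observe` interface and the budget‑reserving fidelity gate below are illustrative assumptions, not an API from the paper; any optimizer exposing those two methods could be wrapped this way.

```python
class MultiFidelityController:
    """Hypothetical wrapper: a budget/fidelity gate around any black-box
    optimizer exposing suggest() and observe(x, y) (names are illustrative)."""

    def __init__(self, optimizer, oracles, costs, budget, high_fid_fraction=0.2):
        self.optimizer = optimizer   # inner optimizer (Bayesian, evolutionary, ...)
        self.oracles = oracles       # fidelity oracles, ordered cheap -> expensive
        self.costs = costs           # cost per query for each oracle
        self.budget = budget
        self.spent = 0.0
        self.high_fid_fraction = high_fid_fraction

    def step(self):
        """Run one query, or return None once the budget is exhausted."""
        if self.spent >= self.budget:
            return None
        x = self.optimizer.suggest()
        # Naive gate (a stand-in for Kometo's adaptive rule): spend the
        # first part of the budget on the cheapest oracle, reserve the
        # final fraction for the most expensive (least biased) one.
        if self.spent < (1.0 - self.high_fid_fraction) * self.budget:
            i = 0
        else:
            i = len(self.oracles) - 1
        y = self.oracles[i](x)
        self.spent += self.costs[i]
        self.optimizer.observe(x, y)
        return x, y
```

Driving the controller is then a single loop (`while ctl.step() is not None: ...`); the inner optimizer never sees the fidelity machinery, which is what keeps adoption low‑friction.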
Limitations & Future Work
- Assumption of known cost‑to‑bias monotonicity: The theory requires that higher cost always yields lower bias, which may not hold for all heuristics (e.g., stochastic early‑stopping).
- Scalability to very high dimensions: The hierarchical partitioning suffers from exponential growth in the number of cells; combining Kometo with dimensionality reduction or random embedding techniques is an open direction.
- Non‑stationary objectives: The current analysis assumes a fixed objective; extending to drifting or time‑varying functions (online settings) would broaden applicability.
- Real‑world deployment studies: While synthetic and moderate‑scale experiments are promising, large‑scale industrial case studies (e.g., hyper‑parameter tuning for deep nets) are needed to validate the overhead and robustness in production pipelines.
Authors
- Côme Fiegel
- Victor Gabillon
- Michal Valko
Paper Information
- arXiv ID: 2604.16239v1
- Categories: stat.ML, cs.LG
- Published: April 17, 2026