[Paper] Explicit and Non-asymptotic Query Complexities of Rank-Based Zeroth-order Algorithms on Smooth Functions
Source: arXiv - 2512.16200v1
Overview
This paper delivers the first explicit, non‑asymptotic bounds on how many function queries a rank‑based zeroth‑order (ZO) optimizer needs to reach a target accuracy. By focusing on the simple “top‑k” selection rule that underlies popular tools such as CMA‑ES, NES, and rank‑based genetic algorithms, the author shows that these heuristics are not just empirically robust—they also enjoy provable efficiency on smooth (convex or non‑convex) objectives.
Key Contributions
- Explicit query‑complexity formulas for a rank‑based ZO algorithm, removing the “asymptotic only” limitation of prior work.
- Strongly convex case: (\widetilde{\mathcal O}\big(\frac{dL}{\mu}\log\frac{dL}{\mu\delta}\log\frac{1}{\varepsilon}\big)) queries to achieve an (\varepsilon)‑suboptimal solution.
- Non‑convex smooth case: (\mathcal O\big(\frac{dL}{\varepsilon}\log\frac{1}{\varepsilon}\big)) queries to reach an (\varepsilon)‑stationary point.
- High‑probability guarantees (holding with probability at least (1-\delta)) rather than guarantees in expectation, which matters in safety‑critical or production settings.
- Novel analysis technique that sidesteps traditional drift‑analysis and information‑geometric arguments, offering a fresh perspective on why rank‑based heuristics work.
Methodology
- Algorithmic skeleton – The studied method samples a set of random search directions, evaluates the objective at perturbed points, ranks the outcomes, and keeps the top‑(k) directions to update the current estimate. This mirrors the “selection” step in CMA‑ES and NES; a minimal sketch appears at the end of this section.
- Smoothness assumptions – The objective is assumed (L)-smooth, i.e., its gradient is (L)-Lipschitz: (\|\nabla f(x)-\nabla f(y)\|\le L\|x-y\|) for all (x, y). For the convex case, strong convexity with parameter (\mu) is also required.
- Probabilistic analysis – By bounding the probability that the ranking correctly identifies descent directions, the author derives concentration inequalities for the estimated gradient surrogate.
- Recursive error reduction – The analysis shows how each iteration shrinks the optimality gap by a factor that depends on (d), (L), (\mu), and the chosen (k).
- Stopping criterion – The number of iterations needed to drive the gap below (\varepsilon) is then summed, yielding the explicit query‑complexity formulas above.
The key insight is that only the order of function values is needed to construct a reliable gradient estimate, and that the randomness in direction sampling provides enough information to guarantee progress with high probability.
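To make the skeleton concrete, here is a minimal Python sketch of one rank‑based, top‑(k) ZO loop under the assumptions above. It illustrates the selection idea only; it is not the paper’s exact algorithm, and the function name `rank_based_zo_minimize` and all hyperparameters (`n_dirs`, `k`, `sigma`, `step`, `n_iters`) are illustrative placeholders rather than values prescribed by the analysis.

```python
import numpy as np

def rank_based_zo_minimize(f, x0, n_dirs=20, k=5, sigma=0.1, step=0.05,
                           n_iters=200, rng=None):
    """Minimal rank-based zeroth-order loop with top-k selection.

    Each iteration: sample random unit directions, query f at the perturbed
    points, rank the returned values, and step along the average of the k
    best-ranked directions. Only the ordering of f-values is used.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        dirs = rng.standard_normal((n_dirs, x.size))         # random search directions
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # normalize to unit length
        values = np.array([f(x + sigma * u) for u in dirs])   # one query per direction
        top_k = np.argsort(values)[:k]                        # indices of the k smallest values
        x = x + step * dirs[top_k].mean(axis=0)               # move along the averaged top-k directions
    return x

# Toy usage: a strongly convex quadratic in 3 dimensions.
if __name__ == "__main__":
    A = np.diag([1.0, 5.0, 10.0])
    f = lambda x: 0.5 * x @ A @ x
    print(f(rank_based_zo_minimize(f, x0=np.ones(3))))
```

Note that only the ordering produced by `np.argsort` enters the update, so any strictly monotone transformation of (f) leaves the iterates unchanged, which is the robustness property discussed below.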
Results & Findings
| Setting | Query Complexity (to reach (\varepsilon) accuracy) | Interpretation |
|---|---|---|
| Strongly convex, (L)-smooth | (\widetilde{\mathcal O}\big(\frac{dL}{\mu}\log\frac{dL}{\mu\delta}\log\frac{1}{\varepsilon}\big)) | Linear dependence on dimension (d) and the condition number (L/\mu); only logarithmic dependence on (1/\varepsilon) (very mild). |
| Smooth non‑convex | (\mathcal O\big(\frac{dL}{\varepsilon}\log\frac{1}{\varepsilon}\big)) | Same linear‑in‑(d) scaling; the (\frac{1}{\varepsilon}) term matches the best known rates for classic ZO methods that build explicit gradient estimates, yet only rankings are used here. |
| Success probability | ≥ (1-\delta) (any (0<\delta<1)) | Guarantees hold with high confidence, not just in expectation. |
These bounds demonstrate that rank‑based ZO methods are theoretically competitive with gradient‑based ZO algorithms, while offering extra robustness to noise and monotone transformations.
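As a rough illustration of how the strongly convex bound could feed into resource budgeting (see the next section), the snippet below evaluates only the leading term of (\widetilde{\mathcal O}\big(\frac{dL}{\mu}\log\frac{dL}{\mu\delta}\log\frac{1}{\varepsilon}\big)). The hidden constant `c` and the use of natural logarithms are assumptions, so the output is an order‑of‑magnitude estimate, not a guaranteed query count.

```python
import math

def strongly_convex_query_budget(d, L, mu, eps, delta, c=1.0):
    """Leading term of O~((dL/mu) * log(dL/(mu*delta)) * log(1/eps)).

    The constant c is not given by the bound; c=1.0 is a placeholder, and
    natural logarithms are used, so treat the result as order-of-magnitude.
    """
    kappa_term = d * L / mu
    return c * kappa_term * math.log(kappa_term / delta) * math.log(1.0 / eps)

# e.g. d = 100, condition number L/mu = 10, target eps = 1e-3, delta = 0.05
print(f"{strongly_convex_query_budget(100, 10.0, 1.0, 1e-3, 0.05):,.0f} queries (leading term)")
```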
Practical Implications
- Robust hyperparameter tuning & black‑box optimization – Engineers can rely on CMA‑ES‑style optimizers with a concrete expectation of how many evaluations are needed, even when the objective is noisy or only ordinal feedback is available (e.g., A/B test click‑through rates).
- Resource budgeting – The explicit formulas let practitioners estimate compute budgets (GPU hours, cloud credits) before launching large‑scale searches, improving project planning.
- Safety‑critical domains – High‑probability guarantees are valuable for aerospace, finance, or medical AI, where worst‑case performance matters more than average case.
- Algorithm design – The analysis suggests that increasing the top‑(k) selection size can trade off per‑iteration cost vs. convergence speed, giving developers a tunable knob (illustrated after this list).
- Integration with existing libraries – Since the algorithmic core is identical to what’s already implemented in libraries such as cmaes, nevergrad, or pycma, the theoretical results can be added as documentation or as optional “theory‑guided” stopping criteria.
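As a small illustration of the top‑(k) knob mentioned above, the snippet below sweeps (k) on a toy quadratic. It assumes the `rank_based_zo_minimize` sketch from the Methodology section is in scope, and the problem size and iteration budget are arbitrary choices for demonstration.

```python
import numpy as np

# Assumes rank_based_zo_minimize from the Methodology sketch is defined.
# Larger k averages more of the sampled directions (steadier, less greedy
# updates); smaller k is greedier. Sweeping k under a fixed query budget is
# one way to probe the trade-off empirically.
A = np.diag(np.linspace(1.0, 10.0, 20))
f = lambda x: 0.5 * x @ A @ x
for k in (1, 5, 10):
    x = rank_based_zo_minimize(f, x0=np.ones(20), n_dirs=20, k=k, n_iters=500)
    print(f"k={k:2d}  f(x)={f(x):.4f}")
```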
Limitations & Future Work
- Assumption of smoothness – Real‑world black‑box functions may be non‑smooth or have discontinuities; extending the analysis to such settings remains open.
- Strong convexity requirement – The fast (\log(1/\varepsilon)) rate needs strong convexity; many practical problems are only locally convex or merely satisfy the Polyak‑Łojasiewicz condition.
- Fixed top‑(k) strategy – The paper studies a static (k); adaptive schemes (e.g., increasing (k) as the search progresses) could improve practical performance but are not covered.
- Empirical validation – The paper provides explicit theoretical bounds but no large‑scale experiments; future work could benchmark the predicted query counts against real optimization runs.
Overall, the work bridges a critical gap between the empirical success of rank‑based zeroth‑order methods and their theoretical underpinnings, giving developers a solid foundation to trust—and further improve—these algorithms in production environments.
Authors
- Haishan Ye
Paper Information
- arXiv ID: 2512.16200v1
- Categories: cs.LG, cs.NE
- Published: December 18, 2025