[Paper] SMOG: Scalable Meta-Learning for Multi-Objective Bayesian Optimization

Published: January 29, 2026 at 01:51 PM EST
4 min read
Source: arXiv


Overview

The paper introduces SMOG, a new meta‑learning framework that equips multi‑objective Bayesian optimization (MOBO) with a scalable, data‑driven prior. By leveraging historical data from related optimization problems, SMOG can “warm‑start” the search for Pareto‑optimal solutions, dramatically cutting the number of expensive black‑box evaluations needed in real‑world engineering and ML pipelines.

Key Contributions

  • Unified meta‑learning + MOBO model – First method that learns a joint Gaussian‑process (GP) prior across many past tasks and multiple objectives simultaneously.
  • Correlation‑aware multi‑output GP – Explicitly captures statistical dependencies between objectives, improving surrogate fidelity on the target problem.
  • Closed‑form target prior with residual kernel – After conditioning on task metadata, SMOG produces an analytically tractable prior plus a flexible residual kernel that adapts to the new task.
  • Scalable hierarchical training – Meta‑task GPs are trained once, cached, and reused, giving linear time‑complexity in the number of meta‑tasks.
  • Plug‑and‑play with existing MOBO acquisition functions – No custom acquisition is required; SMOG’s surrogate can be dropped into standard tools such as Expected Hypervolume Improvement (EHVI).
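EHVI, mentioned above, scores candidates by how much they are expected to grow the hypervolume dominated by the current Pareto front. As a point of reference, the hypervolume indicator itself can be sketched for two minimization objectives; the function name and sample front below are illustrative, not from the paper:

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2-objective front (minimization),
    bounded by a reference point. Illustrative sketch only."""
    # Sweep points by the first objective; each non-dominated
    # point contributes a horizontal slab below the previous best.
    pts = sorted(front, key=lambda p: p[0])
    hv, prev_f1 = 0.0, ref[1]
    for f0, f1 in pts:
        if f1 < prev_f1:  # point is non-dominated at this sweep step
            hv += (ref[0] - f0) * (prev_f1 - f1)
            prev_f1 = f1
    return hv

front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(4.0, 4.0)))  # → 6.0
```

EHVI integrates the growth of this quantity under the surrogate's posterior, which is why a better-calibrated prior (SMOG's contribution) translates directly into fewer evaluations.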

Methodology

  1. Meta‑task collection – Gather a set of related optimization problems (e.g., tuning hyper‑parameters for different datasets). Each meta‑task provides a small set of input‑output pairs for all objectives.
  2. Multi‑output GP construction – Build a joint GP that models all objectives together, using a kernel that factorises into:
    • A metadata kernel that ties together tasks sharing similar descriptors (e.g., dataset size, hardware specs).
    • A residual multi‑output kernel that captures task‑specific nuances not explained by metadata.
  3. Conditioning on metadata – When a new target task arrives, its metadata is plugged into the GP. The model analytically integrates out uncertainty over the metadata, yielding a closed‑form prior for the target surrogate.
  4. Hierarchical training
    • Stage 1: Fit independent GPs for each meta‑task (parallelizable).
    • Stage 2: Learn the hyper‑parameters of the metadata and residual kernels jointly, using the cached stage‑1 posteriors. This step scales linearly with the number of meta‑tasks.
  5. Optimization loop – Use the resulting surrogate inside any standard MOBO acquisition function (e.g., EHVI, Pareto‑frontier entropy). The acquisition selects the next black‑box evaluation, the data are added to the surrogate, and the loop repeats.
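The steps above can be condensed into a toy sketch: a factorised kernel (metadata kernel × input kernel, plus a residual term active only within a single task) and closed-form GP conditioning that turns meta-task observations into a prior for the target task. All names (`smog_kernel`, `gp_condition`), kernel forms, and hyper-parameter values are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def smog_kernel(X1, T1, M1, X2, T2, M2, ls_x=1.0, ls_m=1.0, res=0.3):
    """Hypothetical factorised kernel in the spirit of SMOG: a
    metadata kernel ties tasks together; a residual component is
    active only between points of the same task."""
    shared = rbf(M1, M2, ls_m) * rbf(X1, X2, ls_x)
    same_task = (T1[:, None] == T2[None, :]).astype(float)
    return shared + res * same_task * rbf(X1, X2, ls_x)

def gp_condition(K_nn, K_sn, K_ss, y, noise=1e-2):
    """Closed-form GP conditioning: target-task prior given the
    meta-task observations."""
    L = np.linalg.cholesky(K_nn + noise * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_sn @ alpha
    V = np.linalg.solve(L, K_sn.T)
    return mu, K_ss - V.T @ V

rng = np.random.default_rng(0)
# Two meta-tasks, 5 points each, one objective (a single output of
# the multi-output GP), with a 1-D metadata descriptor per task.
X = rng.uniform(0, 1, (10, 1))
T = np.repeat([0, 1], 5)
M = np.array([[0.2]] * 5 + [[0.8]] * 5)
y = np.sin(3 * X[:, 0]) + 0.1 * M[:, 0]

# Target task: new task id and metadata, queried on a grid.
Xs = np.linspace(0, 1, 7)[:, None]
Ts = np.full(7, 2)
Ms = np.full((7, 1), 0.5)

K_nn = smog_kernel(X, T, M, X, T, M)
K_sn = smog_kernel(Xs, Ts, Ms, X, T, M)
K_ss = smog_kernel(Xs, Ts, Ms, Xs, Ts, Ms)
mu, cov = gp_condition(K_nn, K_sn, K_ss, y)
print(mu.shape, float(np.diag(cov).max()))
```

The resulting `mu`/`cov` pair is an ordinary GP prior for the target task, which is what lets SMOG feed directly into standard acquisition functions such as EHVI in step 5.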

Results & Findings

| Experiment | Baseline | SMOG (meta-learned) | Speed-up |
|---|---|---|---|
| Synthetic 2-objective benchmark (30 meta-tasks) | Standard MOBO (no prior) | SMOG-augmented MOBO | ~2.5× fewer evaluations to reach 90% hypervolume |
| Hyper-parameter tuning of a multi-objective NN (accuracy vs. latency) across 10 datasets | Random search + MOBO | SMOG-MOBO | 40% reduction in total GPU hours |
| Real-world engineering design (weight vs. strength) with 5 historic designs | Evolutionary MOEA | SMOG-MOBO | Converged to the Pareto front in half the budget |

Key take‑aways

  • Meta‑learning the prior consistently reduces the number of expensive evaluations needed to approximate the Pareto front.
  • The correlation‑aware kernel improves surrogate accuracy, especially when objectives are strongly coupled (e.g., accuracy vs. latency).
  • Training time grows linearly with the number of meta‑tasks, confirming the claimed scalability.

Practical Implications

  • Faster hyper‑parameter sweeps for multi‑objective ML models (e.g., balancing accuracy, inference time, and memory).
  • Accelerated engineering design cycles where simulations are costly (CFD, structural analysis) and multiple performance metrics must be optimized.
  • Continuous improvement pipelines: as new tasks are solved, their data automatically enrich the meta‑learning pool, making future optimizations progressively cheaper.
  • Easy integration: Since SMOG outputs a standard GP posterior, existing BO libraries (BoTorch, GPyOpt, Emukit) can consume it without code changes.

Limitations & Future Work

  • Metadata quality dependence – The approach assumes informative, low‑dimensional descriptors for each task; poor metadata can degrade the prior.
  • Gaussian‑process scalability – Although meta‑training is linear, each GP still incurs cubic cost in its own data size; extremely large per‑task datasets may need sparse GP approximations.
  • Limited empirical scope – Experiments focus on up to ~30 meta‑tasks; scaling to hundreds or thousands remains to be demonstrated.
  • Future directions suggested by the authors include: extending SMOG to non‑Gaussian likelihoods (e.g., classification), exploring deep kernel learning for richer representations, and applying the framework to reinforcement‑learning policy search where objectives like reward and safety conflict.

Authors

  • Leonard Papenmeier
  • Petru Tighineanu

Paper Information

  • arXiv ID: 2601.22131v1
  • Categories: cs.LG
  • Published: January 29, 2026