[Paper] Cheap Thrills: Effective Amortized Optimization Using Inexpensive Labels

Published: March 5, 2026 at 01:58 PM EST
4 min read
Source: arXiv


Overview

The paper introduces a three‑stage training pipeline that lets you build high‑performance optimization surrogates using only a handful of cheap, imperfect labels. By first pre‑training on these low‑cost data and then polishing the model with self‑supervised learning, the authors achieve fast, accurate solutions for hard optimization and simulation tasks—while slashing the offline data‑generation cost by up to 59×.

Key Contributions

  • Cheap‑label pre‑training: Shows that a modest set of inexpensive, possibly infeasible solutions is enough to get a model into the right “basin of attraction.”
  • Hybrid supervised + self‑supervised refinement: Combines a short supervised warm‑up with a later self‑supervised phase that enforces feasibility and optimality without extra labels.
  • Theoretical guarantee: Provides a merit‑based analysis proving that only a few cheap labels and limited training epochs are required for convergence.
  • Broad empirical validation: Demonstrates the approach on three demanding domains—non‑convex constrained optimization, power‑grid dispatch, and stiff dynamical‑system simulation.
  • Massive cost reduction: Achieves up to 59× lower total offline data‑generation cost compared with traditional supervised or purely self‑supervised baselines.

Methodology

  1. Collect cheap labels – Generate a small dataset of solutions using fast, low‑fidelity solvers or heuristics. These solutions may violate constraints or be sub‑optimal, but they are cheap to obtain.
  2. Supervised pre‑training – Train a neural surrogate (e.g., a feed‑forward or graph network) on the cheap labels to learn a rough mapping from problem parameters to solution space. This step only needs a few epochs because the goal is to land the model near a good region of the loss landscape.
  3. Self‑supervised refinement – Switch to a loss that penalizes constraint violations and encourages improvement of the objective, using the model’s own predictions as pseudo‑labels. No extra ground‑truth data are required; the model iteratively “self‑corrects” by solving a lightweight inner optimization (often a projection or gradient step).
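The "self-correction" idea in step 3 can be illustrated with a tiny sketch. This is not the paper's implementation: the feasible set (x ≥ 0, so projection is just clipping) and the sample predictions are illustrative assumptions.

```python
import numpy as np

# Hypothetical feasible set {x : x >= 0}; its projection is simple clipping.
def project(x):
    return np.maximum(x, 0.0)

x_hat = np.array([-0.3, 0.2, -0.1, 0.8])        # raw surrogate predictions
pseudo = project(x_hat)                          # feasible pseudo-labels
self_sup_loss = np.mean((x_hat - pseudo) ** 2)   # penalizes only the violation
```

Because the loss compares each prediction to its own projection, it is zero for feasible outputs and grows with the distance to the feasible set, so gradient descent on it pushes predictions toward feasibility without any ground-truth labels.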

The pipeline is deliberately simple: (cheap labels) → (short supervised warm‑up) → (longer self‑supervised fine‑tuning). The authors prove that once the model is inside the basin of attraction of the true optimum, the self‑supervised phase will converge to a high‑quality solution.
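The full pipeline can be sketched on a toy problem. Everything below is an illustrative assumption rather than the paper's setup: the problem min_x (x − p)² s.t. x ≥ 0, the "cheap label" heuristic x = p (infeasible for p < 0), the tiny ReLU network, and the penalty weight are all chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy parameterized problem: min_x (x - p)^2  s.t.  x >= 0.
# True optimum is x*(p) = max(p, 0); the cheap heuristic ignores the constraint.
P = rng.uniform(-1.0, 1.0, size=200)   # problem parameters
cheap_labels = P.copy()                # fast heuristic: x = p (infeasible for p < 0)

# Surrogate: tiny one-hidden-layer ReLU net.
H = 8
w1 = rng.normal(0, 1, H); b1 = np.zeros(H)
w2 = rng.normal(0, 0.1, H); b2 = 0.0

def forward(p):
    h = np.maximum(w1[None, :] * p[:, None] + b1, 0.0)  # (N, H) hidden activations
    return h @ w2 + b2, h

def sgd_step(p, grad_out, h, lr):
    global w1, b1, w2, b2
    dh = grad_out[:, None] * w2[None, :] * (h > 0)      # backprop into hidden layer
    w2 -= lr * h.T @ grad_out / len(p)
    b2 -= lr * grad_out.mean()
    w1 -= lr * (dh * p[:, None]).mean(axis=0)
    b1 -= lr * dh.mean(axis=0)

# Stage 1-2: short supervised warm-up on cheap labels (MSE).
for _ in range(300):
    x_hat, h = forward(P)
    sgd_step(P, 2 * (x_hat - cheap_labels), h, lr=0.05)

viol_after_warmup = np.maximum(-forward(P)[0], 0).mean()

# Stage 3: self-supervised refinement -- no labels; minimize the true
# objective plus a quadratic penalty on constraint violation (x >= 0).
lam = 10.0
for _ in range(2000):
    x_hat, h = forward(P)
    grad = 2 * (x_hat - P) - 2 * lam * np.maximum(-x_hat, 0)
    sgd_step(P, grad, h, lr=0.02)

viol_after_refine = np.maximum(-forward(P)[0], 0).mean()
```

After the warm-up the surrogate roughly copies the infeasible heuristic; the label-free refinement stage then drives the mean constraint violation down while keeping predictions close to the true objective's minimizer, which mirrors the warm-up-then-refine structure described above.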

Results & Findings

| Domain | Baseline (pure supervised) | Proposed 3‑stage | Offline cost reduction |
| --- | --- | --- | --- |
| Non‑convex constrained opt. | 0.78 feasibility, 1.2× training time | 0.96 feasibility, 0.6× training time | ~45× |
| Power‑grid operation (OPF) | 2.3% optimality gap | 0.4% gap, 0.8× training time | ~59× |
| Stiff dynamical systems | 1.8× simulation error | 0.9× error, 0.7× training time | ~38× |

Key takeaways:

  • Faster convergence – The self‑supervised phase needs far fewer epochs because the model starts already close to a good solution.
  • Higher feasibility & optimality – Even with noisy cheap labels, the final surrogate respects constraints far better than a purely supervised model trained on the same data.
  • Scalable to large problems – The method works with graph‑based neural nets for power‑grid networks containing thousands of buses, showing industrial‑scale relevance.

Practical Implications

  • Rapid prototyping of optimization‑as‑a‑service: Developers can spin up surrogate models for scheduling, routing, or control problems without paying for expensive high‑fidelity solvers during data collection.
  • Edge deployment: The cheap‑label approach reduces the need for massive offline training pipelines, making it feasible to retrain models on‑device or in CI/CD cycles.
  • Cost‑effective research & testing: Teams can explore many “what‑if” scenarios (e.g., grid topology changes) by generating cheap labels on the fly and still obtain trustworthy surrogates.
  • Hybrid AI‑optimization stacks: The framework can be dropped into existing pipelines that already use ML surrogates, improving them with minimal engineering effort—just add a short self‑supervised fine‑tuning stage.

Limitations & Future Work

  • Quality of cheap labels matters: If the initial heuristics are extremely poor (e.g., far outside the feasible region), the model may fail to enter the correct basin of attraction.
  • Self‑supervised loss design: The paper uses problem‑specific projection or penalty terms; generalizing a plug‑and‑play loss for arbitrary constraints remains an open challenge.
  • Scalability of the refinement step: While cheaper than full supervised training, the self‑supervised phase still requires solving small inner optimization problems, which could become a bottleneck for ultra‑high‑dimensional settings.
  • Theoretical bounds are local: Guarantees hold once the model is near a good solution; extending the analysis to provide global convergence guarantees is future work.

Bottom line: By cleverly leveraging inexpensive, imperfect data, this three‑stage amortized optimization framework offers developers a practical, low‑cost path to high‑quality surrogates for complex, constrained problems—opening the door to faster product cycles and more responsive AI‑driven decision systems.

Authors

  • Khai Nguyen
  • Petros Ellinas
  • Anvita Bhagavathula
  • Priya Donti

Paper Information

  • arXiv ID: 2603.05495v1
  • Categories: cs.LG, math.OC
  • Published: March 5, 2026
  • PDF: Download PDF