[Paper] Offline Multi-Task Multi-Objective Data-Driven Evolutionary Algorithm with Language Surrogate Model and Implicit Q-Learning
Source: arXiv - 2512.15149v1
Overview
The paper introduces Q‑MetaSur, a plug‑and‑play surrogate‑modeling framework that recasts surrogate modeling for multi‑task multi‑objective optimization (MTMOO) as a language‑modeling problem. By pairing a large language model (LLM) with implicit Q‑learning, the authors report more accurate objective predictions and faster convergence in expensive, offline optimization scenarios, where every true evaluation would otherwise require a costly simulation.
Key Contributions
- Unified surrogate via language modeling – Reformulates MTMOO as a sequence‑to‑sequence (seq2seq) task, enabling a single LLM to predict multiple objectives across many tasks.
- Two‑stage offline training – Combines supervised fine‑tuning on a static dataset with reinforcement‑learning (RL) fine‑tuning (implicit Q‑learning) to boost generalization on unseen decision variables.
- Plug‑and‑play integration – Q‑MetaSur can be dropped into existing evolutionary algorithms (EAs) without redesigning the optimizer.
- Empirical superiority – Demonstrates higher surrogate accuracy and better Pareto front quality than classic Kriging, Random Forest, and neural‑network surrogates on the CEC‑2019 MTMOO benchmark.
- Scalable to many sub‑objectives – Handles high‑dimensional objective vectors that traditionally strain surrogate models.
Methodology
- Tokenization of MTMOO instances – Each optimization problem (tasks, decision variables, and known objective values) is serialized into a textual sequence, much as code or natural language is tokenized for LLMs (a toy serialization‑and‑decoding sketch follows this list).
- Seq2seq surrogate model – A pre‑trained Transformer language model acts as an encoder‑decoder:
- Encoder consumes the tokenized description of a task and a candidate decision vector.
- Decoder autoregressively generates the predicted objective values token by token.
- Two‑stage offline training
- Supervised tuning: The model learns to map input sequences to ground‑truth objective tokens using the offline dataset collected from expensive simulations.
- RL fine‑tuning (implicit Q‑learning): Treats the surrogate as a policy rewarded for low prediction error; the Q‑function is learned implicitly, encouraging predictions that improve downstream EA performance (see the expectile‑loss sketch after this list).
- Integration with EA – The trained surrogate replaces the expensive objective evaluator inside any standard EA (e.g., NSGA‑II, MOEA/D). The EA queries the surrogate for fitness, while occasional true evaluations keep the search grounded (a minimal integration sketch appears below).
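To make the seq2seq formulation concrete, here is a minimal sketch of both halves: serializing a (task, decision vector) pair into a prompt and decoding objective tokens with an off‑the‑shelf encoder‑decoder. The tag names (`<task>`, `<x>`, `<y>`), the number formatting, and the `t5-small` backbone are illustrative assumptions, not the paper's actual vocabulary or model; in practice a checkpoint fine‑tuned on the offline dataset would be loaded.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def serialize(task_id, x, precision=4):
    # Flatten one (task, decision vector) pair into a text prompt.
    xs = " ".join(f"{v:.{precision}f}" for v in x)
    return f"<task> {task_id} <x> {xs} <y>"

prompt = serialize(2, [0.13, 0.87, 0.42])
# -> "<task> 2 <x> 0.1300 0.8700 0.4200 <y>"

# Placeholder backbone; a surrogate fine-tuned on the offline data goes here.
tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=16)  # autoregressive objective tokens
print(tok.decode(out[0], skip_special_tokens=True))
```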
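The RL stage builds on implicit Q‑learning (IQL), whose central ingredient is expectile regression: the value function is fit to an upper expectile of the Q‑values, so the policy improves without ever querying Q on out‑of‑distribution actions, exactly what an offline setting demands. A minimal PyTorch sketch of that loss follows; the paper's reward shaping from prediction error and its network details are its own.

```python
import torch

def expectile_loss(q, v, tau=0.7):
    """Asymmetric L2 loss at the core of implicit Q-learning: with tau > 0.5,
    v is pulled toward an upper expectile of q, approximating a max over
    in-distribution actions without evaluating unseen ones."""
    diff = q - v
    weight = torch.where(diff > 0,
                         torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff.pow(2)).mean()

# Toy check: v underestimates q on average, so the loss pushes v upward.
q = torch.tensor([1.0, 0.5, -0.2])
v = torch.tensor([0.4, 0.4, 0.4])
print(expectile_loss(q, v))
```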
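Since the surrogate only needs to answer fitness queries, wiring it into an EA loop is mechanical. Below is a minimal sketch under assumed interfaces: `ToySurrogate.predict` and `true_eval` are hypothetical stand‑ins for the trained model and the expensive simulator, not the paper's API.

```python
import numpy as np

class ToySurrogate:
    """Hypothetical stand-in for the trained Q-MetaSur model."""
    def predict(self, x):
        # Fake two-objective prediction; the real model decodes these values.
        return np.array([np.sum(x ** 2), np.sum((x - 1.0) ** 2)])

def evaluate_population(population, surrogate, true_eval, gen, true_eval_every=10):
    """Score the population with the surrogate; every few generations,
    re-evaluate the surrogate-best individual with the true (expensive)
    objective so the search stays grounded."""
    fitness = np.array([surrogate.predict(x) for x in population])
    if gen % true_eval_every == 0:
        best = int(np.argmin(fitness.sum(axis=1)))  # crude scalarization, illustration only
        fitness[best] = true_eval(population[best])
    return fitness

pop = np.random.rand(8, 3)
fit = evaluate_population(pop, ToySurrogate(), true_eval=lambda x: np.zeros(2), gen=0)
```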
Results & Findings
| Metric | Classic Surrogates (Kriging, RF) | Neural‑Net Baseline | Q‑MetaSur |
|---|---|---|---|
| Mean Absolute Error (MAE) on objectives | 0.042 | 0.037 | 0.021 |
| Hypervolume improvement (EA + surrogate) | +12 % | +15 % | +28 % |
| Convergence speed (generations to 90 % HV) | 150 | 130 | 85 |
- Accuracy boost: Q‑MetaSur roughly halves prediction error relative to the classic surrogates (0.021 vs. 0.042 MAE) and clearly undercuts the neural‑network baseline (0.037).
- Pareto quality: Evolutionary runs guided by Q‑MetaSur achieve markedly larger hypervolumes, indicating a better‑converged and more diverse approximation of the Pareto front (a toy hypervolume computation follows these notes).
- Faster convergence: Because the surrogate is more reliable, the EA needs fewer generations to approach the true Pareto front.
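For readers less familiar with the hypervolume (HV) indicator: it measures the objective‑space volume dominated by a solution set up to a reference point, so it rewards convergence and diversity at once. A quick toy computation with pymoo (assuming minimization; the front and reference point below are made up):

```python
import numpy as np
from pymoo.indicators.hv import HV

front = np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]])  # toy 2-objective front
hv = HV(ref_point=np.array([1.0, 1.0]))
print(hv(front))  # 0.37: area dominated by the front, below the reference point
```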
The authors also performed ablation studies showing that both the seq2seq formulation and the RL fine‑tuning contribute meaningfully to the gains.
Practical Implications
- Reduced simulation budget – Companies that rely on costly CFD, FEM, or hardware‑in‑the‑loop tests can replace many evaluations with a language‑model surrogate, cutting time and cloud‑compute costs.
- Rapid prototyping for multi‑disciplinary design – Automotive, aerospace, and semiconductor teams often juggle dozens of objectives (weight, cost, performance, reliability). Q‑MetaSur’s unified model handles them without building separate surrogates per objective.
- Plug‑in for existing pipelines – Since the surrogate follows the standard EA API, teams can adopt it with minimal code changes, preserving their CI/CD and automated optimization workflows.
- Potential for “code‑as‑surrogate” – The seq2seq approach opens the door to training on raw source‑code or configuration files, enabling surrogate predictions directly from design specifications.
Limitations & Future Work
- Dependence on large offline datasets – Training the LLM surrogate still requires a substantial set of high‑fidelity evaluations; sparse data regimes may degrade performance.
- Compute overhead of the surrogate – Inference with a transformer is heavier than a Kriging model, which could be a bottleneck for real‑time or embedded applications.
- Generalization to out‑of‑distribution tasks – The paper notes that when the test tasks differ dramatically from the training distribution, prediction quality drops, suggesting a need for continual‑learning mechanisms.
- Future directions proposed include:
- Integrating active learning to query true evaluations selectively.
- Exploring lightweight transformer variants for edge deployment.
- Extending the framework to dynamic (online) optimization where the objective landscape evolves over time.
Authors
- Xian‑Rong Zhang
- Yue‑Jiao Gong
- Zeyuan Ma
- Jun Zhang
Paper Information
- arXiv ID: 2512.15149v1
- Categories: cs.NE, cs.AI
- Published: December 17, 2025
- PDF: https://arxiv.org/pdf/2512.15149v1