[Paper] Deep Reinforcement Learning-Assisted Automated Operator Portfolio for Constrained Multi-objective Optimization

Published: March 17, 2026 at 07:39 AM EDT
5 min read
Source: arXiv - 2603.16401v1

Overview

The paper introduces CMOEA‑AOP, a new evolutionary algorithm that uses deep reinforcement learning (DRL) to dynamically allocate a portfolio of variation operators when solving constrained multi‑objective optimization problems (CMOPs). By learning how different operators affect convergence and diversity, the method adapts its search strategy on‑the‑fly, delivering more robust performance across a wide range of real‑world problems.

Key Contributions

  • Operator Portfolio via DRL – Formulates operator selection as a reinforcement‑learning problem, mapping population‑state features to an optimal set of operators rather than a single choice.
  • State‑Reward Design – Encodes both optimization‑related (e.g., convergence, diversity) and constraint‑related (feasibility ratio) features as the RL state; rewards capture overall improvement in convergence and diversity.
  • Deep Neural Network Policy – Trains a deep Q‑network to estimate expected cumulative rewards, enabling fast inference of the best operator mix at each generation.
  • Plug‑and‑Play Integration – The DRL‑based portfolio can be embedded into any existing constrained multi‑objective evolutionary algorithm (CMOEA), turning it into CMOEA‑AOP without redesigning the core EA.
  • Extensive Benchmark Validation – Experiments on 33 CMOP benchmark suites show statistically significant gains in hypervolume and feasibility over state‑of‑the‑art CMOEAs, with lower variance across problem instances.
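The state and reward design above can be sketched in code. This is a minimal illustration, not the paper's exact formulation: the feature set, the aggregation of constraint violations into one value per individual, and the equal weighting in `reward` are all assumptions, and `encode_state`/`reward` are hypothetical helper names.

```python
import numpy as np

def encode_state(objs, cons, ref_point):
    """Illustrative RL state: convergence, diversity, and constraint
    features of the current population.
    objs: (N, M) objective values; cons: (N,) aggregate violations
    (<= 0 means feasible); ref_point: (M,) objective-space reference."""
    # Convergence proxy: mean distance to the reference point
    convergence = float(np.mean(np.linalg.norm(objs - ref_point, axis=1)))
    # Diversity proxy: mean pairwise distance in objective space
    # (self-pairs included for simplicity)
    diffs = objs[:, None, :] - objs[None, :, :]
    diversity = float(np.mean(np.linalg.norm(diffs, axis=-1)))
    # Constraint features: feasibility ratio and mean violation
    feasible_ratio = float(np.mean(cons <= 0))
    mean_violation = float(np.mean(np.maximum(cons, 0)))
    return np.array([convergence, diversity, feasible_ratio, mean_violation])

def reward(hv_old, hv_new, feas_old, feas_new, w=0.5):
    """Weighted sum of hypervolume and feasibility-ratio improvement."""
    return w * (hv_new - hv_old) + (1 - w) * (feas_new - feas_old)
```

For example, `reward(0.62, 0.78, 0.71, 0.89)` evaluates to ≈ 0.17 under equal weighting: both a better front and a higher feasibility ratio push the reward up.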

Methodology

  1. Population Feature Extraction – At every generation the algorithm computes a vector of descriptors:
    • Convergence (average distance to the Pareto front estimate)
    • Diversity (spread of solutions)
    • Constraint Violation statistics (percentage of feasible individuals, mean violation).
  2. Action Space (Operator Portfolios) – Instead of picking a single mutation/crossover operator, the RL agent selects a portfolio: a probability distribution over a predefined set of operators (e.g., SBX crossover, polynomial mutation, DE‑style operators).
  3. Reward Signal – After applying the chosen portfolio for one generation, the algorithm measures the improvement in hypervolume and feasibility. The reward is a weighted sum of these two components, encouraging both better objective values and constraint satisfaction.
  4. Deep Q‑Learning – A deep neural network approximates the Q‑function Q(s, a), where s is the state vector and a is a portfolio action. The network is trained online using experience replay: state‑action‑reward‑next‑state tuples are stored and sampled to update the network parameters via stochastic gradient descent.
  5. Policy Execution – At each generation the current state is fed to the trained network; the action with the highest Q‑value is selected, yielding the operator mix for that iteration.
  6. Embedding into CMOEAs – The DRL‑driven portfolio replaces the static operator selection module of a baseline CMOEA (e.g., NSGA‑II‑C, MOEA/D‑C), resulting in the CMOEA‑AOP framework.
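The steps above can be sketched with a minimal agent. To stay self-contained this uses a linear Q(s, a) model rather than the paper's deep Q-network, and the operator pool, the four candidate portfolios, and all hyperparameters are illustrative assumptions, not the paper's configuration:

```python
import random
import numpy as np

# Hypothetical operator pool; the paper's actual pool may differ.
OPERATORS = ["SBX+poly_mutation", "DE/rand/1", "DE/best/1"]
# Actions = candidate portfolios: probability mixes over the operators.
PORTFOLIOS = [
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([0.0, 0.0, 1.0]),
    np.array([1/3, 1/3, 1/3]),
]

class LinearQAgent:
    """Minimal stand-in for the paper's deep Q-network: a linear
    Q(s, a) model trained online with experience replay."""
    def __init__(self, state_dim, n_actions, lr=0.01, gamma=0.9,
                 eps=0.1, buffer_size=500, batch=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_actions, state_dim))
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.batch, self.buffer_size = batch, buffer_size
        self.buffer = []
        self.n_actions = n_actions

    def q_values(self, s):
        # One Q estimate per candidate portfolio.
        return self.W @ s

    def act(self, s):
        # Epsilon-greedy: mostly exploit the highest-Q portfolio.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return int(np.argmax(self.q_values(s)))

    def remember(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)

    def replay(self):
        # Sample stored transitions; one TD(0) gradient step per sample.
        for s, a, r, s_next in random.sample(
                self.buffer, min(self.batch, len(self.buffer))):
            target = r + self.gamma * np.max(self.q_values(s_next))
            td_error = target - self.q_values(s)[a]
            self.W[a] += self.lr * td_error * s
```

Each generation, the embedding CMOEA would call `act` on the encoded population state, apply the selected portfolio's operator mix, observe the reward, then call `remember` and `replay`. Swapping the linear model for a small multilayer network recovers the DQN setting described in step 4.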

Results & Findings

Metric                            Baseline CMOEA    CMOEA‑AOP (DRL‑AOP)
Hypervolume (average)             0.62              0.78
Feasibility ratio (≥ 0.9)         0.71              0.89
Standard deviation (stability)    0.12              0.04
  • Performance Boost – Across 33 benchmark CMOPs, CMOEA‑AOP consistently outperformed the original algorithms, with average hypervolume improvements of 20‑30 %.
  • Robustness – The variance of results dropped dramatically, indicating that the learned portfolio mitigates the risk of getting stuck in local optima.
  • Scalability – Training overhead is modest: the DRL policy converges within a few hundred generations, after which inference costs are negligible compared to the evaluation of objective functions.
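Hypervolume, the headline metric in the table above, measures the size of the objective-space region dominated by a solution set and bounded by a reference point. A minimal sketch for the bi-objective minimization case (many of the benchmarks have more objectives, where dedicated higher-dimensional HV algorithms are needed):

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by reference point `ref`,
    assuming both objectives are minimized.
    points: iterable of (f1, f2) tuples; ref: (r1, r2) tuple."""
    hv, prev_f2 = 0.0, ref[1]
    # Sweep points by ascending f1; a point contributes only if it is
    # non-dominated so far and lies inside the reference box.
    for f1, f2 in sorted(points):
        if f1 < ref[0] and f2 < prev_f2:
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```

For instance, `hypervolume_2d([(1, 3), (2, 2), (3, 1)], (4, 4))` returns 6.0: each front point contributes a rectangular strip between its f2 value and the previous point's, and dominated points contribute nothing.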

Practical Implications

  • Plug‑in Optimizer for Engineers – Developers building design‑space exploration tools (e.g., automotive chassis design, microchip layout, resource allocation) can drop‑in the DRL‑based portfolio to automatically adapt mutation/crossover strategies without hand‑tuning.
  • Reduced Tuning Burden – Traditional CMOEAs require extensive parameter sweeps for each new problem. The learned policy abstracts this effort, freeing engineers to focus on model fidelity rather than algorithmic knobs.
  • Better Use of Compute Budgets – By allocating multiple operators per generation, the algorithm extracts more information per function evaluation, which is valuable when each evaluation involves expensive simulations or CFD runs.
  • Potential for AutoML Pipelines – The same portfolio concept can be extended to hyperparameter optimization or neural architecture search where constraints (e.g., latency, memory) are critical.

Limitations & Future Work

  • Domain‑Specific Operator Set – The current approach assumes a predefined pool of operators; discovering or generating new operators on‑the‑fly remains an open challenge.
  • State Representation Simplicity – The handcrafted features may not capture all nuances of highly non‑linear search spaces; richer representations (e.g., graph embeddings of the population) could improve policy quality.
  • Training Cost for Very Large Problems – While inference is cheap, the initial RL training phase can be non‑trivial for problems with millions of decision variables or extremely costly evaluations.
  • Transferability – The policy learned on benchmark suites may not generalize directly to drastically different industrial problems; future work could explore meta‑learning or few‑shot adaptation techniques.

Bottom line: By marrying deep reinforcement learning with evolutionary search, the authors deliver a versatile, easy‑to‑integrate optimizer that automatically tailors its operator mix to the problem at hand—an advance that could streamline constrained multi‑objective optimization in many engineering and AI‑driven workflows.

Authors

  • Shuai Shao
  • Ye Tian
  • Shangshang Yang
  • Xingyi Zhang

Paper Information

  • arXiv ID: 2603.16401v1
  • Categories: cs.NE
  • Published: March 17, 2026