[Paper] Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning
Source: arXiv - 2512.17444v1
Overview
A new study shows how multi‑agent reinforcement learning (MARL) can be turned into a virtual test‑bed for long‑term electricity market design. By letting profit‑seeking generators learn investment and bidding strategies in a simulated wholesale market, the authors demonstrate a way to evaluate how different auction rules, support schemes, and decarbonisation targets could shape the future generation mix and price stability.
Key Contributions
- First MARL framework for long‑term electricity markets that captures investment, dispatch, and policy feedback loops.
- Independent Proximal Policy Optimization (IPPO) adapted to a competitive, decentralized setting, with an exhaustive hyper‑parameter search to ensure realistic market outcomes.
- Stylized Italian system case study exploring a spectrum of competition levels, market designs (e.g., capacity auctions, feed‑in tariffs), and policy scenarios (carbon price trajectories, renewable subsidies).
- Quantitative evidence that market design choices critically affect both decarbonisation speed and price volatility.
- Open‑source implementation (released with the paper) that can be re‑used for other regions or policy experiments.
Methodology
Agents & Environment
- Each generation company (GenCo) is an autonomous RL agent that maximizes its discounted profit over a multi‑year horizon.
- The environment simulates the wholesale market clearing (hourly dispatch), demand growth, fuel price paths, and exogenous policy levers (carbon tax, renewable subsidies).
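The hourly market-clearing step can be pictured as a uniform-price merit-order auction. The following is a minimal sketch that assumes each hour clears at a single node with no transmission limits (in line with the stylized setup) and a hypothetical price cap. The names `Offer` and `clear_hour` and the €3,000/MWh cap are illustrative assumptions and do not come from the paper's code.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    genco: str          # owning agent (GenCo)
    quantity_mw: float  # capacity offered for this hour
    price: float        # offer price in €/MWh

def clear_hour(offers, demand_mw, price_cap=3000.0):
    """Hypothetical uniform-price merit-order clearing for one hour.

    Returns (clearing_price, dispatch), where dispatch maps genco -> MW.
    If supply cannot cover demand, the price is set to price_cap (assumed rule).
    """
    dispatch = {}
    remaining = demand_mw
    clearing_price = 0.0
    # Accept offers from cheapest to most expensive until demand is met.
    for offer in sorted(offers, key=lambda o: o.price):
        if remaining <= 0:
            break
        accepted = min(offer.quantity_mw, remaining)
        dispatch[offer.genco] = dispatch.get(offer.genco, 0.0) + accepted
        remaining -= accepted
        clearing_price = offer.price  # the marginal accepted offer sets the price
    if remaining > 0:
        clearing_price = price_cap    # unserved demand: price rises to the cap
    return clearing_price, dispatch

offers = [Offer("solar", 100.0, 0.0), Offer("gas", 200.0, 60.0), Offer("coal", 150.0, 45.0)]
print(clear_hour(offers, 220.0))
```

Under uniform pricing every dispatched unit is paid the marginal offer's price, so a GenCo's bid can change the revenue of all its dispatched capacity. This is what makes bidding a strategic decision for the learning agents.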
Learning Algorithm
- Agents use Independent Proximal Policy Optimization (IPPO): each agent treats the others as part of the environment and updates its own policy via PPO’s clipped objective.
- To mitigate the non‑stationarity that arises when all agents learn simultaneously, the authors performed a large‑scale hyper‑parameter sweep (learning rates, clipping epsilon, network depth) and selected configurations that reproduced known competitive equilibria (e.g., price‑taking behavior under perfect competition).
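Each agent's policy update in IPPO is ordinary PPO applied to its own data. The sketch below shows only the standard PPO clipped surrogate term, written in NumPy. Value-function loss, entropy bonus, and advantage estimation are omitted. `ppo_clip_loss` and `ippo_losses` are hypothetical names and are not the paper's implementation. `clip_eps=0.2` is a common default, not the value chosen by the authors' sweep.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective for one agent's batch."""
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Keep the pessimistic (smaller) objective per sample, then negate to get a loss.
    return -float(np.mean(np.minimum(unclipped, clipped)))

def ippo_losses(batches, clip_eps=0.2):
    """Independent PPO: each GenCo is scored only on its own trajectories.

    batches maps agent_id -> (new_logp, old_logp, advantages). Other agents
    never enter an agent's loss; from its point of view they are part of
    the environment, which is the source of non-stationarity.
    """
    return {agent: ppo_clip_loss(*batch, clip_eps=clip_eps)
            for agent, batch in batches.items()}
```

Clipping the probability ratio limits how far a single update can move an agent's policy. This is one reason PPO-style updates are a reasonable fit when every other agent is also changing its behavior.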
Market Design Experiments
- Competition regimes: from perfect competition to oligopoly (few large GenCos).
- Policy instruments: carbon price paths, capacity market auctions, feed‑in tariffs, and hybrid schemes.
- Evaluation metrics: CO₂ emissions trajectory, generation mix evolution, average wholesale price, and price volatility (standard deviation of hourly prices).
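The policy instruments listed above enter a GenCo's problem mainly through its costs and revenues. The following is a rough, hypothetical sketch and not the paper's revenue model:

- A carbon price raises marginal cost in proportion to emissions.
- A fixed feed-in tariff replaces the market price for eligible renewable output.
- A capacity auction adds a payment per awarded MW.

For simplicity, yearly revenue is approximated with an average market price instead of hourly prices. All names (`Policy`, `annual_profit`) and units are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Policy:
    carbon_price: float = 0.0               # €/tCO2
    feed_in_tariff: Optional[float] = None  # €/MWh paid to renewables, if the scheme exists
    capacity_payment: float = 0.0           # €/MW-year for capacity that cleared the auction

def annual_profit(energy_mwh, awarded_capacity_mw, fuel_cost, emission_factor,
                  avg_market_price, is_renewable, policy):
    """Hypothetical yearly profit of one plant under a given policy mix.

    fuel_cost is in €/MWh and emission_factor in tCO2/MWh.
    """
    # Carbon pricing adds emission_factor * carbon_price to every MWh produced.
    marginal_cost = fuel_cost + emission_factor * policy.carbon_price
    if is_renewable and policy.feed_in_tariff is not None:
        unit_revenue = policy.feed_in_tariff  # fixed tariff, decoupled from the market
    else:
        unit_revenue = avg_market_price       # energy sold at the (average) wholesale price
    energy_margin = (unit_revenue - marginal_cost) * energy_mwh
    # Capacity payments are earned per awarded MW, independent of output.
    return energy_margin + policy.capacity_payment * awarded_capacity_mw
```

Hybrid schemes can be expressed by setting several fields at once, for example `Policy(carbon_price=80.0, capacity_payment=30.0)`.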
Simulation Horizon
- 30‑year horizon with yearly investment decisions and hourly dispatch in every simulated year, which allows the model to capture long‑term lock‑in effects.
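The sketch below shows how yearly investment nests around hourly dispatch, together with the discounted-profit objective each GenCo maximises. Several details are illustrative assumptions:

- The callbacks `invest` and `hourly_profit` are hypothetical stand-ins.
- The 5% discount rate is an assumed value.
- Year 0 is left undiscounted by convention.

In the paper's framework the investment choice would come from each agent's learned policy.

```python
def discounted_return(yearly_profits, discount_rate=0.05):
    """Net present value of yearly profits; year 0 is undiscounted."""
    return sum(p / (1.0 + discount_rate) ** t for t, p in enumerate(yearly_profits))

def run_horizon(invest, hourly_profit, years=30, hours_per_year=8760, discount_rate=0.05):
    """Roll one GenCo through the horizon: invest once per year, dispatch every hour.

    invest(year, capacity_mw) -> MW of new capacity added at the start of the year
    hourly_profit(year, hour, capacity_mw) -> € earned in that hour
    Both callbacks are hypothetical placeholders for the agent policy and market model.
    """
    capacity_mw = 0.0
    yearly_profits = []
    for year in range(years):
        capacity_mw += invest(year, capacity_mw)  # yearly investment decision
        # Hourly dispatch within the year; capacity built earlier keeps earning.
        profit = sum(hourly_profit(year, h, capacity_mw) for h in range(hours_per_year))
        yearly_profits.append(profit)
    return discounted_return(yearly_profits, discount_rate)
```

For quick experiments, `hours_per_year` can be lowered. The full 8,760-hour inner loop, repeated over 30 years and many agents, is what makes these simulations computationally heavy.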
Results & Findings
| Scenario | CO₂ Reduction (30 yr) | Share of Renewables | Avg. Price (€/MWh) | Price Volatility |
|---|---|---|---|---|
| Baseline (no carbon price) | 15 % | 35 % | 55 | High |
| Carbon tax €80/tCO₂ | 45 % | 65 % | 70 | Moderate |
| Capacity auction + modest carbon tax | 38 % | 60 % | 62 | Low |
| Feed‑in tariff (fixed) | 30 % | 55 % | 58 | High (price spikes) |
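The four indicators in the table can be computed from a scenario's simulated outputs roughly as follows. This sketch makes several assumptions:

- CO₂ reduction is measured relative to the first simulated year.
- The renewable share is taken over final-year generation.
- The average price is an unweighted mean of hourly prices.
- Volatility is the population standard deviation of hourly prices.

The function name and the set of technologies counted as renewable are hypothetical. The table's qualitative volatility labels (High/Moderate/Low) would need thresholds that are not given here.

```python
import statistics

def scenario_metrics(hourly_prices, generation_mwh, first_year_co2, last_year_co2,
                     renewables=("solar", "wind", "hydro")):
    """Hypothetical computation of the table's indicators for one scenario."""
    total_mwh = sum(generation_mwh.values())
    renewable_mwh = sum(v for tech, v in generation_mwh.items() if tech in renewables)
    return {
        # Percentage cut in annual emissions, final year vs first simulated year.
        "co2_reduction_pct": 100.0 * (1.0 - last_year_co2 / first_year_co2),
        # Share of final-year generation from technologies tagged as renewable.
        "renewable_share_pct": 100.0 * renewable_mwh / total_mwh,
        # Unweighted mean of hourly clearing prices (€/MWh).
        "avg_price": statistics.fmean(hourly_prices),
        # Population standard deviation of hourly prices (€/MWh).
        "price_volatility": statistics.pstdev(hourly_prices),
    }
```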
- Market design matters: Capacity auctions (paired with a modest carbon tax) produced fewer price spikes than a fixed feed‑in tariff while delivering comparable or greater decarbonisation.
- Competition level influences outcomes: Oligopolistic markets tended to under‑invest in renewables unless strong policy signals (high carbon price) were present.
- Policy interaction: Combining a moderate carbon price with a well‑designed capacity market yielded the best trade‑off between emissions, renewable uptake, and price stability.
Practical Implications
- Policymakers: The framework offers a sandbox to test “what‑if” combinations of carbon pricing, capacity mechanisms, and subsidy designs before committing to costly real‑world roll‑outs.
- System Operators & Market Designers: Insights on how auction rules (e.g., bid caps, delivery obligations) can dampen price volatility while still encouraging low‑carbon investments.
- Energy Companies: A tool to stress‑test long‑term investment strategies against a range of regulatory futures, helping to de‑risk capital allocation.
- Software Vendors & Platform Builders: The open‑source MARL environment can be integrated into existing market simulation suites, extending them with adaptive, learning‑based agents rather than static cost‑minimisation models.
Limitations & Future Work
- Stylized system: The Italian case abstracts away transmission constraints, ancillary services, and detailed fuel market dynamics, which could affect investment incentives.
- Independent learning assumptions: While IPPO performed well after hyper‑parameter tuning, truly cooperative or adversarial dynamics (e.g., collusion) are not fully captured.
- Computational cost: Multi‑year, hourly simulations with many agents demand significant compute resources, limiting rapid iteration.
- Future directions: Incorporate network constraints, model demand‑side flexibility, explore multi‑agent algorithms that explicitly handle non‑stationarity (e.g., centralized critic approaches), and validate the framework against historical market reforms.
Authors
- Javier Gonzalez‑Ruiz
- Carlos Rodriguez‑Pardo
- Iacopo Savelli
- Alice Di Bella
- Massimo Tavoni
Paper Information
- arXiv ID: 2512.17444v1
- Categories: cs.LG, cs.AI, cs.NE, econ.GN
- Published: December 19, 2025