[Paper] Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning

Published: December 19, 2025 at 05:56 AM EST

Source: arXiv - 2512.17444v1

Overview

A new study shows how multi‑agent reinforcement learning (MARL) can serve as a virtual test‑bed for long‑term electricity market design. Profit‑seeking generators learn investment and bidding strategies in a simulated wholesale market, which lets the authors evaluate how different auction rules, support schemes, and decarbonisation targets shape the future generation mix and price stability.

Key Contributions

  • First MARL framework for long‑term electricity markets that captures investment, dispatch, and policy feedback loops.
  • Independent Proximal Policy Optimization (IPPO) adapted to a competitive, decentralized setting, with an extensive hyper‑parameter search used to select configurations that produce realistic market outcomes.
  • Stylized Italian system case study exploring a spectrum of competition levels, market designs (e.g., capacity auctions, feed‑in tariffs), and policy scenarios (carbon price trajectories, renewable subsidies).
  • Quantitative evidence that market design choices critically affect both decarbonisation speed and price volatility.
  • Open‑source implementation (released with the paper) that can be re‑used for other regions or policy experiments.

Methodology

  1. Agents & Environment

    • Each generation company (GenCo) is an autonomous RL agent that maximizes its discounted profit over a multi‑year horizon.
    • The environment simulates the wholesale market clearing (hourly dispatch), demand growth, fuel price paths, and exogenous policy levers (carbon tax, renewable subsidies).
  2. Learning Algorithm

    • Agents use Independent Proximal Policy Optimization (IPPO): each agent treats the others as part of the environment and updates its own policy via PPO’s clipped objective.
    • To counter non‑stationarity (agents learning simultaneously), the authors performed a large‑scale hyper‑parameter sweep (learning rates, clipping epsilon, network depth) and selected configurations that reproduced known competitive equilibria (e.g., price‑taking behavior under perfect competition).
  3. Market Design Experiments

    • Competition regimes: from perfect competition to oligopoly (few large GenCos).
    • Policy instruments: carbon price paths, capacity market auctions, feed‑in tariffs, and hybrid schemes.
    • Evaluation metrics: CO₂ emissions trajectory, generation mix evolution, average wholesale price, and price volatility (standard deviation of hourly prices).
  4. Simulation Horizon

    • 30‑year horizon, with yearly investment decisions and hourly dispatch for each simulated year, so that long‑term lock‑in effects are captured.
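
To make the two core mechanics above more concrete, here is a minimal sketch of one hour of market clearing and of the clipped PPO objective that each agent optimizes independently under IPPO. It is an illustration under stated assumptions, not the authors' implementation: the summary does not specify the clearing rule, so a uniform‑price merit‑order auction is assumed, and the function names (`clear_market`, `ppo_clipped_objective`) and the default `clip_eps=0.2` are hypothetical choices.

```python
def clear_market(bids, demand_mw):
    """Uniform-price merit-order clearing for a single hour (assumed rule).

    bids: list of (price_eur_per_mwh, quantity_mw) tuples, one per unit.
    Returns (clearing_price, dispatch), where dispatch[i] is the MW
    accepted from bids[i]. Demand beyond total offered supply is
    simply left unserved in this sketch.
    """
    # Cheapest offers are accepted first.
    order = sorted(range(len(bids)), key=lambda i: bids[i][0])
    dispatch = [0.0] * len(bids)
    remaining = demand_mw
    price = 0.0
    for i in order:
        if remaining <= 0:
            break
        bid_price, quantity = bids[i]
        accepted = min(quantity, remaining)
        if accepted > 0:
            dispatch[i] = accepted
            remaining -= accepted
            price = bid_price  # the marginal accepted bid sets the price
    return price, dispatch


def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """PPO clipped surrogate: mean of min(r * A, clip(r, 1-eps, 1+eps) * A).

    ratios: new-policy / old-policy probability ratios for one agent.
    advantages: matching advantage estimates for that agent.
    Under IPPO every agent evaluates this on its own experience only.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        r_clipped = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        terms.append(min(r * a, r_clipped * a))
    return sum(terms) / len(terms)


# Illustrative inputs only; none of these numbers come from the paper.
price, dispatch = clear_market([(50.0, 100.0), (10.0, 80.0), (30.0, 60.0)], 120.0)
objective = ppo_clipped_objective([1.5, 0.5], [1.0, -1.0])
```

In the first call, the 10 €/MWh and 30 €/MWh offers cover the 120 MW of demand, so the 30 €/MWh offer is marginal and sets the price. In the second, clipping caps how much a large change in the probability ratio can raise the objective, which is what keeps each agent's policy update conservative.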

Results & Findings

| Scenario | CO₂ Reduction (30 yr) | Share of Renewables | Avg. Price (€/MWh) | Price Volatility |
| --- | --- | --- | --- | --- |
| Baseline (no carbon price) | 15 % | 35 % | 55 | High |
| Carbon tax €80/tCO₂ | 45 % | 65 % | 70 | Moderate |
| Capacity auction + modest carbon tax | 38 % | 60 % | 62 | Low |
| Feed‑in tariff (fixed) | 30 % | 55 % | 58 | High (price spikes) |

  • Market design matters: Capacity auctions reduced price spikes compared to pure feed‑in tariffs, even when overall decarbonisation was similar.
  • Competition level influences outcomes: Oligopolistic markets tended to under‑invest in renewables unless strong policy signals (high carbon price) were present.
  • Policy interaction: Combining a moderate carbon price with a well‑designed capacity market yielded the best trade‑off between emissions, renewable uptake, and price stability.
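
The two price metrics used in the evaluation (average wholesale price, and volatility as the standard deviation of hourly prices) are simple to compute from a simulated price series. The sketch below is a hedged illustration: the function name `price_metrics` is hypothetical, and using the population standard deviation (`pstdev`) rather than the sample version is an assumption, since the summary does not say which one the authors use.

```python
from statistics import mean, pstdev


def price_metrics(hourly_prices):
    """Return average price and volatility for a series of hourly prices.

    Volatility is the population standard deviation of the hourly
    prices (an assumed choice, see the note above).
    """
    return {
        "avg_price": mean(hourly_prices),
        "volatility": pstdev(hourly_prices),
    }


# Illustrative series only; these are not results from the paper.
flat = price_metrics([55.0, 55.0, 55.0, 55.0])
spiky = price_metrics([40.0, 40.0, 40.0, 100.0])
```

Both illustrative series have the same average price, but the second has a much larger volatility, which is why the study reports the two metrics separately.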

Practical Implications

  • Policymakers: The framework offers a sandbox to test “what‑if” combinations of carbon pricing, capacity mechanisms, and subsidy designs before committing to costly real‑world roll‑outs.
  • System Operators & Market Designers: Insights on how auction rules (e.g., bid caps, delivery obligations) can dampen price volatility while still encouraging low‑carbon investments.
  • Energy Companies: A tool to stress‑test long‑term investment strategies against a range of regulatory futures, helping to de‑risk capital allocation.
  • Software Vendors & Platform Builders: The open‑source MARL environment can be integrated into existing market simulation suites, extending them with adaptive, learning‑based agents rather than static cost‑minimisation models.

Limitations & Future Work

  • Stylized system: The Italian case abstracts away transmission constraints, ancillary services, and detailed fuel market dynamics, which could affect investment incentives.
  • Independent learning assumptions: While IPPO performed well after hyper‑parameter tuning, truly cooperative or adversarial dynamics (e.g., collusion) are not fully captured.
  • Computational cost: Multi‑year, hourly simulations with many agents demand significant compute resources, limiting rapid iteration.
  • Future directions: Incorporate network constraints, model demand‑side flexibility, explore multi‑agent algorithms that explicitly handle non‑stationarity (e.g., centralized critic approaches), and validate the framework against historical market reforms.

Authors

  • Javier Gonzalez‑Ruiz
  • Carlos Rodriguez‑Pardo
  • Iacopo Savelli
  • Alice Di Bella
  • Massimo Tavoni

Paper Information

  • arXiv ID: 2512.17444v1
  • Categories: cs.LG, cs.AI, cs.NE, econ.GN
  • Published: December 19, 2025