[Paper] Regret Minimization with Adaptive Opponents in Repeated Games

Published: 6 days ago (June 4, 2026 at 01:59 PM EDT)

2 min read

Source: arXiv

Source: arXiv - 2606.06486v1

Overview

In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players’ counterfactual reasoning, we introduce Repeated Policy Regret (RP‑Regret), a game‑theoretic metric that measures the difference between the realized and the best‑in‑hindsight accumulated utility when all players can respond to the history of play.

Compared to existing regret notions in this setting, ours is native to repeated game playing, enabling stronger comparators and opponents with fewer constraints, while maintaining the possibility of finding better equilibria when all players minimize it.

We first identify necessary conditions for obtaining RP‑Regret sublinear in time, on the variation of the player’s comparator strategies in the regret definition and on the memories of both the comparator and opponents’ strategies. We then study additional conditions and provable algorithms to minimize RP‑Regret, which is by definition non‑convex in the strategy space.

To address this challenge, we propose three algorithms:

Optimization‑oracle based method, as assumed in some prior work in online non‑convex learning.
Convex linearized surrogate minimization, which minimizes a convex and linearized surrogate of RP‑Regret at each iteration.
Direct minimization when opponents change strategies slowly.

Furthermore, when all players can run algorithms to minimize the RP‑Regret (or its linearized variant), certain subgame‑perfect equilibria of the repeated game can be learned. Experiments show that minimizing our regret notions can lead to more cooperative solutions with higher utility in games such as Stag‑Hunt.

Key Contributions

Introduces Repeated Policy Regret (RP‑Regret) as a new metric for adaptive opponents.
Identifies necessary conditions for sublinear RP‑Regret in terms of comparator variation and memory.
Proposes three algorithmic approaches to minimize the inherently non‑convex RP‑Regret.
Demonstrates that joint minimization of RP‑Regret leads to learnable subgame‑perfect equilibria.
Provides empirical evidence of improved cooperation and utility in classic games.

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

This research contributes to the advancement of machine learning, artificial intelligence, and game theory (cs.LG, cs.AI, cs.GT).

Authors

Mingyang Liu
Asuman Ozdaglar
Tiancheng Yu
Kaiqing Zhang

Paper Information

arXiv ID: 2606.06486v1
Categories: cs.LG, cs.AI, cs.GT
Published: June 4, 2026
PDF: Download PDF

[Paper] Regret Minimization with Adaptive Opponents in Repeated Games

Overview

Key Contributions

Methodology

Practical Implications

Authors

Paper Information

Related posts

[Paper] How reliable are LLMs when it comes to playing dice?

[Paper] MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism

[Paper] Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning

[Paper] Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization