[Paper] Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning
Source: arXiv - 2512.16813v1
Overview
This paper tackles a pressing problem for autonomous robot swarms: reactive jammers that sense the network’s activity and selectively jam communications, disrupting formation coordination and jeopardizing mission goals. By framing the anti‑jamming problem as a multi‑agent reinforcement learning (MARL) task, the authors show how a swarm can learn to pick frequencies and transmit powers in a coordinated way that stays one step ahead of an adaptive jammer.
Key Contributions
- MARL‑based anti‑jamming framework: Introduces a decentralized yet coordinated learning solution using the QMIX algorithm, which learns a joint action‑value function that can be factorized across individual agents (the factorization condition is recalled after this list).
- Realistic jammer model: Models a reactive jammer with Markovian threshold dynamics that senses aggregate power and decides when/where to jam, reflecting practical adversarial behavior.
- Comprehensive benchmarking: Evaluates QMIX against a genie‑aided optimal policy, a local Upper Confidence Bound (UCB) bandit approach, and a stateless reactive policy, covering both no‑reuse and channel‑reuse fading scenarios.
- Performance close to optimal: Demonstrates that QMIX converges quickly to policies whose throughput comes close to the genie‑aided bound while drastically reducing successful jamming events.
- Scalable to larger swarms: Shows that the factorized value function enables decentralized execution, making the approach viable for swarms with many agents and limited on‑board compute.
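For reference, the factorization behind QMIX is the standard monotonic‑mixing condition from the original QMIX work (Rashid et al.); the notation below is generic QMIX shorthand rather than this paper's exact formulation.

```latex
% Monotonic factorization: per-agent greedy actions recover the joint greedy action
\arg\max_{\mathbf{u}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u}) =
\begin{pmatrix}
  \arg\max_{u^{1}} Q_{1}(\tau^{1}, u^{1}) \\
  \vdots \\
  \arg\max_{u^{N}} Q_{N}(\tau^{N}, u^{N})
\end{pmatrix},
\qquad
\frac{\partial Q_{\mathrm{tot}}}{\partial Q_{i}} \ge 0 \quad \forall i \in \{1, \dots, N\}.
```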
Methodology
System model
- A swarm consists of multiple transmitter‑receiver pairs sharing a set of frequency channels.
- Each agent decides (channel, power) jointly at every time step.
- The reactive jammer monitors the total received power; if it exceeds a hidden threshold, it jams the most interfered channel for the next slot (Markovian dynamics).
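As a concrete illustration of these dynamics, here is a minimal Python sketch of a threshold‑triggered reactive jammer. The class name, the single hidden threshold, and the way the "most interfered" channel is chosen are illustrative assumptions, not the paper's exact model or parameters.

```python
from typing import Optional

import numpy as np


class ReactiveJammer:
    """Threshold-triggered reactive jammer with one-slot (Markovian) memory.

    Illustrative sketch: each slot the jammer senses the aggregate transmit
    power; if it exceeds a hidden threshold, it jams the most heavily loaded
    channel in the next slot.
    """

    def __init__(self, num_channels: int, threshold: float):
        self.num_channels = num_channels
        self.threshold = threshold          # hidden from the swarm
        self.jammed_channel: Optional[int] = None

    def step(self, per_channel_power: np.ndarray) -> Optional[int]:
        """Observe this slot's per-channel powers; return the channel jammed next slot."""
        assert per_channel_power.shape == (self.num_channels,)
        if per_channel_power.sum() > self.threshold:
            # Jam the channel carrying the most power (the "most interfered" one).
            self.jammed_channel = int(np.argmax(per_channel_power))
        else:
            self.jammed_channel = None      # stay silent below the sensing threshold
        return self.jammed_channel
```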
Learning formulation
- The problem is cast as a cooperative Dec‑POMDP: agents share a common reward (e.g., successful packet delivery, low interference).
- QMIX learns a centralized action‑value function Q_tot that is monotonic in each agent’s local Q‑value, allowing the global optimum to be recovered by each agent acting greedily on its own Q‑function.
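The monotonicity constraint is typically enforced with state‑conditioned hypernetworks whose outputs pass through an absolute value, as in the original QMIX design. The PyTorch sketch below is a generic QMIX mixer with illustrative layer sizes, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QMixer(nn.Module):
    """Monotonic mixing network: Q_tot is non-decreasing in each agent's Q_i.

    Hypernetworks map the global state to mixing weights; taking their absolute
    value keeps the weights non-negative, so per-agent greedy actions remain
    consistent with maximizing Q_tot.
    """

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)   # (batch, 1, embed)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2                          # (batch, 1, 1)
        return q_tot.squeeze(-1).squeeze(-1)                        # (batch,)
```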
Training pipeline
- Simulated episodes generate state‑action‑reward tuples.
- Experience replay buffers store transitions for off‑policy updates.
- The network architecture uses a recurrent encoder for each agent (to handle partial observability) and a mixing network that enforces the monotonicity constraint.
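A minimal sketch of two of the ingredients named above, a recurrent per‑agent encoder and an experience replay buffer, assuming PyTorch; the class names, layer sizes, and episode storage format are illustrative, not the paper's implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn


class RecurrentAgentQNet(nn.Module):
    """Per-agent Q-network with a GRU cell that summarizes the observation
    history (handles partial observability); outputs one Q-value per
    (channel, power) action."""

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs: torch.Tensor, h: torch.Tensor):
        x = torch.relu(self.encoder(obs))
        h_next = self.rnn(x, h)
        return self.q_head(h_next), h_next   # local Q-values and updated hidden state


class EpisodeReplayBuffer:
    """Stores whole episodes of (obs, state, actions, reward) transitions
    for off-policy QMIX updates."""

    def __init__(self, capacity: int = 5000):
        self.buffer = deque(maxlen=capacity)

    def push(self, episode):                 # episode: list of per-step transition dicts
        self.buffer.append(episode)

    def sample(self, batch_size: int):
        return random.sample(list(self.buffer), batch_size)
```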
Baselines
- Genie‑aided optimal: exhaustive search over all joint actions (only feasible for small networks).
- Local UCB: each agent treats each (channel, power) pair as a bandit arm and selects via an Upper Confidence Bound rule (a minimal sketch follows this list).
- Stateless reactive: a heuristic that switches channels when jamming is detected, without learning.
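For the local UCB baseline, each agent can be sketched as an independent UCB1 learner over its (channel, power) arms. The exploration constant and incremental-mean update below are standard UCB1 choices assumed for illustration, not values taken from the paper.

```python
import math

import numpy as np


class LocalUCBAgent:
    """Per-agent UCB1 bandit over (channel, power) arms.

    Each agent ignores the other agents and the jammer state, treating every
    (channel, power) pair as an independent arm with an unknown mean reward.
    """

    def __init__(self, num_channels: int, num_power_levels: int, c: float = 2.0):
        self.n_arms = num_channels * num_power_levels
        self.c = c                                  # exploration constant (illustrative)
        self.counts = np.zeros(self.n_arms)
        self.mean_reward = np.zeros(self.n_arms)
        self.t = 0

    def select_arm(self) -> int:
        self.t += 1
        untried = np.where(self.counts == 0)[0]
        if untried.size:                            # play each arm once before using UCB
            return int(untried[0])
        ucb = self.mean_reward + np.sqrt(self.c * math.log(self.t) / self.counts)
        return int(np.argmax(ucb))

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.counts[arm]
```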
Results & Findings
| Metric | QMIX | Genie‑aided optimal | Local UCB | Stateless reactive |
|---|---|---|---|---|
| Throughput (fraction of genie‑aided optimal) | 0.92 | 1.00 | 0.68 | 0.55 |
| Jamming success rate | 8 % | 0 % | 31 % | 44 % |
| Convergence time | ≈ 2 k episodes | N/A (offline) | > 10 k episodes | N/A (rule‑based) |
- Rapid convergence: QMIX reaches > 90 % of optimal throughput within a few thousand training episodes, far faster than the UCB baseline.
- Robustness to fading & channel reuse: Even when multiple agents share the same channel under realistic fading, QMIX maintains a clear advantage, adapting power levels to mitigate interference.
- Scalability: Experiments with up to 12 agents show only modest degradation, confirming the factorized value function’s ability to handle larger swarms without exponential blow‑up.
Practical Implications
- Secure swarm deployments: Developers building UAV, ground‑robot, or IoT swarms can embed a lightweight QMIX‑derived policy to autonomously avoid jamming without needing a central controller.
- Dynamic spectrum access: The joint channel‑power selection can be repurposed for civilian spectrum‑sharing scenarios (e.g., industrial IoT in congested ISM bands) where interference is unpredictable.
- Edge‑friendly inference: Once trained, each agent only runs its small local Q‑network to evaluate its Q‑values (the mixing network is needed only during training), fitting within typical embedded compute budgets (e.g., ARM Cortex‑M or low‑power GPUs); see the sketch after this list.
- Rapid adaptation: Because the policy is learned offline but executed online, swarms can be pre‑trained against a family of jammer behaviors and then fine‑tuned on‑site with minimal data, enabling continuous resilience.
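A sketch of the decentralized execution path referenced above: at inference time an agent only forwards its own observation through its local Q‑network and acts greedily. The function signature, hidden‑state handling, and the optional jammed‑action mask are illustrative assumptions, not the paper's API.

```python
import torch


@torch.no_grad()
def act_greedy(agent_qnet: torch.nn.Module, obs: torch.Tensor,
               hidden: torch.Tensor, jammed_mask: torch.Tensor):
    """Decentralized greedy execution for one agent.

    Assumes `agent_qnet(obs, hidden)` returns (local Q-values, next hidden state),
    as in the recurrent agent sketch above. `jammed_mask` is a boolean tensor
    marking (channel, power) actions currently believed unusable -- an
    illustrative convenience, not part of the paper's interface.
    """
    q_values, hidden = agent_qnet(obs, hidden)           # local Q-values only
    q_values = q_values.masked_fill(jammed_mask, -1e9)   # optional action masking
    action = int(torch.argmax(q_values, dim=-1))         # greedy local action
    return action, hidden
```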
Limitations & Future Work
- Training overhead: The current approach relies on extensive simulated episodes; transferring to real hardware may require domain‑randomization or sim‑to‑real techniques.
- Assumed shared reward: The cooperative reward structure presumes all agents have aligned objectives; future work could explore mixed‑cooperation/competition settings (e.g., heterogeneous missions).
- Static jammer model: The jammer follows a Markovian threshold rule; more sophisticated adversaries (e.g., learning jammers) remain an open challenge.
- Scalability beyond dozens of agents: While factorization helps, extremely large swarms may need hierarchical MARL or communication‑efficient approximations.
Overall, the paper demonstrates that modern MARL—specifically QMIX—can give autonomous swarms a practical, data‑driven shield against adaptive jamming, opening the door to more robust field deployments.
Authors
- Bahman Abolhassani
- Tugba Erpek
- Kemal Davaslioglu
- Yalin E. Sagduyu
- Sastry Kompella
Paper Information
- arXiv ID: 2512.16813v1
- Categories: cs.NI, cs.AI, cs.DC, cs.LG, eess.SP
- Published: December 18, 2025