[Paper] Resilient Packet Forwarding: A Reinforcement Learning Approach to Routing in Gaussian Interconnected Networks with Clustered Faults

Published: December 23, 2025 at 09:31 AM EST
4 min read

Source: arXiv - 2512.20394v1

Overview

The paper introduces a fault‑aware routing algorithm that uses reinforcement learning (RL) to keep packets flowing in Gaussian Interconnected Networks (GINs)—a class of topologies built from Gaussian integers that promise low diameter and high symmetry. By training a Proximal Policy Optimization (PPO) agent to avoid regions where nodes fail in clustered, Gaussian‑distributed patterns (e.g., thermal hotspots), the authors show a dramatic boost in packet delivery reliability compared with a conventional greedy adaptive router.

Key Contributions

  • RL‑driven routing for GINs: First work that applies a PPO‑based agent to the specific arithmetic‑based topology of Gaussian networks.
  • Fault‑proximity reward design: A custom reward function that penalizes paths that approach faulty nodes, encouraging the agent to learn “safe corridors.”
  • Comprehensive evaluation: Empirical comparison against a greedy adaptive shortest‑path algorithm across a range of fault densities (up to 40 %) and traffic loads (20 %–80 %).
  • High resilience: Achieves a packet delivery ratio (PDR) of 0.95 at 40 % fault density, versus 0.66 for the baseline.
  • Congestion awareness: Demonstrates superior performance under low‑load conditions (PDR 0.57 vs. 0.43), indicating the agent can balance fault avoidance with load distribution.

Methodology

  1. Network model: The authors model a 2‑D Gaussian Interconnected Network where each node’s address is a Gaussian integer (a + bi). Links exist between nodes whose addresses differ by a unit Gaussian integer, yielding a regular, highly symmetric mesh.
  2. Fault injection: Faults are introduced in clusters that follow a Gaussian spatial distribution, mimicking realistic hotspot failures. Fault density is varied from 10 % to 40 % of the nodes/links. (Steps 1 and 2 are sketched in code after this list.)
  3. RL formulation:
    • State: Current node, destination node, and a binary map of known faulty neighbors (learned via periodic heartbeat messages).
    • Action: Choose the next hop from up to eight neighboring nodes.
    • Reward: +1 for successful delivery, –0.5 for stepping into a faulty neighbor, –0.1 for each hop (to encourage short paths), and a large penalty (–5) for packet loss (see the reward sketch below).
  4. Training: A PPO agent is trained offline on simulated traffic patterns. The policy network is a shallow feed‑forward model (2 hidden layers, 128 units each) that can be embedded in NoC routers with modest SRAM (see the policy-network sketch below).
  5. Baseline: A deterministic greedy adaptive routing algorithm that always selects the neighbor that reduces Manhattan distance to the destination, while avoiding known faulty links when possible (also sketched below).
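
To make steps 1 and 2 concrete, here is a minimal Python sketch (not the authors' code) of Gaussian-integer addressing with unit-offset neighbors and clustered fault injection. Node addresses are modeled as complex numbers, the modular wraparound of a full GIN is omitted, and the cluster parameters (`num_clusters`, `sigma`) are illustrative choices rather than values from the paper.

```python
import random

# Unit Gaussian integers: a node a + bi links to addresses that differ from it
# by one of these offsets. (The paper's action space allows up to eight
# neighbors, which would also include diagonal offsets; only the four units are
# used here, and the wraparound of a full GIN is omitted.)
UNIT_OFFSETS = [1 + 0j, -1 + 0j, 1j, -1j]

def neighbors(node, nodes):
    """Return the neighbors of `node` that exist in the node set."""
    return [node + u for u in UNIT_OFFSETS if node + u in nodes]

def inject_clustered_faults(nodes, density, num_clusters=3, sigma=2.0, seed=0):
    """Mark roughly `density` of the nodes as faulty, clustered around a few
    hotspot centers with a Gaussian spatial spread (mimicking thermal hotspots)."""
    rng = random.Random(seed)
    node_list = sorted(nodes, key=lambda z: (z.real, z.imag))
    centers = rng.sample(node_list, num_clusters)
    target = int(density * len(node_list))
    faulty = set()
    for _ in range(100 * target):  # bounded loop is good enough for a sketch
        if len(faulty) >= target:
            break
        center = rng.choice(centers)
        candidate = complex(round(center.real + rng.gauss(0, sigma)),
                            round(center.imag + rng.gauss(0, sigma)))
        if candidate in nodes:
            faulty.add(candidate)
    return faulty
```

A small grid such as `{complex(a, b) for a in range(16) for b in range(16)}` is enough to exercise these helpers.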
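
The reward values listed in step 3 could be wired up as follows. This is one reading of the summary above, not the authors' exact implementation; the `dropped` flag is a placeholder for however the simulator signals packet loss.

```python
def step_reward(next_hop, destination, faulty, dropped):
    """Per-step reward for the routing agent, following the values in step 3."""
    if dropped:                  # packet lost, e.g. no live neighbor remained
        return -5.0
    if next_hop in faulty:       # stepped into a known-faulty neighbor
        return -0.5
    if next_hop == destination:  # successful delivery
        return 1.0
    return -0.1                  # per-hop cost to encourage short paths
```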
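
Step 4 describes a shallow feed-forward policy with two 128-unit hidden layers. A PyTorch sketch of that shape is shown below; the state encoding and the shared value head are assumptions, since the summary does not spell them out.

```python
import torch
import torch.nn as nn

class RoutingPolicy(nn.Module):
    """Shallow actor-critic network matching the 2 x 128-unit description."""
    def __init__(self, state_dim: int, num_actions: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.action_head = nn.Linear(128, num_actions)  # next-hop logits
        self.value_head = nn.Linear(128, 1)             # state value for PPO

    def forward(self, state: torch.Tensor):
        h = self.body(state)
        return self.action_head(h), self.value_head(h)
```

In deployment, logits for missing or known-faulty neighbors would typically be masked before the next hop is sampled.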
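
For comparison, the greedy adaptive baseline in step 5 reduces to a single neighbor selection. The sketch below reuses the `neighbors` helper from the first example and treats an empty candidate list as a dropped packet, which is an assumption about how dead ends are handled.

```python
def manhattan(a, b):
    """Manhattan distance between two Gaussian-integer addresses."""
    return abs(a.real - b.real) + abs(a.imag - b.imag)

def greedy_next_hop(node, destination, nodes, faulty):
    """Pick the live neighbor that most reduces Manhattan distance."""
    candidates = [n for n in neighbors(node, nodes) if n not in faulty]
    if not candidates:
        return None  # dead end: the packet is dropped
    return min(candidates, key=lambda n: manhattan(n, destination))
```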

Results & Findings

RL‑PPO router vs. greedy adaptive baseline:

  • PDR at 40 % fault density: 0.95 vs. 0.66
  • PDR at 20 % traffic load: 0.57 vs. 0.43
  • Average hop count (low load): 1.8 % above optimal for RL‑PPO (due to detours) vs. 2.5 % above optimal for the greedy baseline
  • Training convergence: ~2 M episodes (≈ 30 min on a single GPU) for the PPO agent; not applicable to the deterministic baseline

Key takeaways

  • The RL agent learns to circumvent fault clusters without sacrificing too many extra hops, preserving latency.
  • Under high fault density, the policy remains stable, whereas the greedy method quickly collapses into dead‑ends.
  • Even with light traffic, the RL router distributes packets more evenly, reducing contention hotspots that typically arise in deterministic schemes.

Practical Implications

  • Network‑on‑Chip (NoC) designers can embed a lightweight RL policy in router micro‑code to obtain self‑healing routing without redesigning the physical topology.
  • Wireless sensor networks (WSNs) deployed in harsh environments (industrial plants, disaster zones) can benefit from on‑node learning agents that adapt to sensor failures in real time.
  • The fault‑proximity reward concept is portable: any mesh‑like topology (e.g., torus, hexagonal) can adopt a similar RL formulation to improve resilience.
  • Because the policy network is small, area and power overhead are minimal—critical for silicon‑level implementations where every mm² counts.
  • The approach opens the door for online continual learning, where routers periodically retrain with fresh fault data, enabling truly autonomous fault recovery.

Limitations & Future Work

  • Training is offline: The current study assumes a pre‑trained model; on‑chip online training would require additional compute resources and careful stability guarantees.
  • Scalability to massive NoCs: Experiments were limited to modest‑size networks (≤ 64 × 64 nodes). Scaling the state representation and ensuring fast inference on larger fabrics remains an open challenge.
  • Fault detection latency: The method presumes timely knowledge of faulty neighbors; delayed detection could degrade performance.
  • Security considerations: An adversary could manipulate fault reports to mislead the RL policy—a topic the authors suggest exploring.
  • Future work includes hierarchical RL for multi‑level NoCs, transfer learning across different topologies, and hardware‑accelerated inference to further shrink latency and power footprints.

Authors

  • Mohammad Walid Charrwi
  • Zaid Hussain

Paper Information

  • arXiv ID: 2512.20394v1
  • Categories: cs.DC
  • Published: December 23, 2025