[Paper] Delay-Aware Large-Small Model Collaboration over LEO Satellite Networks

Published: May 6, 2026 at 03:08 AM EDT

Source: arXiv - 2605.04565v1

Overview

The paper proposes a delay‑aware collaboration framework that lets low‑Earth‑orbit (LEO) satellites share the workload of AI inference. Small, resource‑constrained “remote‑sensing” satellites run lightweight models locally, while larger “computing” satellites host powerful models and handle the heavy lifting. By jointly deciding how much data to offload and how to route it across inter‑satellite links (ISLs), the scheme cuts end‑to‑end service latency by up to 31.85 % compared with the baselines evaluated.

Key Contributions

Large‑small model collaboration architecture tailored for heterogeneous LEO constellations (tiny sensing nodes + powerful compute nodes).
Joint offloading‑routing optimization formulated as a delay‑minimization problem under bandwidth and compute constraints.
Decentralized POMDP transformation that enables each satellite to make decisions based only on locally observable information.
Multi‑agent reinforcement learning (MARL) solution with:
- Offline training of routing policies for all agents.
- Online bisection search that dynamically tunes offloading ratios in real time.
Extensive simulation showing up to 31.85 % latency reduction versus static offloading or naïve routing schemes.

Methodology

System model – The constellation is split into two roles:
- Remote‑sensing satellites (RS‑Sat) collect imagery and run a small neural net for quick preprocessing.
- Computing satellites (C‑Sat) host a large neural net for high‑accuracy inference.
  Data can be sent from an RS‑Sat to any reachable C‑Sat via multi‑hop inter‑satellite links (ISLs).
Problem formulation – The authors define a total service delay consisting of:
- Local processing time on the RS‑Sat.
- Transmission delay over each ISL hop.
- Execution time on the selected C‑Sat.
  The goal is to minimize this delay while respecting each satellite’s CPU and bandwidth limits.
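The three delay components can be combined in a small model. This is a minimal sketch under assumptions that the summary does not confirm: compute cost is linear in the input size, the local and offloaded portions are processed in parallel, and every hop carries the full offloaded portion. The names `Task`, `service_delay`, and all parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Task:
    data_bits: float             # input size in bits
    small_cycles_per_bit: float  # assumed cost of the small model on the RS-Sat
    large_cycles_per_bit: float  # assumed cost of the large model on the C-Sat


def service_delay(
    task: Task,
    ratio: float,
    rs_cpu_hz: float,
    csat_cpu_hz: float,
    hop_rates_bps: Sequence[float],
) -> float:
    """Delay in seconds when a fraction `ratio` of the input is offloaded."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    local_bits = (1.0 - ratio) * task.data_bits
    offload_bits = ratio * task.data_bits

    # Local processing time on the RS-Sat.
    local = local_bits * task.small_cycles_per_bit / rs_cpu_hz
    # Transmission delay summed over each ISL hop on the chosen path.
    transmission = sum(offload_bits / rate for rate in hop_rates_bps)
    # Execution time on the selected C-Sat.
    remote = offload_bits * task.large_cycles_per_bit / csat_cpu_hz

    # Assumption: both branches run in parallel, so the slower one dominates.
    return max(local, transmission + remote)
```

With an 8 Mbit task, a 1 GHz RS‑Sat, a 10 GHz C‑Sat, and two 100 Mbit/s hops, this model gives 0.8 s for pure local processing and 0.48 s for full offloading. An intermediate ratio can beat both, which is what motivates tuning it.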
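Decentralized POMDP – Because each satellite observes only its own queue length, link quality, and CPU load, the global optimization is recast as a decentralized partially observable Markov decision process (Dec‑POMDP), in which every satellite acts as an agent that decides from its local observation alone.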
MARL algorithm –
- Offline phase: A multi‑agent deep Q‑network (MADQN) is trained on a simulated constellation to learn routing policies that map local observations to next‑hop decisions.
- Online phase: When a new task arrives, the system runs a fast bisection search on the offloading ratio (how much of the input is processed locally vs. sent away) while using the pre‑trained routing policy to route the offloaded portion.
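The online phase can be illustrated with a plain bisection over the offloading ratio. The summary does not say which quantity the authors bisect on, so this sketch assumes that the local branch gets faster and the offloaded branch gets slower as the ratio grows, and it searches for the point where the two delays meet. The function name and both callables are hypothetical; in the paper's setting the offloaded‑branch delay would depend on the path chosen by the pre‑trained routing policy.

```python
from typing import Callable


def bisect_offloading_ratio(
    local_delay: Callable[[float], float],
    offload_delay: Callable[[float], float],
    tol: float = 1e-4,
    max_iter: int = 50,
) -> float:
    """Find the ratio in [0, 1] where the two branch delays balance.

    Assumes local_delay is non-increasing and offload_delay is
    non-decreasing in the ratio, so max(local, offload) is minimized
    where the two curves cross.
    """
    lo, hi = 0.0, 1.0
    if local_delay(lo) <= offload_delay(lo):
        return lo  # local branch is never the bottleneck: offload nothing
    if local_delay(hi) >= offload_delay(hi):
        return hi  # local branch is always the bottleneck: offload everything

    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        if local_delay(mid) > offload_delay(mid):
            lo = mid  # local side still slower: offload more
        else:
            hi = mid  # offloaded side slower (or equal): offload less
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0
```

For example, with `local_delay = lambda r: 0.8 * (1 - r)` and `offload_delay = lambda r: 0.48 * r`, the search converges to a ratio near 0.625. Each iteration halves the interval, so about 14 evaluations reach a tolerance of 1e-4.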
Evaluation – The authors compare against:
- Pure local processing (no offloading).
- Full offloading to the nearest C‑Sat.
- Heuristic routing (shortest‑path) with static offloading.
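For reference, the shortest‑path baseline can be sketched with Dijkstra's algorithm over the ISL graph. The summary does not state whether the baseline minimizes hop count or per‑hop delay; this sketch assumes non‑negative per‑hop delays as edge weights. The graph layout and node names are hypothetical.

```python
import heapq
from math import inf
from typing import Dict, List, Tuple

# Adjacency map: node -> {neighbor: per-hop delay in seconds}.
Graph = Dict[str, Dict[str, float]]


def shortest_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Return (total delay, node list) of the minimum-delay path, or (inf, [])."""
    dist: Dict[str, float] = {src: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, src)]

    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, inf):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, inf):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))

    if dst not in dist:
        return inf, []

    # Walk predecessors back from the destination to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[dst], path
```

Because the weights here are static, this baseline cannot react to congestion that builds up on a hop after the path is chosen, which is the gap the learned routing policy is meant to close.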

Results & Findings

Metric	Baseline	Proposed Scheme
Average end‑to‑end delay	1.84 s (full local) / 1.27 s (full offload)	1.25 s (≈31 % improvement over full local)
95‑th percentile latency	2.31 s	1.58 s
Network bandwidth utilization	Peaks at 85 %	Balanced at ~62 % (thanks to smarter routing)
CPU load distribution	Highly skewed (C‑Sat overloaded)	Evenly spread across C‑Sats

Key takeaways:

Dynamic offloading (partial processing on RS‑Sat) reduces the amount of data that must traverse the ISL network, cutting transmission delay.
Learned routing avoids congested hops, leading to more stable latency even under varying link conditions.
The offline‑online split keeps runtime overhead low—only a few milliseconds for the bisection search—making the approach viable for real‑time satellite services.

Practical Implications

Edge‑AI for Earth observation: Operators can run quick anomaly detection on board a small satellite and only forward ambiguous patches to a powerful node, saving bandwidth and delivering faster alerts.
Constellation management: Network operators can embed the MARL policy into satellite firmware, enabling autonomous load‑balancing without ground‑station intervention.
Developer APIs: The framework suggests a clean abstraction—process locally, offload remainder, specify routing policy—that could be exposed as a cloud‑like SDK for satellite‑borne applications.
Cost efficiency: By squeezing more inference out of existing ISL capacity, providers may defer costly upgrades to higher‑throughput laser links.
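
The "process locally, offload remainder, specify routing policy" abstraction mentioned under Developer APIs might look like the sketch below. Everything here is hypothetical and not taken from the paper: the class name, the confidence threshold used to decide which items are ambiguous, and the representation of a routing policy as a function from the current node to the next hop.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class CollaborationJob(Generic[T]):
    # Hypothetical small model: returns a confidence score in [0, 1].
    small_model: Callable[[T], float]
    # Hypothetical routing policy: maps the current node to the next hop.
    routing_policy: Callable[[str], str]
    confidence_threshold: float = 0.8

    def split(self, items: Sequence[T]) -> Tuple[List[T], List[T]]:
        """Keep confident items on board; mark the rest for offloading."""
        keep: List[T] = []
        offload: List[T] = []
        for item in items:
            if self.small_model(item) >= self.confidence_threshold:
                keep.append(item)
            else:
                offload.append(item)
        return keep, offload

    def route(self, source: str, destination: str, max_hops: int = 16) -> List[str]:
        """Follow the routing policy hop by hop until the destination."""
        path = [source]
        while path[-1] != destination:
            if len(path) - 1 >= max_hops:
                raise RuntimeError("routing policy did not reach the destination")
            path.append(self.routing_policy(path[-1]))
        return path
```

A caller would construct the job with an on‑board scoring function and a next‑hop lookup, call `split` on a batch of image patches, and send the second list along the path returned by `route`.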

Limitations & Future Work

Simulation‑only validation: Real‑world LEO dynamics (e.g., rapid topology changes, radiation‑induced errors) were approximated; field trials are needed.
Scalability of training: The offline MARL training assumes a fixed constellation size; extending to mega‑constellations (thousands of nodes) may require hierarchical learning or transfer learning techniques.
Security & privacy: Offloading raw sensor data raises confidentiality concerns; future work could integrate encryption‑aware routing or federated learning.
Energy considerations: The current model focuses on delay; incorporating satellite power budgets could further refine offloading decisions.

Bottom line: This research offers a concrete, AI‑driven recipe for making LEO satellite networks more responsive and resource‑aware, an exciting step toward truly intelligent space‑based edge computing.

Authors

Mingyu Guo
Wen Wu
Ying Wang
Songge Zhang
Liang Li

Paper Information

arXiv ID: 2605.04565v1
Categories: cs.DC
Published: May 6, 2026
