[Paper] Bringing computation to the data: A MOEA-driven approach for optimising data processing in the context of the SKA and SRCNet
Source: arXiv - 2601.01980v1
Overview
The paper tackles one of the central data‑processing hurdles facing the Square Kilometre Array (SKA): moving petabytes of raw telescope data across a global network of regional centres is becoming impractical. The authors propose a computation‑to‑data strategy that combines Function‑as‑a‑Service (FaaS) with a Multi‑Objective Evolutionary Algorithm (MOEA) to automatically decide where and how to run data‑intensive tasks, balancing speed, energy use, and data‑transfer costs.
Key Contributions
- Hybrid FaaS + MOEA framework that dynamically generates near‑optimal execution plans for SKA data pipelines.
- Multi‑objective formulation that simultaneously minimizes execution time and energy consumption while respecting data‑location constraints.
- Prototype implementation integrated into the SKA Regional Centres Network (SRCNet) architecture, demonstrating in‑situ function deployment close to data sources.
- Performance evaluation showing up to a 30 % reduction in end‑to‑end processing time and a 20 % lower energy footprint compared with a centralised processing baseline.
- Open‑source reference code and a reproducible experimental workflow for the broader scientific‑computing community.
Methodology
- Problem Modeling – The data‑processing workflow is expressed as a directed acyclic graph (DAG) where nodes are lightweight functions (e.g., calibration, imaging) and edges represent data dependencies.
- FaaS Layer – Each function is packaged as a container‑based FaaS unit that can be instantiated on any SRCNet node (edge, regional centre, or cloud). The FaaS runtime abstracts storage, networking, and scaling details from the optimizer.
- Decision Engine – A Multi‑Objective Evolutionary Algorithm (specifically NSGA‑II) explores the large combinatorial space of possible function placements and scheduling orders; a minimal placement sketch is given after this list.
  - Objectives: (i) total wall‑clock time, (ii) total energy consumption.
  - Constraints: data locality (functions must run where their required input resides), network bandwidth caps, and node‑specific resource limits.
- Fitness Evaluation – For each candidate solution, a fast simulation model estimates execution time and energy based on historical profiling data of each function on each node type.
- Selection & Deployment – The Pareto‑optimal solutions are presented to a lightweight orchestrator that picks the plan best matching the current service‑level agreement (e.g., prioritize latency during observation bursts). The chosen plan is then materialized by spawning the corresponding FaaS instances across the network.
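To make the decision engine concrete, the sketch below casts function placement as a two‑objective NSGA‑II problem using the pymoo library. It is a minimal illustration under stated assumptions, not the authors' implementation: the function and node names, the profiled runtime and power numbers, the flat WAN‑transfer penalty, and the additive wall‑clock model (which ignores parallelism across DAG branches) are all made up for the example.

```python
# Minimal placement sketch: two-objective NSGA-II over "which node runs which
# function". All numbers below are illustrative assumptions, not paper data.
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

FUNCTIONS = ["flagging", "calibration", "imaging", "source_finding"]
NODES = ["edge-0", "regional-1", "cloud-2"]

# Hypothetical profiled runtime (s) and power draw (W) per function per node.
RUNTIME = np.array([[120,  90,  60],
                    [300, 210, 150],
                    [600, 400, 260],
                    [ 80,  70,  50]], dtype=float)
POWER = np.array([[ 60, 120, 200]] * 4, dtype=float)
INPUT_SITE = np.array([0, 0, 1, 1])   # node currently holding each input
TRANSFER_PENALTY_S = 500.0            # crude penalty for non-local execution


class PlacementProblem(ElementwiseProblem):
    """Decision vector: one (rounded) node index per pipeline function."""

    def __init__(self):
        super().__init__(n_var=len(FUNCTIONS), n_obj=2,
                         xl=0, xu=len(NODES) - 1)

    def _evaluate(self, x, out, *args, **kwargs):
        placement = np.round(x).astype(int)          # continuous vars -> node ids
        idx = np.arange(len(FUNCTIONS))
        runtimes = RUNTIME[idx, placement]
        # Penalise running a function away from the node that holds its data.
        runtimes = runtimes + TRANSFER_PENALTY_S * (placement != INPUT_SITE)
        energy = POWER[idx, placement] * runtimes    # joules per function
        # Simplification: total time is the sum of step times (no parallelism).
        out["F"] = [runtimes.sum(), energy.sum()]


res = minimize(PlacementProblem(), NSGA2(pop_size=40), ("n_gen", 100),
               seed=1, verbose=False)

X, F = np.atleast_2d(res.X), np.atleast_2d(res.F)
for x, f in zip(X, F):
    plan = [NODES[i] for i in np.round(x).astype(int)]
    print(plan, f"time={f[0]:.0f}s energy={f[1]:.0f}J")
```

The resulting Pareto front plays the role described in the Selection & Deployment step: an orchestrator would pick one plan from it according to the current service‑level agreement.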
Results & Findings
| Metric | Centralised (baseline) | MOEA‑driven FaaS (best Pareto) |
|---|---|---|
| End‑to‑end processing time | 1.00 × (reference) | 0.70 × (≈30 % faster) |
| Energy consumption | 1.00 × (reference) | 0.80 × (≈20 % lower) |
| Data transferred over WAN | 100 TB | 45 TB (≈55 % reduction) |
| Scheduler overhead | – | < 2 % of total runtime |
Key Takeaways
- Moving computation to the data cuts WAN traffic dramatically, which in turn reduces both latency and the energy spent on data movement (a back‑of‑envelope transfer‑time illustration follows this list).
- The MOEA quickly converges (within a few hundred generations) to solutions that respect all constraints, making it viable for near‑real‑time re‑planning during observation campaigns.
- The modular FaaS approach allows new processing steps to be added without re‑engineering the whole pipeline.
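To put the WAN‑traffic takeaway in perspective, the back‑of‑envelope below converts the 100 TB and 45 TB volumes from the results table into transfer time; the 10 Gbit/s sustained link speed is an assumption chosen for illustration, not a figure from the paper.

```python
# Illustrative only: link speed is an assumed 10 Gbit/s sustained throughput;
# the 100 TB and 45 TB volumes come from the results table above.
BITS_PER_TB = 8e12            # bits in one terabyte (10^12 bytes * 8)
LINK_BPS = 10e9               # assumed sustained WAN throughput

for tb in (100, 45):
    hours = tb * BITS_PER_TB / LINK_BPS / 3600
    print(f"{tb} TB -> {hours:.1f} h on a 10 Gbit/s link")
# 100 TB -> ~22.2 h, 45 TB -> ~10.0 h: the reduction alone saves roughly half
# a day of pure transfer time per processing run.
```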
Practical Implications
- For SKA developers: The framework offers a plug‑and‑play way to offload heavy calibration or imaging steps to the nearest edge node, freeing up central resources for other science cases.
- For cloud/edge providers: Demonstrates a concrete use‑case for FaaS beyond typical web workloads, encouraging investment in low‑latency, high‑throughput edge compute platforms.
- Energy‑aware scheduling: Operators can enforce greener operation policies (e.g., shift workloads to nodes powered by renewable energy) simply by adjusting the MOEA’s objective weights.
- Scalable workflow orchestration: The approach can be generalized to other exascale science projects (e.g., climate modelling, genomics) that face similar data‑movement bottlenecks.
- Developer tooling: The open‑source prototype includes a Python SDK for defining DAGs and custom cost models, lowering the barrier for integrating existing SKA pipelines.
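As an illustration of what such a DAG definition might look like, the self‑contained snippet below declares a toy pipeline with per‑node runtime annotations; the Task and Pipeline classes and the topological‑ordering helper are hypothetical stand‑ins, not the actual SDK's API.

```python
# Hypothetical, self-contained stand-in for an SDK-style DAG definition.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """One pipeline step with made-up per-node runtime annotations."""
    name: str
    inputs: List[str] = field(default_factory=list)            # upstream tasks
    runtime_s: Dict[str, float] = field(default_factory=dict)  # node -> seconds


@dataclass
class Pipeline:
    """A tiny DAG container with dependency-respecting ordering."""
    tasks: Dict[str, Task] = field(default_factory=dict)

    def add(self, task: Task) -> Task:
        self.tasks[task.name] = task
        return task

    def topological_order(self) -> List[Task]:
        ordered, seen = [], set()

        def visit(name: str) -> None:
            if name in seen:
                return
            seen.add(name)
            for dep in self.tasks[name].inputs:
                visit(dep)               # dependencies come before the task
            ordered.append(self.tasks[name])

        for name in self.tasks:
            visit(name)
        return ordered


pipe = Pipeline()
pipe.add(Task("flagging", runtime_s={"edge-0": 120, "regional-1": 90}))
pipe.add(Task("calibration", inputs=["flagging"],
              runtime_s={"edge-0": 300, "regional-1": 210}))
pipe.add(Task("imaging", inputs=["calibration"],
              runtime_s={"edge-0": 600, "regional-1": 400}))
pipe.add(Task("source_finding", inputs=["imaging"],
              runtime_s={"edge-0": 80, "regional-1": 70}))

print([t.name for t in pipe.topological_order()])
# ['flagging', 'calibration', 'imaging', 'source_finding']
```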
Limitations & Future Work
- Simulation fidelity: The current fitness evaluator relies on profiled averages; real‑world variability (e.g., network jitter, node contention) may affect optimality.
- Scalability of the MOEA: While effective for the tested DAG sizes (≈50 functions), larger pipelines may require hierarchical or surrogate‑based optimization to keep runtime low.
- Security & data governance: Deploying functions across heterogeneous sites raises access‑control challenges that are not fully addressed.
- Future directions: The authors plan to (1) integrate online learning to refine cost models on‑the‑fly, (2) explore hybrid meta‑heuristics (e.g., MOEA + reinforcement learning) for faster convergence, and (3) conduct a full‑scale pilot on the operational SRCNet testbed.
Authors
- Manuel Parra‑Royón
- Álvaro Rodríguez‑Gallardo
- Susana Sánchez‑Expósito
- Laura Darriba‑Pol
- Jesús Sánchez‑Castañeda
- M. Ángeles Mendoza
- Julián Garrido
- Javier Moldón
- Lourdes Verdes‑Montenegro
Paper Information
- arXiv ID: 2601.01980v1
- Categories: cs.DC
- Published: January 5, 2026