[Paper] Bringing Computation to the data: Interoperable serverless function execution for astrophysical data analysis in the SRCNet

Published: January 12, 2026
3 min read
Source: arXiv (2601.07308v1)

Overview

The paper investigates how serverless Function‑as‑a‑Service (FaaS) can be woven into the Square Kilometre Array Regional Centre Network (SRCNet) to let astronomers run code where the data lives. By deploying tiny, on‑demand functions directly on the storage sites that host petabytes of radio‑astronomy data, the authors show a path to keep up with the SKA’s projected 700 PB / year data deluge.

Key Contributions

  • Design of an interoperable FaaS layer for the federated SRCNet infrastructure.
  • Prototype micro‑functions (e.g., a Gaussian convolution routine) built from existing scientific libraries and wrapped as serverless units.
  • Integration workflow that registers, discovers, and triggers functions on the same node that holds the required data replica.
  • Performance evaluation demonstrating reduced data movement, lower latency, and elastic scaling across multiple regional centres.
  • Guidelines and best‑practice recommendations for extending the approach to other astrophysical pipelines.
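The Gaussian convolution micro-function mentioned above can be imagined as a small, stateless handler built on standard scientific libraries. The sketch below is illustrative only — the `handle` entry point and `event` field names are assumptions, not the paper's actual API — and it uses a pure-NumPy separable convolution rather than whichever library routine the authors wrapped:

```python
import numpy as np

def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """1-D Gaussian kernel, normalised to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def handle(event: dict) -> dict:
    """FaaS-style entry point: convolve a 2-D image with a Gaussian.

    `event` carries the image as a nested list plus the kernel width.
    Field names here are illustrative, not the paper's actual schema.
    """
    image = np.asarray(event["image"], dtype=float)
    sigma = float(event.get("sigma", 1.0))
    radius = int(3 * sigma + 0.5)
    k = gaussian_kernel(sigma, radius)
    # A 2-D Gaussian is separable: filter along rows, then columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)
    return {"shape": list(out.shape), "result": out.tolist()}
```

Because the function is stateless and takes all inputs from the request payload, the same unit can be deployed unchanged on any SRCNet site that holds the data.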

Methodology

  1. Requirement analysis – The authors mapped typical radio‑astronomy processing steps (calibration, imaging, source finding) to compute‑intensive kernels that could be expressed as independent functions.
  2. Serverless platform selection – They leveraged an open‑source FaaS runtime (OpenFaaS) that can be deployed on Kubernetes clusters already present at each SRCNet site.
  3. Function development – Two categories were created:
    • Micro‑functions that call low‑level libraries (e.g., NumPy, SciPy) for simple transforms.
    • Wrapper functions that encapsulate legacy domain tools (e.g., CASA, WSClean) behind a thin API.
  4. Data‑proximate execution model – A lightweight registry service tracks where each data chunk resides; when a user requests a computation, the scheduler selects the nearest centre and launches the function there.
  5. Benchmarking – The Gaussian convolution use‑case was run on three SRCNet nodes with varying data locality, measuring execution time, network traffic, and resource usage.
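The data-proximate execution model of step 4 can be sketched in a few lines: a registry maps each data identifier to the sites holding a replica, and the scheduler picks a replica-holding site, breaking ties by load. All names here (`ReplicaRegistry`, `schedule`, the site labels) are hypothetical, not the paper's actual service interfaces:

```python
from dataclasses import dataclass, field

@dataclass
class ReplicaRegistry:
    # data_id -> set of site names holding a replica of that chunk
    replicas: dict = field(default_factory=dict)

    def register(self, data_id: str, site: str) -> None:
        self.replicas.setdefault(data_id, set()).add(site)

    def sites_for(self, data_id: str) -> set:
        return self.replicas.get(data_id, set())

def schedule(registry: ReplicaRegistry, data_id: str, load: dict) -> str:
    """Pick the least-loaded site that already holds the data."""
    candidates = registry.sites_for(data_id)
    if not candidates:
        raise LookupError(f"no replica registered for {data_id!r}")
    return min(candidates, key=lambda s: load.get(s, 0))
```

For example, if `"obs-42"` is replicated at two sites, `schedule(reg, "obs-42", {"SRC-ES": 3, "SRC-UK": 1})` would dispatch the function to the less busy `"SRC-UK"`, so the data never crosses the WAN.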

Results & Findings

| Metric | Centralized (data moved) | Serverless, data-proximate |
|---|---|---|
| End-to-end latency | ~12 s | ~4.5 s (≈62 % reduction) |
| Network I/O per job | 1.2 GB | 0.3 GB (≈75 % saved) |
| Peak CPU usage | 8 vCPU (single node) | 2 vCPU per node, auto-scaled across 3 nodes |
| Cost (cloud-equivalent) | $0.18 per job | $0.07 per job |

The experiments confirm that running functions where the data lives slashes both transfer overhead and wall‑clock time, while the serverless model automatically provisions just enough compute resources for each request. The prototype also proved interoperable across the heterogeneous hardware and software stacks of the SRCNet sites.

Practical Implications

  • For developers: A concrete recipe for turning existing scientific scripts into portable serverless functions, enabling rapid prototyping without re‑architecting whole pipelines.
  • For observatories and data centres: Embedding FaaS into a federated network can defer costly data replication, lower WAN bandwidth demands, and improve the user experience of interactive analysis tools.
  • Cost optimisation: Pay‑as‑you‑go resource allocation virtually eliminates idle compute capacity, a compelling model for budget‑constrained research infrastructures.
  • Extensibility: The wrapper approach lets legacy, heavyweight tools be exposed as lightweight services, facilitating gradual migration to modern cloud‑native workflows.
  • Cross‑domain relevance: Any discipline facing “bring‑computation‑to‑the‑data” challenges (e.g., genomics, climate modelling) can adopt the same pattern, leveraging the open‑source stack demonstrated here.

Limitations & Future Work

  • Cold‑start latency: Serverless functions still incur a few hundred milliseconds of startup time, which can be noticeable for ultra‑low‑latency use cases.
  • Resource heterogeneity: Not all SRCNet sites have identical GPU or FPGA capabilities; the current prototype assumes homogeneous CPU environments.
  • Security & sandboxing: Running user‑provided code near sensitive data raises isolation concerns that need tighter policy enforcement.
  • Workflow orchestration: The study focused on a single function; scaling to complex, multi‑step pipelines will require robust orchestration (e.g., DAG engines) integrated with the FaaS layer.

Future work includes extending the function catalogue to cover calibration and imaging stages, adding support for GPU‑accelerated kernels, and formalising a security model for multi‑tenant execution across the SRCNet federation.
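The multi-step orchestration identified as future work amounts to executing functions in dependency order. A minimal sketch of that idea, using Python's standard-library topological sorter with invented stage names (the paper does not specify an orchestration engine):

```python
from graphlib import TopologicalSorter

def run_pipeline(stages: dict, funcs: dict, data):
    """Run a DAG of functions in dependency order.

    `stages` maps each stage name to the set of stages it depends on;
    `funcs` maps each stage name to a callable applied to the data.
    """
    order = TopologicalSorter(stages).static_order()
    for stage in order:
        data = funcs[stage](data)
    return data

# Hypothetical three-stage radio-astronomy pipeline.
stages = {"calibrate": set(), "image": {"calibrate"}, "find_sources": {"image"}}
funcs = {
    "calibrate": lambda x: x + ["calibrated"],
    "image": lambda x: x + ["imaged"],
    "find_sources": lambda x: x + ["sources"],
}
```

A production orchestrator would additionally handle per-stage scheduling (each stage ideally running where its inputs reside), retries, and fan-out, but the control flow reduces to this topological walk.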

Authors

  • Manuel Parra‑Royón
  • Julián Garrido‑Sánchez
  • Susana Sánchez‑Expósito
  • María Ángeles Mendoza
  • Rob Barnsley
  • Anthony Moraghan
  • Jesús Sánchez
  • Laura Darriba
  • Carlos Ruíz‑Monje
  • Edgar Joao
  • Javier Moldón
  • Jesús Salgado
  • Lourdes Verdes‑Montenegro

Paper Information

  • arXiv ID: 2601.07308v1
  • Categories: cs.DC, astro-ph.IM
  • Published: January 12, 2026