[Paper] Failure-Resilient and Carbon-Efficient Deployment of Microservices over the Cloud-Edge Continuum

Published: January 7, 2026

Source: arXiv - 2601.04123v1

Overview

The paper presents FREEDA, a toolchain that automatically deploys and continuously re‑optimises microservice‑based applications across the Cloud‑Edge continuum. By jointly considering failure‑resilience, performance, and carbon‑efficiency, FREEDA aims to keep services running while cutting the carbon footprint of the underlying infrastructure.

Key Contributions

  • FREEDA toolchain: end‑to‑end solution that monitors runtime conditions, predicts failures, and re‑configures deployments in real time.
  • Carbon‑aware placement algorithm: selects compute “flavours” and edge‑cloud locations based on instantaneous carbon intensity data.
  • Adaptive migration & scaling: automatically migrates services, adjusts replica counts, and re‑balances workloads when resources become scarce or nodes fail.
  • Experimental suite: a collection of simulated and emulated scenarios (resource exhaustion, node crashes, carbon‑intensity spikes) that benchmark the toolchain against realistic edge‑cloud workloads.
  • Empirical validation: demonstrates that FREEDA can maintain service‑level objectives while reducing carbon emissions by up to 30 % compared with static deployments.
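The carbon‑aware placement idea can be sketched as a weighted score over per‑site metrics. A minimal sketch, assuming each candidate site exposes normalised carbon intensity, latency, and failure probability (the metric names and weights below are illustrative, not taken from the paper):

```python
# Hypothetical sketch of carbon-aware site selection. The metric names
# ("carbon", "latency", "failure_prob") and the weights are assumptions;
# FREEDA's actual engine is multi-objective rather than a single score.

def placement_score(site, weights=(0.5, 0.3, 0.2)):
    """Lower is better; all metrics are assumed normalised to [0, 1]."""
    w_c, w_l, w_f = weights
    return (w_c * site["carbon"]
            + w_l * site["latency"]
            + w_f * site["failure_prob"])

def pick_site(sites):
    """Return the candidate site with the lowest weighted score."""
    return min(sites, key=placement_score)
```

Collapsing the objectives into one weighted sum is the simplest possible stand‑in for the paper's Pareto‑based optimiser, but it shows how instantaneous carbon intensity can tip a placement decision toward a greener node.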

Methodology

  1. Modeling the deployment space – Each microservice is described by a set of flavours (CPU, memory, storage) and a list of admissible execution sites (cloud data centre, edge node, or hybrid).
  2. Monitoring layer – Lightweight agents collect metrics on resource utilisation, failure events, and local carbon intensity (e.g., from grid APIs).
  3. Decision engine – A multi‑objective optimisation routine (Pareto‑front based) evaluates trade‑offs between resilience (e.g., redundancy, latency), performance (throughput, response time), and carbon cost.
  4. Reconfiguration actions – Depending on the optimiser’s output, FREEDA triggers one or more of the following:
    • Service migration to a greener or more reliable node,
    • Flavour scaling (up/down) to match current load,
    • Workload re‑balancing across replicas.
  5. Continuous loop – The cycle repeats every few seconds to minutes, allowing the system to react to dynamic conditions without human intervention.
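The Pareto‑front filtering used by the decision engine (step 3) can be illustrated with a minimal dominance check. A sketch under the assumption that all three objectives (downtime, latency, carbon cost) are minimised; the `Candidate` type and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A hypothetical deployment configuration with three minimised objectives."""
    name: str
    downtime: float  # fraction of time unavailable
    latency: float   # average response time, ms
    carbon: float    # carbon cost, gCO2

def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly better on one."""
    no_worse = (a.downtime <= b.downtime
                and a.latency <= b.latency
                and a.carbon <= b.carbon)
    strictly_better = (a.downtime < b.downtime
                       or a.latency < b.latency
                       or a.carbon < b.carbon)
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep only the candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]
```

The optimiser would then choose a reconfiguration action from the surviving front, trading off resilience, performance, and carbon cost according to policy.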

Results & Findings

| Scenario | Resilience (downtime) | Avg. Response Time | Carbon Reduction |
|---|---|---|---|
| Baseline static placement | 12 % downtime | 210 ms | n/a (baseline) |
| FREEDA under resource exhaustion | < 2 % downtime | 180 ms | 22 % |
| FREEDA with carbon spikes (grid intensity ↑ 40 %) | 0 % downtime (service migrated) | 190 ms | 30 % |
| FREEDA in mixed edge‑cloud workload | 1 % downtime | 175 ms | 27 % |

The data show that FREEDA can maintain or improve QoS while cutting carbon emissions by a substantial margin. The toolchain’s autonomous migrations and flavour adjustments keep services alive even when edge nodes fail or become overloaded.

Practical Implications

  • DevOps pipelines can plug FREEDA into CI/CD to automatically generate carbon‑aware deployment manifests (e.g., Helm charts) that evolve after release.
  • Edge‑first applications (IoT analytics, AR/VR, autonomous vehicles) gain a safety net: if an edge gateway goes down, FREEDA seamlessly shifts workloads to a nearby cloud region without breaking SLAs.
  • Sustainability dashboards can expose real‑time carbon savings to stakeholders, helping organisations meet ESG (Environmental, Social, Governance) targets and potentially lower operating costs (many cloud providers price greener regions cheaper).
  • Multi‑tenant platforms can use FREEDA’s optimisation engine to balance tenant isolation requirements with global carbon budgets, enabling “green‑as‑a‑service” offerings.
  • Serverless/Function‑as‑a‑Service runtimes could adopt the same principles to decide where to spin up containers based on current grid emissions, making serverless truly “green”.
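As one illustration of the serverless idea above, a scheduler could pick the lowest‑carbon region that still meets a latency bound. A sketch with hypothetical region names and carbon‑intensity values (in practice these would come from a grid feed, as in FREEDA's monitoring layer):

```python
def greenest_region(intensities, latencies, max_latency_ms):
    """Pick the lowest-carbon region that satisfies the latency bound.

    intensities: region -> grid carbon intensity (gCO2/kWh), assumed to come
    from a public feed; latencies: region -> measured RTT in ms. Both mappings
    are hypothetical inputs for this sketch.
    """
    eligible = [r for r in intensities if latencies[r] <= max_latency_ms]
    if not eligible:
        raise ValueError("no region satisfies the latency bound")
    return min(eligible, key=intensities.get)
```

Filtering by the latency bound first mirrors the paper's framing: sustainability is optimised only within the set of placements that still meet service‑level objectives.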

Limitations & Future Work

  • Carbon data quality: FREEDA relies on timely, accurate carbon‑intensity feeds; noisy or delayed data can lead to sub‑optimal placements.
  • Overhead: Continuous monitoring and optimisation introduce modest CPU and network overhead, which may be non‑trivial for ultra‑lightweight edge devices.
  • Scope of platforms: The prototype targets Kubernetes‑based clusters; extending support to other orchestrators (Docker Swarm, Nomad) is left for later work.
  • Real‑world trials: Experiments were conducted in simulated/emulated environments; large‑scale field deployments are needed to validate robustness under production traffic patterns.
  • Future directions: integrating predictive AI models for failure and carbon trends, supporting multi‑cloud cost‑carbon trade‑offs, and exposing a declarative policy language for developers to express custom resilience or sustainability goals.

Authors

  • Francisco Ponce
  • Simone Gazza
  • Andrea D’Iapico
  • Roberto Amadini
  • Antonio Brogi
  • Stefano Forti
  • Saverio Giallorenzo
  • Pierluigi Plebani
  • Davide Usai
  • Monica Vitali
  • Gianluigi Zavattaro
  • Jacopo Soldani

Paper Information

  • arXiv ID: 2601.04123v1
  • Categories: cs.DC
  • Published: January 7, 2026
  • PDF: Download PDF