AWS re:Invent 2025 - Architecting resilient multicloud operations, feat. Monzo Bank (HMC201)

Published: December 6, 2025 at 07:39 AM EST
Source: Dev.to

Overview

AWS Principal Technologists Clark Richey and Bruno Emer, together with Monzo Bank’s Andrew Lawson, discuss strategies for building resilient multi‑cloud operations. They introduce the SEEMS framework for analyzing failure modes and present Monzo’s “Stand‑in” platform, a lightweight, cost‑effective backup system that runs on Google Cloud while the primary platform operates on AWS.

The SEEMS Framework

  • S – Single points of failure: Components whose loss would bring down the entire system.
  • E – Excessive load: Situations where traffic overwhelms a service, causing degradation or outage.
  • E – Excessive latency: High response times that can break user experiences or downstream dependencies.
  • M – Misconfiguration / bugs: Human errors or software defects that introduce instability.
  • S – Shared fate: Resources that are tightly coupled across clouds, creating cascading failures.

The framework helps teams systematically evaluate where resilience can be improved across multiple cloud providers.
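
One way to apply the framework is as a per-component checklist. The sketch below is illustrative only: the `Component` class, the `unmitigated` helper, and the idea of recording mitigations as a set are assumptions made for this example, not something described in the talk.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureMode(Enum):
    """The five SEEMS failure modes, in acronym order."""
    SINGLE_POINT_OF_FAILURE = "Single points of failure"
    EXCESSIVE_LOAD = "Excessive load"
    EXCESSIVE_LATENCY = "Excessive latency"
    MISCONFIGURATION_AND_BUGS = "Misconfiguration / bugs"
    SHARED_FATE = "Shared fate"


@dataclass
class Component:
    # Hypothetical record of one system component under review.
    name: str
    # Failure modes that already have a documented mitigation.
    mitigated: set = field(default_factory=set)


def unmitigated(component):
    """Return the failure modes with no recorded mitigation, in SEEMS order."""
    return [mode for mode in FailureMode if mode not in component.mitigated]
```

Calling `unmitigated(Component("card-auth", {FailureMode.EXCESSIVE_LOAD}))` would list the four remaining modes, which gives a review meeting a concrete agenda for that component.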

Multi‑Cloud Resilience: Myths & Realities

  • Complexity vs. Resilience – Adding cloud providers increases architectural complexity, which can reduce resilience if not managed carefully.
  • When Multi‑Cloud Helps – It is valuable for specific scenarios such as:
    • Disaster recovery (DR) where regulatory or data‑sovereignty rules prevent data from leaving a country.
    • Situations requiring geographic redundancy beyond a single provider’s regions.

Multi‑cloud is not a blanket solution; it must be applied intentionally with clear goals.

Monzo’s “Stand‑in” Platform

  • Purpose – Acts as a lifeboat strategy: a simplified banking system that can take over core transaction processing if the primary AWS environment fails.
  • Architecture – Runs on Google Cloud, mirroring essential services of Monzo’s main platform.
  • Cost – Operates at roughly 1 % of the primary platform’s cost.
  • Production Use – Processes real customer transactions daily for testing and has been used successfully during actual incidents.
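
The lifeboat idea can be pictured as a routing decision: serve everything from the primary while it is healthy, and serve only a reduced set of core operations from the backup when it is not. This is a minimal sketch under stated assumptions, not Monzo's implementation; the operation names in `CORE_OPERATIONS` and the `route` function are hypothetical.

```python
# Hypothetical set of operations a simplified backup platform might support.
CORE_OPERATIONS = frozenset({"card_payment", "balance_check"})


def route(operation, primary_healthy):
    """Decide which platform should serve an operation.

    Returns "primary", "stand-in", or "unavailable".
    """
    if primary_healthy:
        return "primary"
    # The backup is deliberately smaller: it serves core operations only.
    if operation in CORE_OPERATIONS:
        return "stand-in"
    return "unavailable"
```

Keeping the backup's scope this narrow is what would make a small cost footprint plausible: non-core features simply wait until the primary recovers.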

Best Practices for Resilient Multi‑Cloud Operations

Fault Isolation

  • Keep clear boundaries between providers to avoid shared‑fate failures.
  • Use separate VPCs, IAM roles, and networking configurations per cloud.
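
A simple audit can flag dependencies that more than one provider environment relies on, since those are candidates for shared fate. The sketch below assumes a hypothetical inventory that maps each provider name to the identifiers of the resources it depends on; how such an inventory is built is outside its scope.

```python
from collections import defaultdict


def shared_dependencies(deps_by_provider):
    """Return dependencies used by more than one provider.

    deps_by_provider: dict mapping provider name -> iterable of dependency ids.
    Result: dict mapping dependency id -> sorted list of providers using it.
    """
    users = defaultdict(set)
    for provider, deps in deps_by_provider.items():
        for dep in deps:
            users[dep].add(provider)
    return {dep: sorted(provs) for dep, provs in users.items() if len(provs) > 1}


# Example inventory (all names are illustrative).
inventory = {
    "aws": {"aws-vpc-main", "aws-iam-role-app", "shared-dns-zone"},
    "gcp": {"gcp-vpc-backup", "gcp-sa-app", "shared-dns-zone"},
}
violations = shared_dependencies(inventory)
```

Here `violations` contains only `shared-dns-zone`, the one resource both environments rely on.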

Observability

  • Implement unified logging, metrics, and tracing that span all clouds.
  • Ensure alerts surface provider‑specific issues as well as cross‑cloud dependencies.
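
To make the distinction between provider-specific and cross-cloud problems concrete, an alerting layer might classify failures by how many providers are affected. The function below is a hedged sketch; the input shape (provider name mapped to a list of failing check names) and the label strings are assumptions for illustration.

```python
def classify_alert(failing_by_provider):
    """Classify failing health checks by blast radius.

    failing_by_provider: dict mapping provider name -> list of failing checks.
    Returns a (label, affected_providers) tuple.
    """
    affected = sorted(p for p, checks in failing_by_provider.items() if checks)
    if not affected:
        return ("ok", [])
    if len(affected) == 1:
        return ("provider-specific", affected)
    # Failures in several providers at once hint at a shared dependency.
    return ("cross-cloud", affected)
```

A "cross-cloud" result is the more worrying one, because it suggests the fault-isolation boundary between providers has been breached.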

Comprehensive Testing

  • Conduct regular chaos engineering experiments that simulate provider outages, network partitions, and latency spikes.
  • Validate DR runbooks by actually failing over workloads between clouds.
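
A provider-outage experiment can be rehearsed in miniature with fakes before it is tried against real infrastructure. The sketch below is a toy model: `FakeProvider`, `handle_with_failover`, and `run_outage_experiment` are hypothetical names, and a real experiment would inject faults into actual services rather than flip a flag.

```python
class FakeProvider:
    """Stand-in for a cloud environment that can be switched off."""

    def __init__(self, name):
        self.name = name
        self.available = True

    def handle(self, request):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name}:{request}"


def handle_with_failover(request, providers):
    """Try each provider in priority order and return the first success."""
    last_error = None
    for provider in providers:
        try:
            return provider.handle(request)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("all providers unavailable") from last_error


def run_outage_experiment(providers, victim_index=0):
    """Disable one provider, send a probe request, then restore the provider."""
    victim = providers[victim_index]
    victim.available = False
    try:
        return handle_with_failover("ping", providers)
    finally:
        victim.available = True  # always undo the injected fault
```

With providers named `aws` and `gcp` in that order, the experiment should answer from `gcp` while `aws` is disabled, and `aws` should be available again afterwards.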

Critical Dependency Management

  • Avoid single points of failure for services such as DNS, authentication, and configuration stores.
  • Deploy redundant instances of these services in each cloud, or use globally distributed solutions.
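
The first option, a redundant instance in each cloud, lends itself to an automated coverage check. The sketch below assumes a hypothetical deployment map from service name to the set of providers where it runs; it does not model globally distributed services, which would need a separate rule.

```python
def missing_redundancy(deployments, providers):
    """Report critical services that are absent from some provider.

    deployments: dict mapping service name -> set of providers where it runs.
    providers: iterable of all provider names that must be covered.
    Result: dict mapping service name -> sorted list of missing providers.
    """
    required = set(providers)
    gaps = {}
    for service, deployed_in in deployments.items():
        missing = required - set(deployed_in)
        if missing:
            gaps[service] = sorted(missing)
    return gaps


# Example deployment map (service and provider names are illustrative).
critical_services = {
    "dns": {"aws", "gcp"},
    "authentication": {"aws"},
    "config-store": {"aws", "gcp"},
}
gaps = missing_redundancy(critical_services, ["aws", "gcp"])
```

In this example only `authentication` is reported, because it has no instance in `gcp`.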

Conclusion

Multi‑cloud can enhance resilience when applied to well‑defined problems like regulatory‑driven DR or geographic redundancy. The SEEMS framework provides a structured way to identify and mitigate failure modes. Monzo’s “Stand‑in” platform demonstrates that a lightweight, cost‑effective backup environment can operate in production and serve as a reliable safety net. By enforcing fault isolation, maintaining robust observability, rigorously testing, and eliminating single points of failure, organizations can reap the benefits of multi‑cloud without succumbing to its added complexity.