TIL: Byzantine Generals Problem in Real-World Distributed Systems

Published: (January 11, 2026 at 11:02 AM EST)
Source: Dev.to

Preface

Introductions to the Raft algorithm usually set Byzantine failures aside. So it was a surprise to see Cloudflare's incident report from last November put a real-world Byzantine failure in its title. I'll use that report to organize some thoughts.

What is a Byzantine failure

In a distributed system, computers exchange messages to reach consensus: each one reports what it intends to do, or votes for a leader, and the group confirms the result.

If a computer tells one group of members (A) one thing and another group (B) something else, so that the cluster fails to reach consensus or ends up in an unexpected state, that is a Byzantine failure.
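
To make the definition concrete, here is a minimal sketch of a three-node vote in which one node equivocates. The node names (`n1` to `n3`), the vote values, and the helper functions are all made up for illustration; they do not come from any real consensus library.

```python
from collections import Counter

def majority(votes):
    """Return the value held by a strict majority of voters, else None."""
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count > len(votes) / 2 else None

# Two honest nodes that happen to disagree with each other.
honest = {"n1": "attack", "n2": "retreat"}

# n3 is faulty: what it reports depends on who is listening.
n3_says = {"n1": "attack", "n2": "retreat"}

def view_of(receiver):
    """Votes as seen by one honest receiver, including n3's message to it."""
    votes = dict(honest)
    votes["n3"] = n3_says[receiver]
    return votes

decision_n1 = majority(view_of("n1"))
decision_n2 = majority(view_of("n2"))
print(decision_n1, decision_n2)  # attack retreat
```

Each honest node sees a valid-looking majority, yet the two majorities contradict each other. A node that simply crashed would never produce this result, which is what separates a Byzantine failure from an ordinary outage.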

Many consensus algorithms, such as Paxos and Raft, assume in their original form that Byzantine failures do not occur, because handling them raises the complexity of consensus to another level.
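
One way to see the extra cost is through cluster size. The sketch below compares the standard bound for crash-tolerant majority protocols (2f + 1 nodes to survive f failures) with the classic 3f + 1 bound for Byzantine fault tolerant protocols such as PBFT. The function names are hypothetical and exist only for this comparison.

```python
def min_cluster_size_crash(f):
    """Nodes needed to tolerate f crashed nodes with majority quorums."""
    return 2 * f + 1

def min_cluster_size_byzantine(f):
    """Nodes needed to tolerate f arbitrarily misbehaving nodes (3f + 1 bound)."""
    return 3 * f + 1

for f in (1, 2):
    print(f, min_cluster_size_crash(f), min_cluster_size_byzantine(f))
# 1 3 4
# 2 5 7
```

A three-node Raft cluster can therefore survive one crashed node, but it has no guarantee against a single node that sends conflicting information.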

About Cloudflare's recovery mechanism

Before getting into the more complex issues, Cloudflare's incident report offers an interesting angle of its own: how the company thinks about redundancy when maintaining its systems.

Service backup mechanism

  • Each service runs on a set of rack servers.
  • Each machine is connected to two switches.
  • Each rack has two or more power‑supply devices.
  • Each server stores its data on RAID 10 (RAID 1 + RAID 0) for disk redundancy.
  • Each rack contains at least three machines.

The problem that occurred

Image explanation: Top left is Server 1, top right is Server 2, and below is Server 3, which is also the Leader.

  • A network problem between Server 1 and Server 2 caused them to have inconsistent information.
    • Server 1 believed the Leader (Server 3) was offline.
    • Server 2 believed the Leader was running normally.
  • This inconsistency is why Cloudflare labeled the incident a Byzantine failure.
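
The split view described above can be sketched as a small model. Only the two resulting beliefs (Server 1: leader down, Server 2: leader up) come from the text. The `can_hear` map, the server identifiers, and the `leader_status` helper are assumptions made for illustration, not details from the incident report.

```python
LEADER = "server3"

# Hypothetical reachability: which peers each follower currently hears from.
# Only the resulting beliefs are taken from the description above.
can_hear = {
    "server1": set(),        # assumed: hears neither server2 nor the leader
    "server2": {"server3"},  # assumed: still receives the leader's heartbeats
}

def leader_status(follower):
    """A follower considers the leader up only if it can hear from it."""
    return "up" if LEADER in can_hear[follower] else "down"

views = {follower: leader_status(follower) for follower in can_hear}
consistent = len(set(views.values())) == 1
print(views, consistent)
# {'server1': 'down', 'server2': 'up'} False
```

Neither follower is lying here. Each is reporting honestly from its own side of the network fault, and the cluster as a whole still ends up with two incompatible answers to the question of whether the leader is alive.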
