I used Gossip Glomers to learn distributed systems from zero (and got humbled fast)

Published: March 4, 2026 at 12:14 AM EST
Source: Dev.to

I started Fly.io’s Gossip Glomers because I wanted a practical way into distributed systems. Books were useful, but I wasn’t feeling the problems. Gossip Glomers gave me tiny problems that looked simple, then failed in very non‑obvious ways. I’m still early in this journey, but here are the lessons that finally clicked for me.

What I built

I solved the challenges in Go:

  • Echo
  • Broadcast
  • G‑Counter
  • Unique IDs
  • Kafka‑style log
  • Transactional key‑value (eventually consistent sync)

My stack was intentionally boring: Go + Maelstrom’s Go node library + JSON handlers.

Aha #1: “Works locally” means nothing without retries + idempotency

Broadcast was my first real slap in the face. My initial thought was:

“Receive message → forward to neighbors → done.”

Then I realized:

  • messages can be duplicated
  • RPCs can fail
  • peers can miss updates
  • clients can race with propagation

The turning point was adding explicit duplicate detection:

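// Record the message under the lock and note whether it is a repeat.
mu.Lock()
alreadySeen := false
for _, v := range message_list {
    if v == req.Message {
        alreadySeen = true
        break
    }
}
if !alreadySeen {
    message_list = append(message_list, req.Message)
}
mu.Unlock()

// A repeat is acknowledged right away and skips everything after this point.
if alreadySeen {
    return n.Reply(msg, BroadcastResponse{Type: "broadcast_ok"})
}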

Why this mattered

  • The alreadySeen check makes rebroadcast safe.
  • Retries no longer corrupt state.
  • “At‑least‑once delivery” becomes manageable because handlers are idempotent.

Takeaway: Retries are only safe when duplicate handling is correct.
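
To make the retry + idempotency pairing concrete, here is a minimal, standard-library-only sketch. It is not the Maelstrom API: node, lossyLink, and sendWithRetry are hypothetical names invented for illustration, and the dedup set is a map rather than the slice shown above. The link always delivers but drops the first few acks, which forces the sender to resend a message the receiver already has.

```go
package main

import (
	"errors"
	"sync"
)

// node is a hypothetical stand-in for a broadcast node's local state.
type node struct {
	mu   sync.Mutex
	seen map[int]struct{} // set of values already stored
	log  []int            // values in first-arrival order
}

func newNode() *node {
	return &node{seen: make(map[int]struct{})}
}

// handleBroadcast stores v at most once and reports whether it was new.
// Calling it again with the same v leaves the state unchanged (idempotent).
func (n *node) handleBroadcast(v int) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if _, dup := n.seen[v]; dup {
		return false
	}
	n.seen[v] = struct{}{}
	n.log = append(n.log, v)
	return true
}

var errAckLost = errors.New("ack lost")

// lossyLink delivers every message to dst but drops the first dropAcks
// acknowledgements, so the sender cannot tell that delivery succeeded.
type lossyLink struct {
	dst      *node
	dropAcks int
}

func (l *lossyLink) send(v int) error {
	l.dst.handleBroadcast(v)
	if l.dropAcks > 0 {
		l.dropAcks--
		return errAckLost
	}
	return nil
}

// sendWithRetry resends v until it is acknowledged or maxAttempts is
// reached. It returns the number of attempts made.
func sendWithRetry(l *lossyLink, v, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = l.send(v); err == nil {
			return attempt, nil
		}
	}
	return maxAttempts, err
}
```

With dropAcks set to 2, sendWithRetry delivers the same value three times, yet the receiver's log holds it once. Without the seen check, every lost ack would add a duplicate entry.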

Aha #2: CAS loops are the backbone of safe shared updates

In the G‑Counter and Kafka‑style log challenges I used compare‑and‑swap loops:

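ctx := context.Background()
for {
    curr, err := kv.ReadInt(ctx, key)
    if keyMissing(err) {
        curr = 0 // key not created yet: start from zero
    } else if err != nil {
        return err
    }

    next := curr + req.Delta
    // The final argument (true) creates the key if it does not exist.
    err = kv.CompareAndSwap(ctx, key, curr, next, true)
    if err == nil {
        break
    }
    // Retry only when another writer won the race; return anything else
    // instead of spinning forever (assumes the library is imported as maelstrom).
    if maelstrom.ErrorCode(err) != maelstrom.PreconditionFailed {
        return err
    }
}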

Why it works

  1. Read the current value.
  2. Compute the new value.
  3. Write only if nobody changed it since the read.
  4. Retry on contention.

Takeaway: Concurrency bugs are not fixed by optimism; they’re fixed by atomicity + retry.
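
The four steps above can be tried without Maelstrom at all. The sketch below is an assumption-heavy stand-in, not the real KV service: casStore, add, and concurrentAdds are hypothetical names, and the "store" is just a mutex-guarded map that exposes compare-and-swap. The point is only to show that read / compute / CAS / retry loses no updates under contention.

```go
package main

import (
	"errors"
	"sync"
)

var errPrecondition = errors.New("precondition failed")

// casStore is a hypothetical in-memory stand-in for a key-value service
// that offers an atomic compare-and-swap.
type casStore struct {
	mu   sync.Mutex
	data map[string]int
}

func newCASStore() *casStore {
	return &casStore{data: make(map[string]int)}
}

// read returns the value for key and whether the key exists.
func (s *casStore) read(key string) (int, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	return v, ok
}

// compareAndSwap writes to only if the current value equals from.
// A missing key is created when createIfMissing is true.
func (s *casStore) compareAndSwap(key string, from, to int, createIfMissing bool) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.data[key]
	if !ok && !createIfMissing {
		return errPrecondition
	}
	if ok && cur != from {
		return errPrecondition
	}
	s.data[key] = to
	return nil
}

// add applies delta with a read / compute / CAS loop and retries only
// when another writer changed the value in between.
func add(s *casStore, key string, delta int) error {
	for {
		cur, _ := s.read(key) // a missing key reads as 0
		err := s.compareAndSwap(key, cur, cur+delta, true)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errPrecondition) {
			return err
		}
		// Lost the race: loop to re-read and try again.
	}
}

// concurrentAdds runs workers goroutines that each add 1 to key
// perWorker times, then waits for all of them to finish.
func concurrentAdds(s *casStore, key string, workers, perWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				_ = add(s, key, 1) // this store only fails with errPrecondition
			}
		}()
	}
	wg.Wait()
}
```

Running concurrentAdds with 8 workers and 100 increments each leaves the counter at 800. Swap the CAS for a plain read followed by an unconditional write and the total will usually come up short.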

Aha #3: Topology is not an implementation detail

I used neighbor forwarding in broadcast and deliberately skipped sending a message back to the node it came from. Even that small decision saves one redundant message every time a node forwards a value it received from a neighbor.

Trade‑offs

  • More fan‑out → faster propagation, higher network traffic.
  • Less fan‑out → cheaper traffic, higher risk of staleness.

Before this challenge, topology felt theoretical; now it’s a direct lever on latency and cost.
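
A rough way to see the cost side of that lever is to count messages in a toy simulation. The sketch below is not Maelstrom: flood is a hypothetical helper that spreads a single value through a topology map, with every node forwarding the value the first time it sees it and ignoring repeats. The skipSender flag toggles the "don't send it back to the source" rule.

```go
package main

// flood simulates one broadcast spreading from origin through topology.
// Each node forwards the value the first time it sees it and drops
// repeats. It returns how many nodes ended up with the value and how
// many messages were sent in total.
func flood(topology map[string][]string, origin string, skipSender bool) (reached, messages int) {
	type delivery struct{ from, to string }

	seen := map[string]bool{origin: true}
	queue := []delivery{}

	// forward enqueues one message per neighbor of node, optionally
	// leaving out the neighbor the value just arrived from.
	forward := func(node, from string) {
		for _, nb := range topology[node] {
			if skipSender && nb == from {
				continue
			}
			queue = append(queue, delivery{from: node, to: nb})
			messages++
		}
	}

	forward(origin, "") // the origin has no sender to skip
	for len(queue) > 0 {
		d := queue[0]
		queue = queue[1:]
		if seen[d.to] {
			continue // duplicate: already stored, not forwarded again
		}
		seen[d.to] = true
		forward(d.to, d.from)
	}
	return len(seen), messages
}
```

On a four-node line (n0, n1, n2, n3) starting at n0, this counts 6 messages without the rule and 3 with it. On a fully connected four-node mesh it counts 12 versus 9: the mesh reaches everyone in one hop but pays for it in traffic.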

Aha #4: Consistency model changes everything you’re allowed to do

For the transaction challenge I used local writes plus periodic state sync:

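// Each operation is [op, key, value]; JSON numbers decode as float64.
for _, txn := range req.Txn {
    op, key := txn[0].(string), int(txn[1].(float64))
    switch op {
    case "r":
        // Reads are answered from this node's local copy only.
        txn[2] = readLocal(store, key)
    case "w":
        // Writes land locally and reach other nodes via the periodic sync.
        store[key] = int(txn[2].(float64))
    }
}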

And the sync step:

for k, v := range req.State {
    if currVal, exists := store[k]; !exists || v > currVal {
        store[k] = v
    }
}

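This approach yields high availability and eventual convergence: every node answers from local state, and the max-merge drives all replicas toward the same values. It is not strictly serializable, though. The merge keeps the larger value rather than the later write, so a newer write of a smaller value can be overwritten by a stale larger one. The lesson: your merge strategy defines your guarantees. Consistency labels are no longer abstract terms; they have concrete implementation consequences.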
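
To see how much the merge rule decides, here is a small sketch comparing two strategies. mergeMax mirrors the sync loop above. mergeLWW and the versioned type are hypothetical additions, assuming each write carries some logical version number (how that version gets assigned is left out entirely).

```go
package main

// mergeMax folds src into dst, keeping the larger value per key.
// It converges regardless of merge order, but it has no notion of
// which write happened last.
func mergeMax(dst, src map[int]int) {
	for k, v := range src {
		if cur, ok := dst[k]; !ok || v > cur {
			dst[k] = v
		}
	}
}

// versioned is a hypothetical value tagged with a logical version.
// Assumption: a higher version means a later write; ties would need an
// extra tie-breaker (for example a node ID), which is omitted here.
type versioned struct {
	value   int
	version int
}

// mergeLWW folds src into dst, keeping the entry with the higher
// version per key (last writer wins).
func mergeLWW(dst, src map[int]versioned) {
	for k, v := range src {
		if cur, ok := dst[k]; !ok || v.version > cur.version {
			dst[k] = v
		}
	}
}
```

Say node A holds key 1 = 10 and node B later writes key 1 = 3. After mergeMax in either direction both nodes hold 10, so they agree but the later write is gone. With mergeLWW and versions 1 and 2, both nodes end up with 3. Neither is "correct" in general; each gives a different guarantee.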

Things I messed up (so you don’t have to)

  • Underestimated how often duplicate messages appear.
  • Treated network failures as exceptional rather than normal flow.
  • Used a slice for deduplication in broadcast (fine early, but not scalable).
  • Relied on “read‑then‑write” without CAS, creating race conditions.
  • Replied too early in some flows, leading to visibility/staleness issues.

What I’d improve next

  • Replace the linear deduplication scan with a map[int]struct{} in broadcast.
  • Add bounded retry/backoff instead of hot retry loops.
  • Make transaction merge semantics explicit (e.g., version vectors, timestamps, or CRDT‑style merges depending on workload).
  • Capture and compare Maelstrom result artifacts more systematically between iterations.
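
For the bounded retry/backoff item, a minimal sketch using only the standard library might look like this. retryWithBackoff and retryFlaky are hypothetical helpers, not part of Maelstrom's Go library, and the sketch leaves out jitter, which a real implementation would probably want.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// retryWithBackoff calls op until it succeeds, ctx is done, or
// maxAttempts calls have been made. The wait between attempts starts at
// baseDelay and doubles after each failure, capped at maxDelay.
// It returns the number of attempts made and the last error, if any.
func retryWithBackoff(ctx context.Context, maxAttempts int, baseDelay, maxDelay time.Duration, op func() error) (int, error) {
	if maxAttempts < 1 {
		return 0, errors.New("maxAttempts must be at least 1")
	}
	delay := baseDelay
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return attempt, nil
		}
		if attempt == maxAttempts {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return attempt, ctx.Err()
		case <-time.After(delay):
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return maxAttempts, err
}

// retryFlaky is a small usage example: op fails the first `failures`
// times and then succeeds.
func retryFlaky(failures, maxAttempts int) (int, error) {
	calls := 0
	op := func() error {
		calls++
		if calls <= failures {
			return errors.New("temporarily unavailable")
		}
		return nil
	}
	return retryWithBackoff(context.Background(), maxAttempts, time.Millisecond, 8*time.Millisecond, op)
}
```

Compared with a hot for-loop, this gives up after a known number of attempts and spaces retries out, so a struggling peer is not hammered while it recovers.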

Why this challenge was perfect for a beginner like me

Gossip Glomers gave me small, runnable problems where each “tiny” bug taught a core distributed‑systems rule—not by theory first, but by breaking first. That worked really well for me.

If you’ve done Gossip Glomers too: which challenge changed how you think the most — broadcast, counters, or txn?
