I used Gossip Glomers to learn distributed systems from zero (and got humbled fast)
Source: Dev.to

I started Fly.io’s Gossip Glomers because I wanted a practical way into distributed systems. Books were useful, but I wasn’t feeling the problems. Gossip Glomers gave me tiny problems that looked simple, then failed in very non‑obvious ways. I’m still early in this journey, but here are the lessons that finally clicked for me.
What I built
I solved the challenges in Go:
- Echo
- Broadcast
- G‑Counter
- Unique IDs
- Kafka‑style log
- Transactional key‑value (eventually consistent sync)
My stack was intentionally boring: Go + Maelstrom’s Go node library + JSON handlers.
Aha #1: “Works locally” means nothing without retries + idempotency
Broadcast was my first real slap in the face. My initial thought was:
“Receive message → forward to neighbors → done.”
Then I realized:
- messages can be duplicated
- RPCs can fail
- peers can miss updates
- clients can race with propagation
The turning point was adding explicit duplicate detection:
    mu.Lock()
    // Linear scan: has this node already stored the value?
    alreadySeen := false
    for _, v := range message_list {
        if v == req.Message {
            alreadySeen = true
            break
        }
    }
    if !alreadySeen {
        message_list = append(message_list, req.Message)
    }
    mu.Unlock()
    // Duplicate: just acknowledge and return early.
    if alreadySeen {
        return n.Reply(msg, BroadcastResponse{Type: "broadcast_ok"})
    }
Why this mattered
- The alreadySeen check makes rebroadcast safe.
- Retries no longer corrupt state.
- “At‑least‑once delivery” becomes manageable because handlers are idempotent.
Takeaway: Retries are useless unless duplicate handling is correct.
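To make the interaction between retries and duplicate handling concrete, here is a minimal, self-contained sketch. It does not use Maelstrom: seenSet, lossyLink, and sendWithRetry are hypothetical names invented for this illustration, and the "network" is a stub that always delivers the message but drops the first few acknowledgements. It also uses a map-based set instead of the slice scan above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// seenSet records which broadcast values a node has already accepted.
type seenSet struct {
	mu   sync.Mutex
	seen map[int]struct{}
	log  []int // accepted values in first-arrival order
}

func newSeenSet() *seenSet {
	return &seenSet{seen: make(map[int]struct{})}
}

// Accept stores v and reports true only the first time v arrives.
func (s *seenSet) Accept(v int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, dup := s.seen[v]; dup {
		return false
	}
	s.seen[v] = struct{}{}
	s.log = append(s.log, v)
	return true
}

// Values returns a copy of the accepted values.
func (s *seenSet) Values() []int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]int(nil), s.log...)
}

var errAckLost = errors.New("ack lost")

// lossyLink is a stand-in network: the message always reaches the peer,
// but the acknowledgement is dropped for the first dropAcks sends.
type lossyLink struct {
	peer     *seenSet
	dropAcks int
}

func (l *lossyLink) Send(v int) error {
	l.peer.Accept(v)
	if l.dropAcks > 0 {
		l.dropAcks--
		return errAckLost
	}
	return nil
}

// sendWithRetry resends v until it is acknowledged or maxAttempts is reached.
// It returns the number of attempts made.
func sendWithRetry(l *lossyLink, v, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = l.Send(v); err == nil {
			return attempt, nil
		}
	}
	return maxAttempts, err
}

func main() {
	peer := newSeenSet()
	link := &lossyLink{peer: peer, dropAcks: 2}
	attempts, err := sendWithRetry(link, 42, 5)
	// The peer received 42 three times but stored it once.
	fmt.Println(attempts, err, peer.Values())
}
```

The sender cannot tell a lost message from a lost acknowledgement, so it resends in both cases; the receiver's set is what keeps those resends harmless.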
Aha #2: CAS loops are the backbone of safe shared updates
In the G‑Counter and Kafka‑style log challenges I used compare‑and‑swap loops:
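    for {
        // Read the current value; a missing key counts as 0.
        curr, err := kv.ReadInt(context.Background(), key)
        if keyMissing(err) {
            curr = 0
        } else if err != nil {
            return err
        }
        next := curr + req.Delta
        // Write only if the value is still curr; the final true creates the key if it is absent.
        err = kv.CompareAndSwap(context.Background(), key, curr, next, true)
        if err == nil {
            break
        }
        // CAS failed (for example, another node wrote first): loop and retry.
    }
Why it works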
- Read the current value.
- Compute the new value.
- Write only if nobody changed it since the read.
- Retry on contention.
Takeaway: Concurrency bugs are not fixed by optimism; they’re fixed by atomicity + retry.
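The same read, compute, CAS, retry cycle can be tried without Maelstrom. The sketch below is an assumption-laden stand-in: casStore is a hypothetical in-memory store (not Maelstrom's KV service), and addWithCAS and runConcurrentAdds are names made up for this example. It also bounds the number of attempts rather than looping forever.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errPrecondition = errors.New("precondition failed")

// casStore is a hypothetical in-memory store offering read and compare-and-swap.
type casStore struct {
	mu   sync.Mutex
	data map[string]int
}

// Read returns the value for key; missing keys read as 0.
func (s *casStore) Read(key string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key]
}

// CompareAndSwap writes to only if the current value still equals from.
func (s *casStore) CompareAndSwap(key string, from, to int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[key] != from {
		return errPrecondition
	}
	s.data[key] = to
	return nil
}

// addWithCAS adds delta to key, retrying on contention up to maxAttempts times.
// It returns the number of attempts made.
func addWithCAS(s *casStore, key string, delta, maxAttempts int) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		curr := s.Read(key)
		err := s.CompareAndSwap(key, curr, curr+delta)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errPrecondition) {
			return attempt, err // not a lost race: do not retry blindly
		}
	}
	return maxAttempts, fmt.Errorf("gave up after %d attempts", maxAttempts)
}

// runConcurrentAdds starts workers goroutines that each add delta once
// and returns the final counter value.
func runConcurrentAdds(workers, delta int) int {
	store := &casStore{data: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each failed CAS means some other worker succeeded, so a
			// worker can lose at most workers-1 races here.
			if _, err := addWithCAS(store, "counter", delta, workers); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	return store.Read("counter")
}

func main() {
	fmt.Println(runConcurrentAdds(50, 2)) // 50 workers adding 2 each: 100
}
```

Replacing the CAS with a plain read followed by an unconditional write would let two workers read the same value and overwrite each other, which is the lost-update race the loop exists to prevent.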
Aha #3: Topology is not an implementation detail
I used neighbor forwarding in broadcast and deliberately skipped sending back to the source. Even that small decision noticeably reduces message noise.
Trade‑offs
- More fan‑out → faster propagation, higher network traffic.
- Less fan‑out → cheaper traffic, higher risk of staleness.
Before this challenge, topology felt theoretical; now it’s a direct lever on latency and cost.
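A toy simulation can put numbers on that trade-off. The sketch below is a simplified model, not Maelstrom's network: flood, line, and fullMesh are hypothetical helpers, delivery happens in lockstep rounds, nothing is lost, and each node forwards a value only the first time it sees it.

```go
package main

import "fmt"

// delivery records that node "to" first received the value from node "from".
type delivery struct{ from, to int }

// flood spreads one value from origin in synchronous rounds. Every node
// forwards on first receipt to all neighbors, optionally skipping the node
// it got the value from. It returns the total messages sent and the number
// of rounds until no new node learns the value.
func flood(topology map[int][]int, origin int, skipSender bool) (messages, rounds int) {
	seen := map[int]bool{origin: true}
	// The origin got the value from a client, modeled as sender -1.
	frontier := []delivery{{from: -1, to: origin}}
	for len(frontier) > 0 {
		var next []delivery
		for _, d := range frontier {
			for _, nb := range topology[d.to] {
				if skipSender && nb == d.from {
					continue
				}
				messages++
				if !seen[nb] {
					seen[nb] = true
					next = append(next, delivery{from: d.to, to: nb})
				}
			}
		}
		if len(next) > 0 {
			rounds++
		}
		frontier = next
	}
	return messages, rounds
}

// line builds the chain 0 - 1 - ... - (n-1).
func line(n int) map[int][]int {
	t := make(map[int][]int, n)
	for i := 0; i < n-1; i++ {
		t[i] = append(t[i], i+1)
		t[i+1] = append(t[i+1], i)
	}
	return t
}

// fullMesh connects every node to every other node.
func fullMesh(n int) map[int][]int {
	t := make(map[int][]int, n)
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			if i != j {
				t[i] = append(t[i], j)
			}
		}
	}
	return t
}

func main() {
	cases := []struct {
		name string
		topo map[int][]int
	}{
		{"line", line(5)},
		{"full mesh", fullMesh(5)},
	}
	for _, c := range cases {
		m, r := flood(c.topo, 0, true)
		fmt.Printf("%-9s messages=%d rounds=%d\n", c.name, m, r)
	}
}
```

In this model, five nodes in a line need 4 messages and 4 rounds, while a full mesh needs 16 messages and 1 round. Turning off skipSender on the line doubles the traffic to 8 messages without making propagation any faster.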
Aha #4: Consistency model changes everything you’re allowed to do
For the transaction challenge I used local writes plus periodic state sync:
    for _, txn := range req.Txn {
        // Each operation is [op, key, value]; JSON numbers decode as float64.
        op, key := txn[0].(string), int(txn[1].(float64))
        switch op {
        case "r":
            txn[2] = readLocal(store, key)
        case "w":
            store[key] = int(txn[2].(float64))
        }
    }
And the sync step:
    for k, v := range req.State {
        // Keep the larger value per key (max-merge).
        if currVal, exists := store[k]; !exists || v > currVal {
            store[k] = v
        }
    }
This approach yields high availability and eventual convergence, but it is not strictly serializable. The lesson: your merge strategy defines your guarantees. Consistency labels are no longer abstract terms; they have concrete implementation consequences.
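To see what a max-based merge like the sync step does and does not guarantee, here is a small standalone sketch. The helper names mergeMax, clone, and equal are made up for this example, and the replicas are plain maps rather than Maelstrom nodes.

```go
package main

import "fmt"

// mergeMax folds remote into local, keeping the larger value per key.
func mergeMax(local, remote map[int]int) {
	for k, v := range remote {
		if curr, ok := local[k]; !ok || v > curr {
			local[k] = v
		}
	}
}

// clone returns a shallow copy of m.
func clone(m map[int]int) map[int]int {
	out := make(map[int]int, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// equal reports whether a and b hold the same keys and values.
func equal(a, b map[int]int) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

func main() {
	a := map[int]int{1: 10, 2: 5}
	b := map[int]int{2: 7, 3: 1}

	// Merging in either order yields the same state, so replicas converge.
	ab := clone(a)
	mergeMax(ab, b)
	ba := clone(b)
	mergeMax(ba, a)
	fmt.Println("converged:", equal(ab, ba))

	// Merging the same state again changes nothing, so repeated syncs are safe.
	again := clone(ab)
	mergeMax(again, b)
	fmt.Println("idempotent:", equal(again, ab))

	// The cost: a later write of a smaller value is discarded by the merge.
	replica := map[int]int{1: 10}
	mergeMax(replica, map[int]int{1: 3})
	fmt.Println("key 1 after merging a smaller write:", replica[1])
}
```

Order-independence and idempotence are what make the periodic sync converge. The last case shows the price: the merge cannot tell "newer" from "larger", which is one reason a different merge (timestamps, version vectors) may be needed depending on the workload.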
Things I messed up (so you don’t have to)
- Underestimated how often duplicate messages appear.
- Treated network failures as exceptional rather than normal flow.
- Used a slice for deduplication in broadcast (fine early, but not scalable).
- Relied on “read‑then‑write” without CAS, creating race conditions.
- Replied too early in some flows, leading to visibility/staleness issues.
What I’d improve next
- Replace the linear deduplication scan with a map[int]struct{} in broadcast.
- Add bounded retry/backoff instead of hot retry loops.
- Make transaction merge semantics explicit (e.g., version vectors, timestamps, or CRDT‑style merges depending on workload).
- Capture and compare Maelstrom result artifacts more systematically between iterations.
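As one possible shape for the bounded retry/backoff item, here is a hedged sketch. The retry and runDemo functions and their parameters are hypothetical, not part of Maelstrom's Go library. The sleep function is passed in so the example can record the delays instead of actually waiting; real code would pass time.Sleep and would probably add jitter.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry calls op up to maxAttempts times. After each failure it waits,
// doubling the delay from baseDelay up to a cap of maxDelay.
// It returns the number of attempts made.
func retry(maxAttempts int, baseDelay, maxDelay time.Duration,
	sleep func(time.Duration), op func() error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	delay := baseDelay
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return attempt, nil
		}
		if attempt == maxAttempts {
			break // no point in sleeping after the final failure
		}
		sleep(delay)
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// runDemo retries an operation that fails the given number of times before
// succeeding, recording the requested waits instead of sleeping.
func runDemo(failures, maxAttempts int) (int, []time.Duration, error) {
	op := func() error {
		if failures > 0 {
			failures--
			return errors.New("transient failure")
		}
		return nil
	}
	var waits []time.Duration
	record := func(d time.Duration) { waits = append(waits, d) }
	attempts, err := retry(maxAttempts, 10*time.Millisecond, 40*time.Millisecond, record, op)
	return attempts, waits, err
}

func main() {
	attempts, waits, err := runDemo(3, 5)
	fmt.Println(attempts, waits, err)
}
```

With three transient failures and a budget of five attempts, the operation succeeds on the fourth try after waits of 10ms, 20ms, and 40ms. With a budget of two attempts it gives up and returns the wrapped error instead of spinning.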
Why this challenge was perfect for a beginner like me
Gossip Glomers gave me small, runnable problems where each “tiny” bug taught a core distributed‑systems rule—not by theory first, but by breaking first. That worked really well for me.
If you’ve done Gossip Glomers too: which challenge changed how you think the most — broadcast, counters, or txn?