The Day We Hardcoded 42 in the Treasure Hunt Engine
Source: Dev.to
The Problem We Were Actually Solving
We built the Veltrix treasure‑hunt engine to power a live‑event platform where thousands of users raced to solve puzzles in real time. The configuration layer was supposed to be the secret weapon that let us grow confidently.
What we didn’t account for was that our first stab at configuration was just a Ruby hash that lived in the codebase, user‑facing values shoved into environment variables, and a single YAML file that grew to the size of Manhattan by launch week.
The day we pushed to production, the biggest problem wasn’t scale. It was that every config change required a restart, because the values were loaded into Ruby constants at boot and a running process never re-read them.
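To make the restart problem concrete, here is a minimal sketch of the difference between a value captured in a constant at load time and a value read on each call. The setting name `PUZZLE_TIMEOUT_SECONDS` and both module names are hypothetical, and mutating `ENV` inside the process only stands in for any config source that changes after boot.

```ruby
# Hypothetical setting name; the real Veltrix keys are not shown in this post.
ENV["PUZZLE_TIMEOUT_SECONDS"] = "30"

module BootTimeConfig
  # Evaluated exactly once, when this module body is loaded.
  PUZZLE_TIMEOUT_SECONDS = Integer(ENV.fetch("PUZZLE_TIMEOUT_SECONDS"))
end

module LiveConfig
  # Evaluated on every call, so it sees later changes to the source.
  def self.puzzle_timeout_seconds
    Integer(ENV.fetch("PUZZLE_TIMEOUT_SECONDS"))
  end
end

# Simulate the config source changing while the process keeps running.
ENV["PUZZLE_TIMEOUT_SECONDS"] = "45"

puts BootTimeConfig::PUZZLE_TIMEOUT_SECONDS # still the boot-time value
puts LiveConfig.puzzle_timeout_seconds      # the updated value
```

The constant keeps its boot-time value until the process is restarted, which is the behavior that forced a restart for every change.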
At 2:17 a.m., the first growth inflection hit:
- 1,024 concurrent users
- 30 seconds of garbage‑collection pauses
- Redis connection pool exhausted because the config parser ballooned to 15 MB
The system didn’t stall under load — it stalled under configuration.
What We Tried First (And Why It Failed)
1. Environment variables & the Twelve‑Factor App checklist
- Eleven separate .env files, Docker‑Compose overrides, and a CI pipeline that injected values at build time.
- The illusion of clean separation lasted exactly one sprint.
By sprint two we had 170 environment variables (half secrets), scattered across three repos because:
- Product wanted feature flags
- Ops wanted tuning parameters
- Marketing wanted A/B splits
We burned 16 engineering hours debugging why a Redis cluster in staging accepted connections but rejected commands — it turned out the staging environment had inherited a production database name because an engineer copy‑pasted a .env.example and forgot to change one letter.
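A boot-time sanity check is one way this class of copy-paste mistake could be caught before it costs hours. The sketch below is illustrative only: the key names (`APP_ENV`, `DATABASE_NAME`, `REDIS_URL`), the database names, and the rule that a database name must end with its environment name are all assumptions, not what Veltrix actually used.

```ruby
# Hypothetical required keys; adjust to whatever the application really needs.
REQUIRED_KEYS = %w[APP_ENV DATABASE_NAME REDIS_URL].freeze

# Returns a list of human-readable problems; an empty list means the config passed.
# Accepts any Hash-like object, including ENV.to_h.
def config_errors(env)
  errors = REQUIRED_KEYS.reject { |key| env.key?(key) }.map { |key| "missing #{key}" }

  app_env = env["APP_ENV"]
  db_name = env["DATABASE_NAME"]
  # Assumed naming convention: the database name ends with "_<environment>".
  if app_env && db_name && !db_name.end_with?("_#{app_env}")
    errors << "DATABASE_NAME #{db_name.inspect} does not match APP_ENV #{app_env.inspect}"
  end

  errors
end

# A staging config that inherited a production database name (made-up values).
staging = {
  "APP_ENV" => "staging",
  "DATABASE_NAME" => "veltrix_production",
  "REDIS_URL" => "redis://localhost:6379/0",
}
config_errors(staging).each { |problem| warn problem }
```

Calling `config_errors(ENV.to_h)` at startup and refusing to boot on a non-empty result turns a silent misconfiguration into an immediate, readable failure.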
2. Consul as a dynamic configuration backend
- Powerful at first, until we realized every config change triggered a rolling restart of the entire fleet because the Ruby process couldn’t reload anything without nuking its constant cache.
- Introduced a new failure domain: if Consul’s leader died, the treasure‑hunt engine paused mid‑puzzle and waited for a re‑election, which often happened during a US‑East outage while US‑West traffic peaked.
3. Monorepo approach with a dedicated config service
- Every team contributed their own YAML files.
- Merge conflicts in config files started breaking production, and a typo in a YAML anchor brought down the entire event for 23 minutes.
“config.yaml:32: found character that cannot start any token” – a Slack message I still have.
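A syntax check in CI would catch this kind of error before merge. The following is a minimal sketch using Ruby's bundled YAML parser (Psych); the helper name `yaml_error` and the sample documents are hypothetical, and this is not the check we actually ran.

```ruby
require "yaml"

# Returns nil when the text loads cleanly, or the parser's message when it does not.
# Psych::Exception is the common parent of syntax errors and unknown-alias errors.
def yaml_error(text, filename = "config.yaml")
  YAML.safe_load(text, aliases: true, filename: filename)
  nil
rescue Psych::Exception => e
  e.message
end

# "@" is a reserved indicator in YAML and cannot start a plain scalar.
broken = "flags:\n  beta: @on\n"
# An alias that references an anchor that was never defined (typo in the name).
bad_alias = "base: &defaults\n  ttl: 30\nprod: *defualts\n"

[broken, bad_alias].each do |doc|
  message = yaml_error(doc)
  warn message if message
end
```

Running a helper like this over every config file in the pipeline, and failing the build on any non-nil result, keeps a malformed file from reaching production.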
The Architecture Decision
We stopped trying to make configuration dynamic and started making it disposable.
- Replaced the Ruby constants with a lightweight Lua sandbox that runs inside Redis.
- Every configuration value became a Redis key with a TTL equal to the cache‑flush interval.
- Each worker process loads its config on every request via a Lua call.
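The "disposable" idea can be sketched without a running Redis. In the example below, `FakeRedis` is an in-memory stand-in for just two commands (SET with an expiry, and GET) so the code runs anywhere; the class names, the `config:` key prefix, and the fall-back-to-default behavior are assumptions for illustration, and the Lua step is left out.

```ruby
# In-memory stand-in for SET ... EX and GET. It exists only so the sketch runs
# without a server; the clock is injected so expiry can be demonstrated.
class FakeRedis
  def initialize(clock)
    @clock = clock
    @data = {}
  end

  def set(key, value, ex:)
    @data[key] = [value, @clock.call + ex]
    "OK"
  end

  def get(key)
    value, expires_at = @data[key]
    return nil if value.nil? || @clock.call >= expires_at
    value
  end
end

class DisposableConfig
  def initialize(store, ttl_seconds:)
    @store = store
    @ttl_seconds = ttl_seconds
  end

  # Publisher side: every value carries a TTL, so stale config expires by itself.
  def publish(name, value)
    @store.set("config:#{name}", value.to_s, ex: @ttl_seconds)
  end

  # Worker side: read on every request; use the default once the key has expired
  # (the default is an assumption of this sketch).
  def fetch(name, default)
    @store.get("config:#{name}") || default
  end
end

now = 0
config = DisposableConfig.new(FakeRedis.new(-> { now }), ttl_seconds: 60)
config.publish("hint_cooldown", 15)
puts config.fetch("hint_cooldown", "10") # fresh value
now = 61
puts config.fetch("hint_cooldown", "10") # key expired, default returned
```

With the redis-rb gem, `set(key, value, ex: ttl)` and `get(key)` have the same call shape, so a real client could be passed in place of the stand-in.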
The key insight wasn’t performance — it was that Redis already provides:
- A network protocol
- Persistence
- A built‑in failure detector
We didn’t need Consul or Kubernetes ConfigMaps; we needed a fast reload and a single source of truth.
Trade‑offs
- Configuration is now a first‑class citizen in the Redis cluster. If Redis goes down, the treasure hunt goes down.
- In practice Redis is far more stable than our previous approach, and we can now push configuration changes without restarting anything.
- We gained versioning: every config value has a versioned key, so we can roll back by deleting the latest version and letting workers reload.
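The roll-back-by-deletion scheme can be illustrated with a plain Hash standing in for the Redis keyspace. The key layout (`config:<name>:v<N>` plus a `config:<name>:latest` pointer) and the class name are hypothetical; the post does not spell out the real naming.

```ruby
class VersionedConfig
  # A Hash stands in for the Redis keyspace so the sketch is self-contained.
  def initialize(store = {})
    @store = store
  end

  # Write a new version and advance the "latest" pointer; returns the version number.
  def publish(name, value)
    version = @store.fetch(pointer_key(name), 0) + 1
    @store[version_key(name, version)] = value
    @store[pointer_key(name)] = version
    version
  end

  # The value workers would see on their next reload, or nil if nothing is published.
  def current(name)
    version = @store[pointer_key(name)]
    version && @store[version_key(name, version)]
  end

  # Roll back by deleting the latest version; returns the value now in effect.
  def rollback(name)
    version = @store.fetch(pointer_key(name), 0)
    return nil if version.zero?

    @store.delete(version_key(name, version))
    if version == 1
      @store.delete(pointer_key(name))
    else
      @store[pointer_key(name)] = version - 1
    end
    current(name)
  end

  private

  def pointer_key(name)
    "config:#{name}:latest"
  end

  def version_key(name, version)
    "config:#{name}:v#{version}"
  end
end

config = VersionedConfig.new
config.publish("max_hints", 3)
config.publish("max_hints", 5)
puts config.rollback("max_hints") # back to the first published value
```

Against a real Redis, the delete and the pointer update would need to happen as one step (for example inside a single Lua script, which Redis executes atomically) so that workers never observe a pointer to a missing version.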
What the Numbers Said After
| Metric | Before | After |
|---|---|---|
| P99 latency (2,000 concurrent users) | 800 ms | 240 ms |
| GC pause duration | 30 s | < 200 ms |
| Redis memory overhead | – | + 18 MB (acceptable) |
| Config reloads (Black Friday) | – | 42 reloads / s (the “42” version became a running joke) |
We instrumented the Lua sandbox with a simple Prometheus metric: veltrix_config_reloads_total.
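For readers unfamiliar with Prometheus counters, here is a minimal hand-rolled sketch of what exposing `veltrix_config_reloads_total` could look like. The `ReloadCounter` class and the help text are hypothetical, and a real service would more likely use a Prometheus client library than render the text format by hand.

```ruby
class ReloadCounter
  METRIC = "veltrix_config_reloads_total"

  def initialize
    @count = 0
    @lock = Mutex.new
  end

  # Call once per successful config reload; the mutex keeps the count thread-safe.
  def increment
    @lock.synchronize { @count += 1 }
  end

  # Render the counter in the Prometheus text exposition format.
  def to_text
    [
      "# HELP #{METRIC} Config reloads performed by workers.",
      "# TYPE #{METRIC} counter",
      "#{METRIC} #{@count}",
    ].join("\n") + "\n"
  end
end

counter = ReloadCounter.new
3.times { counter.increment }
puts counter.to_text
```

Because a counter only ever increases, a reloads-per-second figure like the one in the table comes from taking the rate of this metric over a time window on the Prometheus side.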
What I Would Do Differently
- Treat the configuration layer as an infrastructure primitive, not a code layer.
- Embed it in the platform runtime, version it, and never expose raw key‑value pairs to engineers.
- Start with a Lua sandbox from day one and skip Ruby constants entirely.
- Ban any configuration value that can’t be represented as a Lua table with a TTL, including feature flags.
- Insist that every change be versioned and that roll‑backs be a first‑class operation.
- Encrypt environment variables at rest and audit them weekly, because the real failure domain wasn’t Redis; it was the people who thought environment variables were a form of version control.
- Never again let a product manager name a config version 42 without a formal change record. That number cursed us for months.
