What I Learned Building a Job Scheduler That Doesn’t Trust Redis
Source: Dev.to

Intro
What is it?
Tickr is a background job scheduler built using Go, Redis, and MySQL.
It consists of a pool of workers waiting to be assigned jobs. Jobs move through waiting and ready queues implemented using Redis, and are assigned to workers by a scheduler. Everything runs concurrently and is event‑driven.
Why did I build it?
I was getting tired of building CRUD backend APIs in Node.js and Go. I wanted to work on something different, something unfamiliar, that would force me to think about concurrency, failure handling, and system behavior instead of just endpoints and schemas.
What broke in v1?
Polling
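In v1, the scheduler polled every second to check whether a job was ready to execute. It worked, but it was inefficient: the scheduler woke up and checked even when nothing was due, and a job could sit for up to a second past its scheduled time. I replaced polling with unbuffered Go channels and blocking operations, and channels became one of the most important concepts in the project.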
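The difference between polling and blocking can be shown with a small sketch. A polling loop wakes on a `time.Ticker` whether or not there is work. A goroutine ranging over an unbuffered channel instead parks inside the receive until a sender shows up. The `consume` helper is a hypothetical name used only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// consume blocks on a channel instead of waking up on a timer.
// The goroutine sleeps inside the receive until a sender is ready, so no CPU
// time or queue round trips are spent while nothing is due.
func consume(ready <-chan string, handle func(string)) {
	for id := range ready { // resumes only when a value arrives or the channel is closed
		handle(id)
	}
}

func main() {
	ready := make(chan string) // unbuffered: each send waits for a matching receive
	var got []string
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		consume(ready, func(id string) { got = append(got, id) })
	}()
	for _, id := range []string{"job-1", "job-2"} {
		ready <- id // blocks until consume has taken the value
	}
	close(ready)
	wg.Wait()
	fmt.Println(got) // [job-1 job-2]
}
```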
Assumptions
I assumed many things would just work. When those assumptions broke, debugging and recovery became difficult.
Example: if the server terminated while jobs were still in the waiting queue, recovering those jobs on restart was messy and unreliable.
Redis as the single source of truth
Initially, I relied on Redis as the only source of truth. Edge‑case testing raised uncomfortable questions:
- What happens if Redis crashes while jobs are still queued?
- What if Redis loses its state after a crash?
- What if a job fails midway?
- What if I need logs or history for completed jobs?
Answering these required a redesign, leading to Tickr v2.
v2 Architecture
MySQL as the source of truth
In v2, MySQL stores the full job data. Only the JobID and scheduling information are pushed to Redis, which is used purely for scheduling and coordination, not durability.

This approach:
- Prevents job loss during crashes
- Avoids unnecessary MySQL polling
- Allows jobs to move across queues using lightweight IDs
- Lets workers fetch the full job data from MySQL only when execution is needed
Scheduler and Workers
The scheduler runs a single goroutine (PopReadyQueue) that blocks on Redis until a job becomes available.
If Redis disconnects:
- The scheduler waits until Redis is reachable again
- Checks whether Redis lost its state
- If state is lost, it rebuilds queues from MySQL
Jobs popped from the ready queue are sent into an unbuffered job channel that all workers listen to. When a job appears on the channel, exactly one worker picks it up and executes it.
If a job fails:
- It is retried with a delay
- Retries are bounded (max 3 attempts)
- Delay increases linearly with each attempt
Edge Cases Handled
Redis Crash
When Redis goes down, the scheduler pauses and waits until Redis becomes available again. Once Redis is back, it checks whether state was lost and triggers recovery from MySQL if required; otherwise it continues normally.
Overdue jobs
If Redis is down when jobs are due, their scheduled time can pass without them being triggered. Previously, these jobs wouldn't run until a new job entered the queue. I fixed this by signaling the scheduler through a channel when the waiting queue is refilled, forcing an immediate re-evaluation of overdue jobs.
Shutdown during execution
If the server shuts down while jobs are executing:
- Workers stop accepting new jobs
- In‑flight jobs are allowed to complete
- The program waits for workers to finish before exiting
One problem came up: jobs that completed during shutdown couldn't update their status in MySQL, because the global context had already been cancelled. I solved this by using a separate background context for the final status updates, so those writes still succeed while the program is shutting down.
What I learned
This project was very different from anything I had built before, and I enjoyed it a lot. I learned:
- Why ownership of state matters
- How fragile assumptions can be
- How to reason about failure instead of avoiding it
- How to design systems that recover instead of simply restarting
- How to write backend code that doesn’t break easily under edge cases
Tickr v2 taught me more about backend systems than any tutorial ever did.
The GitHub repository is available here: Tickr