Scaling with Celery and Redis
The Problem with Synchronous Indexing
When a user started indexing a YouTube playlist, the backend ran a synchronous loop that processed each video one after another.
Fetching and indexing a transcript takes ~1.5 seconds, so a playlist with 1,000 videos would need about 25 minutes. The browser timed out after ~60 seconds, the UI froze, and the app eventually crashed.
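A minimal sketch of that blocking version (the helper names here are illustrative and mirror the ones used later in this post):

# Before: the request handler does all the work itself
def index_playlist_sync(playlist_id):
    for video in get_videos_from_playlist(playlist_id):
        transcript = get_video_transcript(video['id'])  # ~1.5 s of network I/O
        index_video('my_index', video, transcript)      # Elasticsearch write
    # 1,000 videos x 1.5 s = 1,500 s, about 25 minutes; the browser gives up long before this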
Why Task Queues and Background Workers?
A fully synchronous design blocks the main request‑response cycle:
- Long‑running operations hold up the web server thread.
- Requests pile up, latency grows, and the system can drop connections or crash.
Task queues move work out of the request path:
- The HTTP handler records the intent and places a job in a queue.
- It immediately returns an acknowledgment to the client.
- Separate worker processes consume tasks from the queue and execute them in parallel (or concurrently).
Benefits
- The web tier stays responsive.
- Scaling is as simple as adding more workers.
- Failures in a worker don’t cascade to the whole system.
- Rate‑limiting can be applied at the queue level (useful for avoiding YouTube IP bans).
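As a concrete illustration of the last point, Celery supports declarative per-worker rate limits right on the task decorator; the 10/m figure below is an assumed value, not from the source:

@app.task(bind=True, max_retries=3, rate_limit='10/m')  # at most 10 runs/min per worker
def process_video_task(self, video_data, index_name):
    ...  # transcript fetch + indexing, shown in full below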
Core Concepts
Task
A single unit of work. In this project the task is process_video_task, which:
- Receives a video ID.
- Fetches the transcript via a proxy.
- Indexes the data into Elasticsearch.
Producer
The part of the application that creates tasks. Here, a Flask endpoint gathers playlist information and sends the job to Celery, instantly off‑loading the work.
Broker
Software that stores tasks and mediates between producers and consumers.
Redis is used both as the message broker and as a store for tracking task status, enabling real‑time progress updates.
Consumer (Worker)
Processes that pull tasks from the broker and execute them. Celery manages worker processes, memory, acknowledgments, and retries.
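Starting the consumers is a single CLI command (assuming the Celery app below lives in tasks.py; the concurrency value is illustrative):

celery -A tasks worker --loglevel=info --concurrency=8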
Implementation Details
Celery Task (Python)
# tasks.py
from celery import Celery

app = Celery('youtube_indexer', broker='redis://localhost:6379/0')

# get_video_transcript and index_video are the app's own helpers
# (proxy-based transcript fetch, Elasticsearch write).
@app.task(bind=True, max_retries=3)
def process_video_task(self, video_data, index_name):
    try:
        transcript = get_video_transcript(video_data['id'])
        success = index_video(index_name, video_data, transcript)
        return (video_data['id'], success)
    except Exception as e:
        # Hand the failure to Celery's retry machinery; it gives up after max_retries
        raise self.retry(exc=e, countdown=60)
Storing the Task ID in Redis
# utils.py
import redis

redis_conn = redis.Redis()  # defaults: localhost:6379, db 0
TASK_KEY_PREFIX = "playlist_task:"
# task is the AsyncResult returned by .delay(); key it per playlist
task_id_key = f"{TASK_KEY_PREFIX}{playlist_id}"
redis_conn.set(task_id_key, task.id, ex=7200)  # expires in 2 hours
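This stored ID is what makes progress polling possible. A sketch of a status endpoint (the route name is hypothetical; AsyncResult is Celery's standard handle for inspecting a dispatched task):

# routes.py (sketch; reuses bp, redis_conn, and TASK_KEY_PREFIX from the snippets here)
from celery.result import AsyncResult
from .tasks import app as celery_app

@bp.get('/playlist_status/<playlist_id>')
def playlist_status(playlist_id):
    task_id = redis_conn.get(f"{TASK_KEY_PREFIX}{playlist_id}")
    if task_id is None:
        return jsonify({"status": "unknown"}), 404
    state = AsyncResult(task_id.decode(), app=celery_app).state
    return jsonify({"status": state})  # PENDING, STARTED, SUCCESS, FAILURE, ...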
Producer Endpoint (Flask)
# routes.py
from flask import Blueprint, request, jsonify
from .tasks import process_video_task

bp = Blueprint('index', __name__)

@bp.post('/index_playlist')
def index_playlist():
    data = request.json
    playlist_id = data['playlist_id']
    videos = get_videos_from_playlist(playlist_id)  # app helper: fetch playlist metadata
    for video in videos:
        # .delay() enqueues and returns immediately; no work happens in-request
        process_video_task.delay(video, index_name='my_index')
    return jsonify({"status": "queued", "playlist_id": playlist_id})
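With the queue in place, the endpoint responds in milliseconds regardless of playlist size. A quick smoke test (placeholder playlist ID, default Flask dev port assumed):

import requests

resp = requests.post("http://localhost:5000/index_playlist",
                     json={"playlist_id": "PLxyz123"})  # hypothetical ID
print(resp.json())  # {'playlist_id': 'PLxyz123', 'status': 'queued'}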
Alternative Language Ecosystems
| Language | Queue Library | Broker |
|---|---|---|
| Node.js | BullMQ | Redis |
| Go | Asynq | Redis |
| Python | Celery | Redis (or RabbitMQ, etc.) |
The roles (producer, broker, consumer) remain the same across ecosystems.
Architecture Pattern: Fan‑Out (Orchestrator)
The system follows the Fan‑Out / Orchestrator pattern:
- The manager (the Flask endpoint) creates a single orchestration task, e.g., index_playlist_task.
- That task dispatches many process_video_task jobs, one per video.
- Workers fan out to process the videos concurrently.
This separation keeps the initial request lightweight while leveraging parallelism for the heavy lifting.
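A sketch of the orchestration task using Celery's group primitive (get_videos_from_playlist is the same assumed helper as in the producer endpoint):

# tasks.py (orchestrator sketch)
from celery import group

@app.task
def index_playlist_task(playlist_id, index_name):
    videos = get_videos_from_playlist(playlist_id)
    # Fan out: one signature per video; any available worker picks them up
    job = group(process_video_task.s(video, index_name) for video in videos)
    result = job.apply_async()
    return result.id  # group ID; with a result backend it enables aggregate progress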
Takeaways
- Off‑loading long‑running work to a task queue prevents request timeouts and keeps the UI responsive.
- Redis provides a fast, in‑memory broker; Celery adds the orchestration layer needed for retries, monitoring, and worker management.
- The same principles apply across languages—swap out the queue library but keep the producer‑broker‑consumer model.
By restructuring the app around Celery and Redis, indexing large playlists became reliable, scalable, and user‑friendly.