From Celery/Redis to Temporal: A Journey Toward Idempotency and Reliable Workflows
Source: Dev.to
When handling asynchronous tasks in distributed systems, the combination of Celery and Redis is often the go‑to choice. I also chose Celery for the initial design of my KYC (Know Your Customer) orchestrator due to its familiarity. However, as the service grew in complexity, I hit a massive wall: guaranteeing idempotency and managing complex states.
While Celery is excellent for “fire‑and‑forget” tasks, there’s a high risk of duplicate execution during retries caused by network failures or worker downs. For GPU‑intensive face‑recognition tasks, duplicate execution was costly and hurt performance.
The KYC Process
- User uploads an ID card image.
- User uploads a selfie video.
- Compare face similarity once both files exist.
In a Celery environment, because I didn’t know when images and videos would be uploaded, I needed complex logic to query the DB every time or store intermediate states in Redis. The “are all files collected?” check was scattered across multiple places, making maintenance difficult.
Why Temporal?
Temporal is not just a message queue; it’s a stateful workflow engine. Workflow code must be replay‑safe: for the same input and history it must produce the same sequence of workflow API calls. Consequently, side‑effects such as network I/O, file I/O, system time, randomness, or threading must be moved to activities, while the workflow itself focuses on orchestration.
Official docs:
Core Logic of FaceSimilarityWorkflow
# workflow.py
from datetime import timedelta
from temporalio import workflow
from temporalio.common import RetryPolicy
@workflow.run
async def run(self, data: SimilarityData) -> SimilarityResult:
# Wait up to 1 hour until both image and video are collected
await workflow.wait_condition(
lambda: any(f["type"] == "image" for f in self._files)
and any(f["type"] == "video" for f in self._files),
timeout=timedelta(hours=1),
)
# Execute GPU activity once all files are ready
result = await workflow.execute_activity(
check_face_similarity_activity,
data,
retry_policy=RetryPolicy(maximum_attempts=3),
)
return result
workflow.wait_condition suspends the workflow until the condition is met without blocking the event loop—something that would have required complex polling or webhook logic in Celery.
Idempotency at the Activity Level
Temporal records workflow progress as an event history, so a worker restart resumes exactly from the last successful point. Activities, however, follow an at‑least‑once execution model: an activity may be retried if a worker crashes after completing the work but before notifying the server. The official docs therefore strongly recommend making activities idempotent.
Official docs:
Double‑Defense Strategy
- External idempotency key – combine the workflow execution ID and activity ID.
- Internal guard – use a unique key (or check for existing results) in the database to prevent duplicate storage/processing.
# activities.py
from temporalio import activity
@activity.defn
async def check_face_similarity_activity(data: SimilarityData) -> SimilarityResult:
info = activity.info()
idempotency_key = f"{info.workflow_run_id}-{info.activity_id}"
session_id = data["session_id"]
with get_db_context() as db:
existing = (
db.query(FaceSimilarity)
.filter(FaceSimilarity.idempotency_key == idempotency_key)
.first()
)
if existing:
return SimilarityResult(success=True, message="Already processed.")
# Perform actual GPU‑intensive work...
# (store result with the same idempotency_key)
Comparison: Celery/Redis vs. Temporal
| Feature | Celery/Redis | Temporal |
|---|---|---|
| State Management | Manual storage in DB/Redis | Automatically managed by engine |
| Retry Strategy | Manual exponential backoff | Declarative retry policy |
| Visibility | Must dig through logs | Inspect history in Temporal UI |
| Idempotency | Very difficult to guarantee | Structurally achievable |
Takeaway
The transition from Celery to Temporal was not just about swapping tools; it was about redefining how business processes are expressed in code. In financial and authentication systems where idempotency is paramount, Temporal provides irreplaceable stability.
If you’re losing sleep over complex asynchronous logic and idempotency issues, I strongly recommend migrating to Temporal.