Lambda Durable Functions: Building Workflows That Run for a Year
Source: Dev.to
The Problem We’ve All Been Ignoring
Think about the last time you built a multi‑step workflow:
- An order‑processing system that waits for payment confirmation.
- A content‑moderation pipeline with human‑review steps.
- A data pipeline that processes files uploaded by users throughout the day.
You probably reached for Step Functions, right? I did too—until I saw the bill.
Step Functions charge per state transition.
$25 per million transitions sounds cheap, but a six‑state approval workflow is billed for every transition on every run, including the ones that just wait for someone to click “Approve” in an email.
💡 The Real Cost of Waiting
| Workflow | Transitions per run | Runs / month | Cost / month (at $25 per 1 M transitions, before free tier) |
|---|---|---|---|
| Approval workflow | 8 | 10 000 | $2.00 |
You’re paying for states that do nothing except wait. Lambda Durable Functions? $0.00 for the waiting time.
What Are Lambda Durable Functions, Anyway?
Lambda Durable Functions let you write long‑running workflows as regular code—no JSON state machines required. Write normal TypeScript or Python, and AWS handles:
- Orchestration
- State persistence
- Resumption after pauses
The magic is in the await statement. When your function awaits a durable task, AWS:
- Checkpoints the function’s state.
- Shuts it down.
- Restores it when the task completes—whether that’s 5 seconds later or 5 months later.
You don’t pay for the wait.
How Lambda Durable Functions Work
```mermaid
flowchart TD
    A[Function Starts] --> B[Execute Code]
    B --> C{Await Durable Task?}
    C -- Yes --> D[Checkpoint State]
    D --> E[Suspend Function]
    E --> F[Wait for Event/Timer]
    F --> G[Restore State]
    G --> H[Resume Execution]
    H --> B
    C -- No --> I[Continue]
    I --> J[Complete / Next Step]
```
If you prefer a plain-text version of the same flow:

```text
Function Starts → Execute Code → Await Durable Task?
                           No ↓                  ↓ Yes
                          Continue       Checkpoint State
                              ↓                  ↓
                     Complete/Next Step  Suspend Function
                                                 ↓
                                       Wait for Event/Timer
                                                 ↓
                                           Restore State
                                                 ↓
                                         Resume Execution
```
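To make the checkpoint, suspend, and resume loop concrete, here is a toy model in TypeScript. It sketches the general checkpoint-and-replay idea under simplifying assumptions: an in-memory journal and synchronous steps. It is not how AWS implements durable functions, and `ToyContext`, `step`, `waitFor`, and `run` are hypothetical names.

```typescript
// Toy checkpoint-and-replay engine (illustration only, not AWS's implementation).
// Step results are saved in a journal. When the workflow is re-run after a
// pause, journaled steps return their saved result instead of running again.

class Suspended extends Error {}

class ToyContext {
  journal: Map<string, unknown>;
  firedWaits: Set<string>;
  executed: string[] = []; // steps that actually ran during this invocation

  constructor(journal: Map<string, unknown>, firedWaits: Set<string>) {
    this.journal = journal;
    this.firedWaits = firedWaits;
  }

  // Run fn at most once per workflow; on replay, return the checkpoint.
  step<T>(name: string, fn: () => T): T {
    if (this.journal.has(name)) return this.journal.get(name) as T;
    const result = fn();
    this.journal.set(name, result); // checkpoint
    this.executed.push(name);
    return result;
  }

  // Suspend this invocation unless the named event/timer has already fired.
  waitFor(name: string): void {
    if (!this.firedWaits.has(name)) throw new Suspended(name);
  }
}

type RunResult<T> = { done: boolean; value?: T; executed: string[] };

// Start or resume a workflow against an existing journal.
function run<T>(
  workflow: (ctx: ToyContext) => T,
  journal: Map<string, unknown>,
  firedWaits: Set<string>,
): RunResult<T> {
  const ctx = new ToyContext(journal, firedWaits);
  try {
    return { done: true, value: workflow(ctx), executed: ctx.executed };
  } catch (err) {
    if (err instanceof Suspended) return { done: false, executed: ctx.executed };
    throw err;
  }
}

// Example: notify, pause for an approval event, then process.
const approvalFlow = (ctx: ToyContext) => {
  const docId = ctx.step("notify", () => "doc-42");
  ctx.waitFor("approval");
  return ctx.step("process", () => `processed ${docId}`);
};

const journal = new Map<string, unknown>();
const firstRun = run(approvalFlow, journal, new Set<string>());             // suspends at the wait
const secondRun = run(approvalFlow, journal, new Set<string>(["approval"])); // resumes after the event
```

On the second run, this model skips `notify` and reuses its journaled result, while any code outside `step` runs again on every resume. That is why, in replay-based engines, workflow code between durable calls should be deterministic.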
A Real Example: Document‑Approval Workflow
Below is a practical document‑approval system that:
- Waits for multiple reviewers.
- Sends reminders.
- Escalates if nobody responds.
In Step Functions this would be 15+ states with complex choice logic. In Durable Functions it’s just code.
```typescript
import { DurableOrchestration } from '@aws-lambda/durable-functions';

const DAY = 24 * 60 * 60; // timer and timeout durations are in seconds

export const documentApprovalWorkflow = new DurableOrchestration(
  async (context) => {
    const { documentId, reviewers } = context.input;

    // 1️⃣ Send notification to all reviewers
    await context.callActivity('sendReviewNotifications', {
      documentId,
      reviewers,
    });

    // 2️⃣ Wait up to 7 days for an approval, with a reminder after 3 days.
    // Promise.race resolves with the winning task's value, not its name,
    // so tag each task to know which one finished first.
    const approvalTask = context.waitForEvent('approval', 7 * DAY);
    const reminderTask = context.createTimer(3 * DAY);
    const first = await Promise.race([
      approvalTask.then((event) => ({ kind: 'approval' as const, event })),
      reminderTask.then(() => ({ kind: 'reminder' as const, event: undefined })),
    ]);
    let approval = first.event;

    if (first.kind === 'reminder') {
      // Send a reminder, then keep waiting on the same 7-day approval task
      await context.callActivity('sendReminderEmails', { reviewers });
      approval = await approvalTask;

      if (!approval) {
        // Nobody responded within 7 days: escalate to a manager (2 days)
        await context.callActivity('escalateToManager', { documentId });
        approval = await context.waitForEvent('managerApproval', 2 * DAY);
      }
    }

    // 3️⃣ Act on the decision. Let the activity record the approval time:
    // orchestrator code may be replayed on resume, so avoid new Date() here.
    if (approval?.decision !== 'approved') {
      return { documentId, status: 'not-approved' };
    }
    return await context.callActivity('processApproval', { documentId });
  }
);

// External system triggers approval (decision: 'approved' or 'rejected').
// Assumes a durableClient instance created elsewhere (not shown).
export const submitApproval = async (workflowId: string, decision: string) => {
  await durableClient.raiseEvent(workflowId, 'approval', { decision });
};
```
Key takeaway: the code reads like a script you’d explain to a colleague. No JSON, no `$.decision == 'approved'`-style conditions, just plain programming logic.
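Because the workflow is ordinary code, its timing rules can also be pulled out into a pure function and unit-tested without deploying anything. The sketch below is a hypothetical model of the reminder and escalation rules in the example: a 3-day reminder, a 7-day reviewer window, and a 2-day manager window. `approvalPath` and its day-based inputs are illustrative and are not part of any AWS API.

```typescript
// Hypothetical model of the approval timeline: reminder at day 3, reviewers
// have until day 7, then a manager has until day 9. Days are counted from
// when notifications were sent; undefined means "never replied".
type ApprovalPath =
  | "reply-before-reminder"
  | "reply-after-reminder"
  | "manager-reply"
  | "expired";

const REMINDER_DAY = 3;
const REVIEW_DEADLINE_DAY = 7;
const MANAGER_DEADLINE_DAY = REVIEW_DEADLINE_DAY + 2;

function approvalPath(reviewerReplyDay?: number, managerReplyDay?: number): ApprovalPath {
  if (reviewerReplyDay !== undefined && reviewerReplyDay < REMINDER_DAY) {
    return "reply-before-reminder"; // approval won the race against the timer
  }
  if (reviewerReplyDay !== undefined && reviewerReplyDay <= REVIEW_DEADLINE_DAY) {
    return "reply-after-reminder"; // reminder sent, reviewer answered in time
  }
  // Escalated at day 7: only a manager reply inside the 2-day window counts.
  if (
    managerReplyDay !== undefined &&
    managerReplyDay > REVIEW_DEADLINE_DAY &&
    managerReplyDay <= MANAGER_DEADLINE_DAY
  ) {
    return "manager-reply";
  }
  return "expired";
}
```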
Multi‑Step Applications: The Sweet Spot
Durable Functions shine when you have multiple discrete steps, each potentially taking a different amount of time. Below are patterns that work incredibly well.
1️⃣ The Data‑Pipeline Pattern
You receive a file upload, process it through several transformations, wait for quality checks, then publish results. Each step may take seconds or hours depending on file size.
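Here is a hypothetical sketch of the shape of this pattern, with the stages written as plain functions. The transforms, the 50 % quality threshold, and all names are made up for illustration. In a durable workflow, each stage would be its own checkpointed activity, so a slow stage can pause without losing the work done before it.

```typescript
// Hypothetical pipeline: transforms -> quality gate -> publish.
type Rows = number[];
type Transform = { name: string; apply: (rows: Rows) => Rows };

const transforms: Transform[] = [
  { name: "dropNegatives", apply: (rows) => rows.filter((r) => r >= 0) },
  { name: "double", apply: (rows) => rows.map((r) => r * 2) },
];

// Quality gate: reject the upload if transforms discarded more than half the rows.
function passesQuality(input: Rows, output: Rows, minKeptRatio = 0.5): boolean {
  return input.length === 0 || output.length / input.length >= minKeptRatio;
}

type PipelineResult = { status: "published" | "rejected"; rows: Rows; log: string[] };

function runPipeline(upload: Rows): PipelineResult {
  const log: string[] = [];
  let rows = upload;
  for (const t of transforms) {
    rows = t.apply(rows); // each stage would be a separate durable step
    log.push(t.name);
  }
  if (!passesQuality(upload, rows)) {
    log.push("qualityCheck:failed");
    return { status: "rejected", rows, log };
  }
  log.push("qualityCheck:passed", "publish");
  return { status: "published", rows, log };
}
```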

2️⃣ The Human‑in‑the‑Loop Pattern
This is where Durable Functions absolutely crush Step Functions. Any time you need to wait for a human decision—approvals, content moderation, manual data entry—Durable Functions let you:
- Pause execution without incurring cost.
- Send reminders or escalations automatically.
- Resume exactly where you left off once the human acts.
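Under the hood, human-in-the-loop means parking a workflow until an external signal arrives for one specific execution. The hypothetical in-memory `EventHub` below sketches that bookkeeping. Waiters are keyed by workflow ID and event name, and an event that arrives before anyone is waiting for it is buffered rather than dropped. It illustrates the pattern and is not the durable-functions API; a real service would persist these records.

```typescript
// Hypothetical model of "pause until a human acts", keyed by workflow ID.
type Handler = (payload: unknown) => void;

class EventHub {
  private waiters = new Map<string, Handler>();
  private buffered = new Map<string, unknown>();

  private key(workflowId: string, eventName: string): string {
    return `${workflowId}:${eventName}`;
  }

  // Park a workflow until the event arrives (or resume at once if it already did).
  waitForEvent(workflowId: string, eventName: string, onEvent: Handler): void {
    const k = this.key(workflowId, eventName);
    if (this.buffered.has(k)) {
      const payload = this.buffered.get(k);
      this.buffered.delete(k);
      onEvent(payload);
    } else {
      this.waiters.set(k, onEvent);
    }
  }

  // Called by the external system, e.g. when someone clicks "Approve".
  raiseEvent(workflowId: string, eventName: string, payload: unknown): void {
    const k = this.key(workflowId, eventName);
    const waiter = this.waiters.get(k);
    if (waiter) {
      this.waiters.delete(k);
      waiter(payload);
    } else {
      this.buffered.set(k, payload); // nobody waiting yet: keep it
    }
  }
}

// Usage: one reviewer acts after the workflow parks, one acts before.
const received: string[] = [];
const record = (id: string) => (payload: unknown) => {
  received.push(`${id}:${(payload as { decision: string }).decision}`);
};

const hub = new EventHub();
hub.waitForEvent("order-123", "approval", record("order-123"));    // parks
hub.raiseEvent("order-123", "approval", { decision: "approved" }); // resumes it
hub.raiseEvent("order-456", "approval", { decision: "rejected" }); // buffered
hub.waitForEvent("order-456", "approval", record("order-456"));    // resumes at once
```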
3️⃣ The Scheduled Batch Pattern
Process data in chunks throughout the day, aggregate the results, and generate a report. Traditional cron jobs don’t keep state between runs; Durable Functions do.
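```typescript
import { DurableOrchestration } from '@aws-lambda/durable-functions';

const HOURS = 60 * 60; // timer durations are in seconds

export const dailyReportWorkflow = new DurableOrchestration(
  async (context) => {
    const results: unknown[] = [];

    // Process 4 batches, 6 hours apart
    for (let i = 0; i < 4; i++) {
      // Let the activity record its own timestamp: orchestrator code may be
      // replayed on resume, so avoid new Date() here.
      const batchResult = await context.callActivity('processBatch', {
        batchNumber: i,
      });
      results.push(batchResult);

      // Wait 6 hours before the next batch (no wait after the last one)
      if (i < 3) {
        await context.createTimer(6 * HOURS);
      }
    }

    // Generate the final report from all batches
    return await context.callActivity('generateReport', { results });
  }
);
```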
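The loop runs four batches and waits three times, so the workflow spans roughly 18 hours plus processing time. A small hypothetical helper, not part of any SDK, makes that schedule explicit and testable. It ignores processing time, so real start times drift a little later.

```typescript
// Hypothetical helper: nominal start time of each batch, ignoring processing
// time. Batch i starts i * intervalHours after `start`.
function batchSchedule(start: Date, batches: number, intervalHours: number): string[] {
  const HOUR_MS = 60 * 60 * 1000;
  return Array.from({ length: batches }, (_, i) =>
    new Date(start.getTime() + i * intervalHours * HOUR_MS).toISOString(),
  );
}

// One durable timer between each pair of consecutive batches.
function timersNeeded(batches: number): number {
  return Math.max(0, batches - 1);
}

const schedule = batchSchedule(new Date("2024-01-01T00:00:00Z"), 4, 6);
```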
Lambda Durable Functions vs. Step Functions: The Honest Comparison
| Factor | Lambda Durable Functions | Step Functions (Standard) |
|---|---|---|
| Max Duration | 365 days | 365 days |
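| Waiting Cost | $0 (state is persisted, function suspended) | No charge for idle time itself, but every state entered (including Wait and `.waitForTaskToken` callback states) is a billed transition; first 4,000 transitions/month free |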
| Execution Cost | Lambda pricing ($0.20 per 1 M requests, plus compute duration) | $25 per 1 M state transitions |
| State Machine | Code‑based (TypeScript/Python) | JSON ASL (Amazon States Language) |
| Versioning | Built into code deployment | Manual version management |
| Testing | Standard unit tests, local debugging | Requires Step Functions Local or AWS |
| Visual Editor | None (code only) | Workflow Studio (drag‑and‑drop) |
| Error Handling | Try‑catch blocks | Retry policies in JSON |
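The error-handling row is easier to compare with a concrete example. Below is a hypothetical sketch of the code equivalent of a Step Functions `Retry` policy. The fields mirror the Amazon States Language `IntervalSeconds`, `MaxAttempts`, and `BackoffRate` settings, where `MaxAttempts` counts retries after the first attempt. To stay synchronous, the helper retries immediately; in a real workflow you would wait between attempts, for example with a durable timer.

```typescript
// Hypothetical try/catch equivalent of an ASL Retry policy.
type RetryPolicy = { intervalSeconds: number; maxAttempts: number; backoffRate: number };

// Delay (seconds) before each retry: interval, interval*rate, interval*rate^2, ...
function retryDelays(p: RetryPolicy): number[] {
  return Array.from({ length: p.maxAttempts }, (_, i) => p.intervalSeconds * p.backoffRate ** i);
}

// One initial attempt plus up to maxAttempts retries, then rethrow the last error.
function withRetry<T>(fn: () => T, p: RetryPolicy): { value: T; attempts: number } {
  let lastError: unknown;
  for (let attempt = 1; attempt <= p.maxAttempts + 1; attempt++) {
    try {
      return { value: fn(), attempts: attempt };
    } catch (err) {
      lastError = err; // a real workflow would sleep retryDelays(p)[attempt - 1] here
    }
  }
  throw lastError;
}
```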
Cost Breakdown Example
Scenario: Approval workflow with 8 steps, waiting an average of 48 h for human response, processing 50 000 documents per month.
Step Functions Cost
- 50 000 workflows × 8 state transitions = 400 000 transitions
- (400 000 – 4 000 free tier) × $0.000025 = $9.90 / month
Durable Functions Cost
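- 50 000 workflows × 3 Lambda invocations (start, resume, complete) = 150 000 requests
- 150 000 × $0.0000002 = $0.03 / month in request charges (compute duration is billed separately)
Savings: about 99.7 % on orchestration charges in this scenario. The 48 h wait appears in neither formula; the gap comes from paying per state transition versus per invocation.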
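You can re-run the arithmetic above with your own numbers. This sketch uses the list prices quoted in this article: $25 per 1 M Step Functions Standard transitions with 4,000 free per month, and $0.20 per 1 M Lambda requests. Like the article, it leaves out Lambda compute duration and Lambda's own monthly free tier. The function names are hypothetical.

```typescript
const SFN_PRICE_PER_TRANSITION = 25 / 1_000_000;  // Step Functions Standard
const SFN_FREE_TRANSITIONS = 4_000;               // per month
const LAMBDA_PRICE_PER_REQUEST = 0.2 / 1_000_000; // requests only, no duration

function stepFunctionsMonthlyCost(runs: number, transitionsPerRun: number, applyFreeTier = true): number {
  const transitions = runs * transitionsPerRun;
  const billable = applyFreeTier ? Math.max(0, transitions - SFN_FREE_TRANSITIONS) : transitions;
  return billable * SFN_PRICE_PER_TRANSITION;
}

function lambdaRequestMonthlyCost(runs: number, invocationsPerRun: number): number {
  return runs * invocationsPerRun * LAMBDA_PRICE_PER_REQUEST;
}

// Scenario from this section: 50,000 approvals/month, 8 transitions vs 3 invocations each.
const sfnCost = stepFunctionsMonthlyCost(50_000, 8);
const durableCost = lambdaRequestMonthlyCost(50_000, 3);
const savingsPct = (1 - durableCost / sfnCost) * 100;

console.log(sfnCost.toFixed(2), durableCost.toFixed(2), savingsPct.toFixed(1)); // 9.90 0.03 99.7
```

Passing `applyFreeTier = false` with 10 000 runs of 8 transitions reproduces the $2.00 figure from the table at the top.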
When NOT to Use Durable Functions
Durable Functions are great, but there are cases where Step Functions still win:
- You need a visual workflow editor – non‑technical stakeholders appreciate Step Functions’ Workflow Studio.
- Heavy parallel processing – Step Functions’ Map state (in Distributed mode) can fan out to as many as 10 000 parallel child executions.
- AWS service integrations – Step Functions offers 220+ direct integrations; Durable Functions require custom code.
- Compliance requirements – visual audit trails are easier to produce with Step Functions’ execution history.
Getting Started: Your First Durable Function
Using the AWS SAM template
```bash
sam init --runtime nodejs20.x --app-template durable-function
cd my-durable-app
sam build && sam deploy --guided
```
Or deploy with CDK
```typescript
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';
import * as durable from '@aws-cdk/aws-lambda-durable-functions';

export class DurableStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new durable.DurableFunction(this, 'MyWorkflow', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('functions/workflow'),
      // Limit for a single invocation (Lambda's maximum is 15 minutes)
      timeout: cdk.Duration.minutes(15),
      // Limit for the whole workflow, across suspensions
      maxDuration: cdk.Duration.days(365),
    });
  }
}
```
Best Practices I’ve Learned the Hard Way
- Make your activities idempotent. AWS may retry activities after a failure; design them to handle duplicate calls gracefully.
- Don’t store large data in workflow state. The state limit is 256 KB – use S3 for big payloads and pass references instead.
- Use correlation IDs. When external systems signal your workflow, give them a meaningful execution ID (e.g., `order-{orderId}`) rather than a random UUID.
- Set realistic timeouts. A workflow can run for a year, but individual activities should have much shorter timeouts (seconds to minutes).
- Monitor with CloudWatch. Set alarms for stuck workflows, failed activities, and unexpected wait times.
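Two of these practices are easy to get subtly wrong, so here is a hypothetical sketch of each. `chargeOrder` deduplicates retries with a deterministic key built from the order ID. `toStateValue` applies the claim-check pattern, keeping payloads over the 256 KB figure quoted in the list out of workflow state. Both use in-memory `Map`s as stand-ins for a real database and for S3.

```typescript
// 1) Idempotent activity: a retried call with the same key returns the first
//    result instead of repeating the side effect. A real system would keep
//    this record in a durable store, not in memory.
const processed = new Map<string, string>();
let chargesMade = 0;

function chargeOrder(orderId: string, amountCents: number): string {
  const key = `order-${orderId}:charge`; // deterministic, meaningful key
  const previous = processed.get(key);
  if (previous !== undefined) return previous; // duplicate call: no side effect
  chargesMade++; // stand-in for the real side effect (e.g. a payment API call)
  const receipt = `receipt-${orderId}-${amountCents}`;
  processed.set(key, receipt);
  return receipt;
}

// 2) Claim check: store large payloads outside workflow state and keep only
//    a small reference. The Map stands in for S3; the key format is hypothetical.
const STATE_LIMIT_BYTES = 256 * 1024;
const blobStore = new Map<string, string>();

type StateValue = { inline: string } | { ref: string };

function toStateValue(id: string, payload: string): StateValue {
  const size = new TextEncoder().encode(payload).length; // UTF-8 bytes
  if (size <= STATE_LIMIT_BYTES) return { inline: payload };
  const ref = `payloads/${id}.json`;
  blobStore.set(ref, payload);
  return { ref };
}
```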

The Bottom Line
Lambda Durable Functions represent a significant evolution in serverless orchestration. They give you:
- Simplicity – write workflows as code.
- Cost savings – no charges while waiting.
- Power – run workflows for up to a year.
If you’re building new long‑running workflows—especially those with human‑in‑the‑loop steps or extended wait times—start with Durable Functions. You’ll write less code, pay less money, and sleep better knowing your workflows run on battle‑tested AWS infrastructure.
For existing Step Functions workflows, migrate if they spend most of their time waiting. For fast‑moving workflows with lots of branching logic and AWS service integrations, Step Functions might still be your best bet.
The serverless world just got a lot more interesting. Time to build something that runs for a year. 🚀
What workflows are you running that could benefit from Durable Functions? Drop a comment below and let’s discuss!