Lambda Durable Functions: Building Workflows That Run for a Year

Published: (January 13, 2026 at 01:50 AM EST)
7 min read
Source: Dev.to

The Problem We’ve All Been Ignoring

Think about the last time you built a multi‑step workflow:

  • An order‑processing system that waits for payment confirmation.
  • A content‑moderation pipeline with human‑review steps.
  • A data pipeline that processes files uploaded by users throughout the day.

You probably reached for Step Functions, right? I did too—until I saw the bill.

Step Functions Standard Workflows charge per state transition.
$25 per million transitions sounds cheap, but a six‑state approval workflow costs you every single time it runs, even while it’s just waiting for someone to click “Approve” in an email.

💡 The Real Cost of Waiting

| Workflow | Transitions per run | Runs / month | Cost / month |
|---|---|---|---|
| Approval (8 transitions) | 8 | 10 000 | $2.00 |

You’re paying for states that do nothing except wait. Lambda Durable Functions? $0.00 for the waiting time.

What Are Lambda Durable Functions, Anyway?

Lambda Durable Functions let you write long‑running workflows as regular code—no JSON state machines required. Write normal TypeScript or Python, and AWS handles:

  • Orchestration
  • State persistence
  • Resumption after pauses

The magic is in the await statement. When your function awaits a durable task, AWS:

  1. Checkpoints the function’s state.
  2. Shuts it down.
  3. Restores it when the task completes—whether that’s 5 seconds later or 5 months later.

You don’t pay for the wait.
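The three steps above can be pictured with a toy model. The sketch below is not how AWS implements durable functions. It is a minimal in-memory illustration, and every name in it (`Checkpoint`, `Step`, `advance`, the `store` map) is hypothetical. A workflow is a list of steps; when it reaches a wait, its position and state are serialized and the call returns, and a later call carrying the matching event picks up from the saved position.

```typescript
// Toy model of checkpoint -> suspend -> restore. All names are hypothetical;
// this illustrates the idea only and is not the AWS implementation.

type State = Record<string, unknown>;

interface Checkpoint {
  nextStep: number; // index of the step to run when the workflow resumes
  state: State;     // data accumulated so far
}

type Step =
  | { kind: 'run'; fn: (state: State) => void }
  | { kind: 'wait'; eventName: string };

interface WorkflowEvent {
  name: string;
  payload: unknown;
}

// Stand-in for durable storage. A real service persists this outside the process.
const store = new Map<string, string>();

function advance(
  id: string,
  steps: Step[],
  event?: WorkflowEvent,
): { status: 'suspended' | 'completed'; state: State } {
  // Restore: load the last checkpoint, or start fresh.
  const saved = store.get(id);
  const cp: Checkpoint =
    saved !== undefined ? (JSON.parse(saved) as Checkpoint) : { nextStep: 0, state: {} };

  // An incoming event must match the wait the workflow is parked on.
  if (event !== undefined) {
    const parked = steps[cp.nextStep];
    if (parked === undefined || parked.kind !== 'wait' || parked.eventName !== event.name) {
      throw new Error(`Workflow ${id} is not waiting for "${event.name}"`);
    }
    cp.state[event.name] = event.payload;
    cp.nextStep += 1;
  }

  while (cp.nextStep < steps.length) {
    const step = steps[cp.nextStep];
    if (step === undefined) break;
    if (step.kind === 'wait') {
      // Checkpoint and suspend: nothing runs until the event arrives.
      store.set(id, JSON.stringify(cp));
      return { status: 'suspended', state: cp.state };
    }
    step.fn(cp.state);
    cp.nextStep += 1;
  }

  // Finished: the checkpoint is no longer needed.
  store.delete(id);
  return { status: 'completed', state: cp.state };
}
```

Calling `advance(id, steps)` runs until the first wait and returns `suspended`. Calling it again with `{ name, payload }` for that wait resumes from the stored position, whether the second call comes seconds or months later.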

How Lambda Durable Functions Work

flowchart TD
    A[Function Starts] --> B[Execute Code]
    B --> C{Await Durable Task?}
    C -- Yes --> D[Checkpoint State]
    D --> E[Suspend Function]
    E --> F[Wait for Event/Timer]
    F --> G[Restore State]
    G --> H[Resume Execution]
    C -- No --> I[Continue]
    I --> H

(If you prefer a simple text diagram, see below.)

Function Starts → Execute Code → Await Durable Task?
                                   │
                     ┌──── No ─────┴───── Yes ────┐
                     ↓                            ↓
                 Continue                  Checkpoint State
                     ↓                            ↓
            Complete/Next Step             Suspend Function
                                                  ↓
                                         Wait for Event/Timer
                                                  ↓
                                            Restore State
                                                  ↓
                                           Resume Execution

A Real Example: Document‑Approval Workflow

Below is a practical document‑approval system that:

  • Waits for multiple reviewers.
  • Sends reminders.
  • Escalates if nobody responds.

In Step Functions this would be 15+ states with complex choice logic. In Durable Functions it’s just code.

import { DurableOrchestration } from '@aws-lambda/durable-functions';

export const documentApprovalWorkflow = new DurableOrchestration(
  async (context) => {
    const { documentId, reviewers } = context.input;

    // 1️⃣ Send notification to all reviewers
    await context.callActivity('sendReviewNotifications', {
      documentId,
      reviewers,
    });

    // 2️⃣ Wait for approval (7‑day timeout) while a 3‑day reminder timer runs.
    // Promise.race yields the winner's value, so tag each task to tell them apart.
    const approvalTask = context
      .waitForEvent('approval', 7 * 24 * 60 * 60)
      .then(() => 'approval' as const);
    const reminderTask = context
      .createTimer(3 * 24 * 60 * 60) // 3 days
      .then(() => 'reminder' as const);

    const winner = await Promise.race([approvalTask, reminderTask]);

    if (winner === 'reminder') {
      // Send reminder and wait up to 4 more days
      await context.callActivity('sendReminderEmails', { reviewers });
      const secondApproval = await context.waitForEvent('approval', 4 * 24 * 60 * 60);

      // Assumes waitForEvent resolves to a falsy value on timeout
      if (!secondApproval) {
        // Escalate to manager
        await context.callActivity('escalateToManager', { documentId });
        await context.waitForEvent('managerApproval', 2 * 24 * 60 * 60);
      }
    }

    // 3️⃣ Process approval
    const result = await context.callActivity('processApproval', {
      documentId,
      approvedAt: new Date().toISOString(),
    });

    return result;
  }
);

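// External system triggers approval.
// `durableClient` is assumed to be an SDK client created elsewhere; it is not defined in this snippet.
export const submitApproval = async (workflowId: string, decision: string) => {
  await durableClient.raiseEvent(workflowId, 'approval', { decision });
};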

Key takeaway: The code reads like a script you’d explain to a colleague—no JSON, no $.decision == 'approved' conditions, just plain programming logic.
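One detail in the workflow above deserves a closer look: `Promise.race` resolves to the value of whichever promise settles first, not to a label, so the code needs some way to tell "approval arrived" from "reminder timer fired". A common approach is to tag each promise before racing them. The sketch below uses plain promises in place of durable tasks; `Tagged`, `tag`, and `approvalOrReminder` are hypothetical helpers, not part of any durable-functions SDK.

```typescript
// Tag each promise with a label so the race winner identifies itself.
// Plain promises stand in for durable tasks; all names here are hypothetical.

interface Tagged<L extends string, T> {
  label: L;
  value: T;
}

function tag<L extends string, T>(label: L, promise: Promise<T>): Promise<Tagged<L, T>> {
  return promise.then((value) => ({ label, value }));
}

// Resolves with label 'approval' or 'reminder', depending on which settles first.
async function approvalOrReminder(
  approval: Promise<{ decision: string }>,
  reminderTimer: Promise<void>,
): Promise<Tagged<'approval', { decision: string }> | Tagged<'reminder', void>> {
  return Promise.race([tag('approval', approval), tag('reminder', reminderTimer)]);
}
```

The caller can then branch on `outcome.label === 'reminder'`, and in the approval branch TypeScript narrows `outcome.value` to the approval payload.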

Multi‑Step Applications: The Sweet Spot

Durable Functions shine when you have multiple discrete steps, each potentially taking a different amount of time. Below are patterns that work incredibly well.

1️⃣ The Data‑Pipeline Pattern

You receive a file upload, process it through several transformations, wait for quality checks, then publish results. Each step may take seconds or hours depending on file size.

[Figure: Data Pipeline with Durable Functions]
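As a rough sketch of this pattern, the orchestration below chains two processing activities, waits for a quality-check signal, then publishes. It assumes a context with the same `callActivity` and `waitForEvent` shape used in the document-approval example. The `PipelineContext` interface, the activity names (`validateFile`, `transformFile`, `publishResults`, `quarantineFile`), the `qualityCheck` event, and the 24-hour timeout are all illustrative assumptions.

```typescript
// Hypothetical context shape, modeled on the approval example earlier in the article.
interface PipelineContext {
  input: { fileKey: string };
  callActivity(name: string, input: unknown): Promise<unknown>;
  // Assumed to resolve with the event payload, or undefined on timeout.
  waitForEvent(name: string, timeoutSeconds: number): Promise<unknown>;
}

interface QualityResult {
  passed: boolean;
}

// Runtime check, since event payloads arrive untyped.
function isQualityResult(v: unknown): v is QualityResult {
  return (
    typeof v === 'object' &&
    v !== null &&
    typeof (v as { passed?: unknown }).passed === 'boolean'
  );
}

async function dataPipelineWorkflow(context: PipelineContext): Promise<string> {
  const { fileKey } = context.input;

  // Each step may take seconds or hours; the workflow simply awaits it.
  await context.callActivity('validateFile', { fileKey });
  const transformed = await context.callActivity('transformFile', { fileKey });

  // Pause until a quality check reports back (24 h timeout, an arbitrary choice).
  const check = await context.waitForEvent('qualityCheck', 24 * 60 * 60);
  if (!isQualityResult(check) || !check.passed) {
    await context.callActivity('quarantineFile', { fileKey });
    return 'quarantined';
  }

  await context.callActivity('publishResults', { fileKey, transformed });
  return 'published';
}
```

Taking the context as a parameter is a deliberate choice here: the function can be exercised in a unit test with a stub context that records activity calls and returns a canned event.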

2️⃣ The Human‑in‑the‑Loop Pattern

This is where Durable Functions absolutely crush Step Functions. Any time you need to wait for a human decision—approvals, content moderation, manual data entry—Durable Functions let you:

  • Pause execution without incurring cost.
  • Send reminders or escalations automatically.
  • Resume exactly where you left off once the human acts.

3️⃣ The Scheduled Batch Pattern

Process data in chunks throughout the day, aggregate the results, and generate reports. Traditional cron jobs don’t maintain state between runs. Durable Functions do.

export const dailyReportWorkflow = new DurableOrchestration(
  async (context) => {
    const results = [];

    // Process batches every 6 hours
    for (let i = 0; i < 4; i++) {
      const batchResult = await context.callActivity('processBatch', {
        batchNumber: i,
        timestamp: new Date()
      });

      results.push(batchResult);

      // Wait 6 hours before next batch
      if (i < 3) {
        await context.createTimer(6 * 60 * 60);
      }
    }

    // Generate final report with all batches
    return await context.callActivity('generateReport', { results });
  }
);

Lambda Durable Functions vs. Step Functions: The Honest Comparison

| Factor | Lambda Durable Functions | Step Functions (Standard) |
|---|---|---|
| Max Duration | 365 days | 365 days |
| Waiting Cost | $0 (state is persisted, function suspended) | No charge for wait time itself; billed per state transition (first 4,000 transitions/month free) |
| Execution Cost | Lambda pricing ($0.20 per 1 M requests, plus compute duration) | $25 per 1 M state transitions |
| State Machine | Code‑based (TypeScript/Python) | JSON ASL (Amazon States Language) |
| Versioning | Built into code deployment | Manual version management |
| Testing | Standard unit tests, local debugging | Requires Step Functions Local or AWS |
| Visual Editor | None (code only) | Workflow Studio (drag‑and‑drop) |
| Error Handling | Try‑catch blocks | Retry policies in JSON |

Cost Breakdown Example

Scenario: Approval workflow with 8 steps, waiting an average of 48 h for human response, processing 50 000 documents per month.

Step Functions Cost

  • 50 000 workflows × 8 state transitions = 400 000 transitions
  • (400 000 – 4 000 free tier) × $0.000025 = $9.90 / month

Durable Functions Cost

  • 50 000 workflows × 3 Lambda invocations (start, resume, complete) = 150 000 requests
  • 150 000 × $0.0000002 = $0.03 / month in request charges (Lambda compute duration is billed on top)

Savings: 99.7 % for workflows with long wait times.
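The arithmetic above is easy to rerun for your own volumes. The sketch below is a back-of-the-envelope model using only the prices quoted in this article; the function and constant names are hypothetical, and it deliberately ignores Lambda compute duration and any other charges.

```typescript
// Back-of-the-envelope cost model using the figures quoted in this article.
// Ignores Lambda compute duration and any other charges. Names are hypothetical.

const STEP_FUNCTIONS_PRICE_PER_MILLION = 25; // USD per 1M state transitions
const STEP_FUNCTIONS_FREE_TRANSITIONS = 4000; // free transitions per month
const LAMBDA_PRICE_PER_MILLION = 0.2; // USD per 1M requests

// Monthly Step Functions charge after subtracting the free tier.
function stepFunctionsCost(runs: number, transitionsPerRun: number): number {
  const billable = Math.max(0, runs * transitionsPerRun - STEP_FUNCTIONS_FREE_TRANSITIONS);
  return (billable * STEP_FUNCTIONS_PRICE_PER_MILLION) / 1e6;
}

// Monthly Lambda request charge (requests only, no duration).
function lambdaRequestCost(runs: number, invocationsPerRun: number): number {
  return (runs * invocationsPerRun * LAMBDA_PRICE_PER_MILLION) / 1e6;
}

// Percentage saved by the alternative relative to the baseline.
function savingsPercent(baseline: number, alternative: number): number {
  return baseline === 0 ? 0 : ((baseline - alternative) / baseline) * 100;
}

const sfnMonthly = stepFunctionsCost(50000, 8); // about 9.90
const lambdaMonthly = lambdaRequestCost(50000, 3); // about 0.03
const savedPercent = savingsPercent(sfnMonthly, lambdaMonthly); // about 99.7
```

Changing the run count or the number of invocations per run shows how sensitive the savings figure is to those two inputs.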

When NOT to Use Durable Functions

Durable Functions are great, but there are cases where Step Functions still win:

  • You need a visual workflow editor – non‑technical stakeholders appreciate Step Functions’ Workflow Studio.
  • Heavy parallel processing – the Distributed Map state in Step Functions runs up to 10 000 parallel child executions.
  • AWS service integrations – Step Functions offers 220+ direct integrations; Durable Functions require custom code.
  • Compliance requirements – visual audit trails are easier to produce with Step Functions’ execution history.

Getting Started: Your First Durable Function

Using the AWS SAM template

sam init --name my-durable-app --runtime nodejs20.x --app-template durable-function
cd my-durable-app
sam build && sam deploy --guided

Or deploy with CDK

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as durable from '@aws-cdk/aws-lambda-durable-functions';

export class DurableStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);

    const workflow = new durable.DurableFunction(this, 'MyWorkflow', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('functions/workflow'),
      timeout: cdk.Duration.minutes(15),
      maxDuration: cdk.Duration.days(365)
    });
  }
}

Best Practices I’ve Learned the Hard Way

  1. Make your activities idempotent. AWS may retry activities after a failure; design them to handle duplicate calls gracefully.
  2. Don’t store large data in workflow state. The state limit is 256 KB – use S3 for big payloads and pass references instead.
  3. Use correlation IDs. When external systems signal your workflow, give them a meaningful execution ID (e.g., order-{orderId}) rather than a random UUID.
  4. Set realistic timeouts. A workflow can run for a year, but individual activities should have much shorter timeouts (seconds to minutes).
  5. Monitor with CloudWatch. Set alarms for stuck workflows, failed activities, and unexpected wait times.
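The first three practices can be sketched in a few lines. Everything below is illustrative: `makeIdempotent`, `toStateValue`, `utf8ByteLength`, and `executionIdForOrder` are hypothetical helpers, the in-memory `Map` stands in for a durable store, and the `upload` callback stands in for something like an S3 put. The 256 KB figure is the limit quoted above.

```typescript
// Sketches for practices 1-3. All names are hypothetical.

// 1. Idempotency: remember results by key so a retried call does no extra work.
// The Map stands in for a durable store shared across retries.
function makeIdempotent<I, O>(
  activity: (input: I) => O,
  keyOf: (input: I) => string,
): (input: I) => O {
  const completed = new Map<string, O>();
  return (input: I): O => {
    const key = keyOf(input);
    if (completed.has(key)) return completed.get(key) as O;
    const result = activity(input);
    completed.set(key, result);
    return result;
  };
}

// 2. Keep workflow state small: store big payloads elsewhere and pass a reference.
const STATE_LIMIT_BYTES = 256 * 1024; // limit quoted in this article

// Counts UTF-8 bytes without relying on platform-specific APIs.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff) { bytes += 4; i++; } // surrogate pair
    else bytes += 3;
  }
  return bytes;
}

type StateValue<T> = { inline: T } | { ref: string };

function toStateValue<T>(
  payload: T,
  key: string,
  upload: (key: string, body: string) => void,
): StateValue<T> {
  const body = JSON.stringify(payload);
  // The whole state must fit under the limit, so real code would use a lower threshold.
  if (utf8ByteLength(body) <= STATE_LIMIT_BYTES) return { inline: payload };
  upload(key, body);
  return { ref: key };
}

// 3. Correlation IDs: derive the execution ID from a business key.
function executionIdForOrder(orderId: string): string {
  return `order-${orderId}`;
}
```

With these in place, an activity wrapped by `makeIdempotent` can be retried safely, and a workflow passes `{ ref }` values between steps instead of the payload itself whenever the data is large.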

[Figure: Durable Function Architecture Pattern]

The Bottom Line

Lambda Durable Functions represent a significant evolution in serverless orchestration. They give you:

  • Simplicity – write workflows as code.
  • Cost savings – no charges while waiting.
  • Power – run workflows for up to a year.

If you’re building new long‑running workflows—especially those with human‑in‑the‑loop steps or extended wait times—start with Durable Functions. You’ll write less code, pay less money, and sleep better knowing your workflows run on battle‑tested AWS infrastructure.

For existing Step Functions… migrate if your workflows spend most of the time waiting. For fast‑moving workflows with lots of branching logic and AWS service integrations, Step Functions might still be your best bet.

The serverless world just got a lot more interesting. Time to build something that runs for a year. 🚀

What workflows are you running that could benefit from Durable Functions? Drop a comment below and let’s discuss!
