AWS Lambda Durable Functions: Build Workflows That Last
Source: Dev.to
What Are Durable Functions?
Durable functions are Lambda functions that can pause and resume. When your function waits for a callback or sleeps for an hour, Lambda checkpoints its state and stops execution. When it’s time to continue, Lambda resumes exactly where it left off—with all variables and context intact.
This isn’t a new compute model. It’s regular Lambda with automatic state management. You write normal async/await code. Lambda makes it durable.
A Simple Example
Here’s a workflow that creates an order, waits 5 minutes, then sends a notification:
    import { DurableContext, withDurableExecution } from '@aws/durable-execution-sdk-js';

    // createOrder and sendEmail are application helpers defined elsewhere.
    export const handler = withDurableExecution(
      async (event: any, context: DurableContext) => {
        // Checkpointed step: the result is saved and reused on replay.
        const order = await context.step('create-order', async () => {
          return createOrder(event.items);
        });

        // Pause for 5 minutes; the function is not running while it waits.
        await context.wait({ seconds: 300 });

        await context.step('send-notification', async () => {
          return sendEmail(order.customerId, order.id);
        });

        return { orderId: order.id, status: 'completed' };
      }
    );
That’s it. No state machines to configure, no databases to manage, no polling loops. The function pauses during the wait, costs nothing while idle, and resumes automatically after 5 minutes.
Key Capabilities
- Long execution times – Workflows can run for up to 1 year. Individual invocations are still limited to 15 minutes, but the workflow continues across multiple invocations.
- Automatic checkpointing – Lambda saves your function’s state at each step. If something fails, the function resumes from the last checkpoint—not from the beginning.
- Built‑in retries – Configure retry strategies with exponential backoff. Lambda handles the retry logic and timing automatically.
- Wait for callbacks – Pause execution until an external event arrives. Perfect for human approvals, webhook responses, or async API results.
- Parallel execution – Run multiple operations concurrently and wait for all to complete. Lambda manages the coordination.
- Nested workflows – Invoke other durable functions and compose complex workflows from simple building blocks.
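To make the built-in retries bullet above concrete, here is a minimal sketch of how an exponential backoff schedule can be computed. The `BackoffOptions` shape and the function names are hypothetical and chosen only for illustration; they are not the SDK's retry configuration, which this article does not show.

```typescript
// Hypothetical retry settings for illustration; NOT the SDK's configuration shape.
interface BackoffOptions {
  maxAttempts: number;    // total attempts, including the first one
  initialDelayMs: number; // delay before the first retry
  backoffRate: number;    // multiplier applied after each failed attempt
  maxDelayMs: number;     // upper bound on any single delay
}

// Delay before retry number `retry` (1-based), capped at maxDelayMs.
function delayBeforeRetry(retry: number, opts: BackoffOptions): number {
  const raw = opts.initialDelayMs * Math.pow(opts.backoffRate, retry - 1);
  return Math.min(raw, opts.maxDelayMs);
}

// All delays a step would wait through if every attempt failed.
function retrySchedule(opts: BackoffOptions): number[] {
  const result: number[] = [];
  for (let retry = 1; retry < opts.maxAttempts; retry++) {
    result.push(delayBeforeRetry(retry, opts));
  }
  return result;
}

const delays = retrySchedule({
  maxAttempts: 5,
  initialDelayMs: 1000,
  backoffRate: 2,
  maxDelayMs: 5000,
});
// delays is [1000, 2000, 4000, 5000]: doubling each time, then capped.
```

In a durable function the waiting between attempts would be handled by the service rather than by a sleeping process, which is why long backoff delays do not consume compute time.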
How It Works: The Replay Model
Durable functions use a replay‑based execution model. When your function resumes, Lambda replays it from the start—but instead of re‑executing operations, it uses checkpointed results.
1. First invocation – Your function runs, executing each step and checkpointing results.
2. Wait or callback – Function pauses; Lambda saves state and stops execution.
3. Resume – Lambda invokes your function again, replaying from the start.
4. Replay – Operations return checkpointed results instantly instead of re‑executing.
5. Continue – Function proceeds past the wait with all context intact.
This model ensures your function always sees consistent state, even across failures and restarts. Code outside of steps must be deterministic, because it runs again on every replay; each step itself executes once and returns the same checkpointed result when replayed.
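The replay sequence described above can be illustrated with a toy model. This is a simplified, synchronous sketch and not how the SDK or Lambda is implemented: `makeContext`, `Suspend`, `invoke`, and the in-memory `CheckpointLog` are all hypothetical names, and the real operations are asynchronous.

```typescript
// Toy model of replay-based execution; NOT the SDK's implementation.
// The log is a plain object so it could be serialized between invocations.
type CheckpointLog = Record<string, unknown>;

// Thrown to unwind the handler when it reaches a wait that has not completed.
class Suspend {}

interface ToyContext {
  step<T>(name: string, fn: () => T): T;
  wait(name: string): void;
}

function makeContext(log: CheckpointLog): ToyContext {
  return {
    // Run fn the first time; on replay, return the checkpointed result.
    step<T>(name: string, fn: () => T): T {
      if (name in log) return log[name] as T;
      const result = fn();
      log[name] = result;
      return result;
    },
    // Suspend unless the log says this wait has already completed.
    wait(name: string): void {
      if (!(name in log)) throw new Suspend();
    },
  };
}

let createCalls = 0;
let emailCalls = 0;

function toyHandler(ctx: ToyContext): { orderId: string; status: string } {
  const order = ctx.step('create-order', () => {
    createCalls += 1;
    return { id: 'order-1' };
  });
  ctx.wait('five-minutes');
  ctx.step('send-notification', () => {
    emailCalls += 1;
    return true;
  });
  return { orderId: order.id, status: 'completed' };
}

// One invocation: returns the result, or undefined if the handler suspended.
function invoke(log: CheckpointLog): { orderId: string; status: string } | undefined {
  try {
    return toyHandler(makeContext(log));
  } catch (e) {
    if (e instanceof Suspend) return undefined;
    throw e;
  }
}

const checkpointLog: CheckpointLog = {};
const first = invoke(checkpointLog);  // runs create-order, then suspends at the wait
checkpointLog['five-minutes'] = true; // the timer fires
const second = invoke(checkpointLog); // replays create-order from the log, then continues
```

The handler body runs twice, but `create-order` executes only once because the second invocation reads its result from the log. This is also why code outside of steps has to be deterministic: it is the only part that actually re-runs.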
Learn more: Understanding the Replay Model
Common Use Cases
- Approval workflows – Wait for human approval before proceeding. The function pauses until someone clicks approve or reject.
- Saga patterns – Coordinate distributed transactions with compensating actions. If a step fails, automatically roll back previous steps.
- Scheduled tasks – Wait for specific times or intervals. Process data at midnight, send reminders after 24 hours, or retry every 5 minutes.
- API orchestration – Call multiple APIs with retries and error handling. Coordinate responses and handle partial failures gracefully.
- Data processing pipelines – Process large datasets in stages with checkpoints. Resume from the last successful stage if something fails.
- Event‑driven workflows – React to external events like webhooks, IoT signals, or user actions. Wait for events and continue processing when they arrive.
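The saga pattern from the list above is easier to see in code. The following is a minimal, synchronous sketch under stated assumptions: `SagaStep`, `runSaga`, and the step names are hypothetical, and in a durable function each action and compensation would presumably be wrapped in its own checkpointed step rather than called directly.

```typescript
// Hypothetical saga runner for illustration; not part of the SDK.
interface SagaStep {
  name: string;
  action: () => void;     // the forward operation
  compensate: () => void; // undoes the action if a later step fails
}

interface SagaResult {
  ok: boolean;
  failedStep?: string;
}

function runSaga(steps: SagaStep[]): SagaResult {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
    } catch {
      // Roll back the steps that succeeded, most recent first.
      for (const done of completed.reverse()) {
        done.compensate();
      }
      return { ok: false, failedStep: step.name };
    }
  }
  return { ok: true };
}

// Record what happens so the order of actions and compensations is visible.
const sagaEvents: string[] = [];

const outcome = runSaga([
  {
    name: 'reserve-inventory',
    action: () => { sagaEvents.push('reserve'); },
    compensate: () => { sagaEvents.push('release'); },
  },
  {
    name: 'charge-payment',
    action: () => { sagaEvents.push('charge'); },
    compensate: () => { sagaEvents.push('refund'); },
  },
  {
    name: 'create-shipment',
    action: () => { throw new Error('carrier unavailable'); },
    compensate: () => { sagaEvents.push('cancel-shipment'); },
  },
]);
// sagaEvents is ['reserve', 'charge', 'refund', 'release'];
// outcome is { ok: false, failedStep: 'create-shipment' }.
```

Compensations run in reverse order, and the failed step's own compensation is skipped because its action never completed.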
Testing Your Workflows
Testing long‑running workflows doesn’t mean waiting hours. The Durable Execution SDK includes a testing library that runs your functions locally in milliseconds:
    import { LocalDurableTestRunner } from '@aws/durable-execution-sdk-js-testing';

    const runner = new LocalDurableTestRunner({
      handlerFunction: handler,
    });

    // Run the workflow locally instead of deploying it.
    const execution = await runner.run();

    // Assumes createOrder is stubbed to return an order with id '123'.
    expect(execution.getStatus()).toBe('SUCCEEDED');
    expect(execution.getResult()).toEqual({ orderId: '123', status: 'completed' });
The test runner simulates checkpoints, skips time‑based waits, and lets you inspect every operation. You can test callbacks, retries, and failures without deploying to AWS.
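One common way for a test harness to skip time-based waits is a virtual clock that tests advance by hand. The sketch below shows that general technique only; how `LocalDurableTestRunner` handles waits internally is not described in this article, and `VirtualClock` and `makeVirtualClock` are hypothetical names.

```typescript
// Generic fake-clock sketch; NOT the test runner's actual implementation.
interface VirtualClock {
  now(): number;
  schedule(delayMs: number, callback: () => void): void;
  advance(ms: number): void;
}

interface PendingTimer {
  dueAt: number;
  callback: () => void;
}

function makeVirtualClock(): VirtualClock {
  let current = 0;
  let pending: PendingTimer[] = [];
  return {
    now(): number {
      return current;
    },
    // Register a callback to fire once virtual time reaches now + delayMs.
    schedule(delayMs: number, callback: () => void): void {
      pending.push({ dueAt: current + delayMs, callback });
    },
    // Move virtual time forward and fire every timer that became due, oldest first.
    advance(ms: number): void {
      current += ms;
      const due = pending
        .filter((t) => t.dueAt <= current)
        .sort((a, b) => a.dueAt - b.dueAt);
      pending = pending.filter((t) => t.dueAt > current);
      for (const timer of due) {
        timer.callback();
      }
    },
  };
}

const HOUR_MS = 60 * 60 * 1000;
const clock = makeVirtualClock();
const sent: string[] = [];

// "Send a reminder after 24 hours", expressed against the virtual clock.
clock.schedule(24 * HOUR_MS, () => { sent.push('reminder'); });

clock.advance(1 * HOUR_MS);           // one virtual hour: nothing fires yet
const sentAfterOneHour = sent.length;
clock.advance(23 * HOUR_MS);          // reach the 24-hour mark without real waiting
```

Because time only moves when the test calls `advance`, a 24-hour wait takes no wall-clock time in a test run.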
Learn more: Testing Durable Functions
Deploying with AWS SAM
Deploy durable functions using AWS SAM with a few key configurations:
    Resources:
      OrderProcessorFunction:
        Type: AWS::Serverless::Function
        Properties:
          CodeUri: src/order-processor
          Handler: index.handler
          Runtime: nodejs22.x
          DurableConfig:
            ExecutionTimeout: 900
            RetentionPeriodInDays: 7
        Metadata:
          BuildMethod: esbuild
          BuildProperties:
            EntryPoints:
              - index.ts
The DurableConfig property enables durable execution and sets the workflow timeout. SAM automatically handles IAM permissions for checkpointing and state management.
Learn more: Deploying Durable Functions with SAM
When to Use Durable Functions
- Your workflow spans multiple steps with waits or callbacks.
- You need automatic retries with exponential backoff.
- You want to coordinate multiple async operations.
- Your process requires human approval or external events.
- You need to handle long‑running tasks without managing state.
- You prefer writing workflows as code rather than configuration.
Getting Started
- Install the SDK – npm install @aws/durable-execution-sdk-js
- Write your function – Wrap your handler with withDurableExecution().
- Use durable operations – context.step(), context.wait(), context.waitForCallback().
- Test locally – Use LocalDurableTestRunner for fast iteration.
- Deploy with SAM – Add DurableConfig to your template.
- Monitor execution – Use Amazon CloudWatch and AWS X-Ray for observability.
Learn More
- Understanding the Replay Model – Deep dive into how durable functions work under the hood.
- Testing Durable Functions – Comprehensive guide to testing.
- Deploying Durable Functions with SAM – Template configuration, permissions, and deployment best practices.