Redis vs DynamoDB vs DAX: I Benchmarked AWS Caching Performance (The Results Were Unexpected)
Source: Dev.to
Benchmarking In‑Memory Caching for DynamoDB Reads
In many backend systems, user data is fetched on almost every request. A common assumption is that adding an in‑memory cache will improve read performance for any system.
To validate this assumption, I benchmarked three approaches for accessing user data from DynamoDB inside a serverless architecture:
| Approach | Description |
|---|---|
| Baseline | AWS Lambda + DynamoDB (no cache) |
| Cache‑aside | Lambda → Redis → DynamoDB |
| DAX | Lambda → DynamoDB Accelerator (DAX) → DynamoDB |
The goal was to see whether a CTO of a startup should invest in caching services. I expected the cached approaches to outperform the baseline easily, but even at 200 requests / second (12 000 requests / minute) that assumption didn’t hold.
Experimental Setup
The architectures for all three tests are kept as identical as possible, using the cheapest available options in the eu‑central‑1 AWS region.
1. Baseline (DDB + Lambda)
| Component | Configuration |
|---|---|
| Lambda | Python 3.12, 256 MiB memory, 10 s timeout |
| DynamoDB | Pay‑per‑request billing mode |
| Access pattern | Direct reads from DynamoDB (no cache) |
| Latency | Highest (no cache) |
2. DAX (DynamoDB Accelerator)
| Component | Configuration |
|---|---|
| Lambda | Python 3.12, 256 MiB memory, 10 s timeout |
| DAX cluster | dax.t3.small (single node, replication factor = 1) deployed in VPC isolated subnets |
| Cache | Managed by DAX automatically (default item TTL ≈ 5 min, query cache TTL ≈ 5 min) |
| Access pattern | Lambda → DAX → DynamoDB |
| Client | amazondax Python client (installed via pip) |
3. Redis (AWS ElastiCache)
| Component | Configuration |
|---|---|
| Lambda | Python 3.12, 256 MiB memory, 10 s timeout |
| Redis cluster | cache.t4g.micro (single node, no automatic failover) deployed in VPC isolated subnets |
| Cache TTL | 30 seconds (configurable via REDIS_TTL_SECONDS) |
| Access pattern | (1) Lambda checks Redis first; (2) on a miss, read from DynamoDB and store the result in Redis with a 30 s TTL; (3) on a hit, return the cached data |
| Client | Standard redis-py client with 1 s connection timeout |
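The cache‑aside flow described in the table above can be sketched as follows. To keep the sketch runnable without AWS, a dict‑backed fake cache stands in for Redis and an injected `db_get` callable stands in for the DynamoDB read; with `redis-py` you would swap in `r.get(key)` and `r.setex(key, ttl, value)` instead.

```python
import time

def make_cache_aside_reader(cache_get, cache_set, db_get, ttl_seconds=30):
    """Cache-aside read: check the cache first, fall back to the
    database on a miss, then populate the cache with a TTL."""
    def read(key):
        value = cache_get(key)
        if value is not None:
            return value                    # cache hit
        value = db_get(key)                 # cache miss: go to DynamoDB
        cache_set(key, value, ttl_seconds)  # write-back with TTL
        return value
    return read

# A dict-backed stand-in for Redis so the sketch runs anywhere.
_store = {}

def fake_cache_get(key):
    entry = _store.get(key)
    if entry is None or entry[1] < time.monotonic():
        return None                         # missing or expired
    return entry[0]

def fake_cache_set(key, value, ttl):
    _store[key] = (value, time.monotonic() + ttl)
```

Note that every request pays the cache round trip, hit or miss; this is the extra network hop that shows up in the steady‑state numbers below.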
Test Methodology
- Load levels – 50 reads / s (3 000 reads / min) and 200 reads / s (12 000 reads / min).
- Payload – Same size for all runs, using a hot/mixed key distribution.
- Metric – 95th‑percentile latency (p95).
- Tool – Open‑source load‑testing library k6 (JavaScript scripts).
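k6 reports p95 directly; as a quick reminder of what the metric means, here is a minimal sketch using the nearest‑rank method (an illustration only; k6 applies its own interpolation):

```python
import math

def p95(samples_ms):
    """95th-percentile latency: the smallest sample that is greater
    than or equal to 95% of all samples (nearest-rank method)."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]
```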
All three approaches fetched the same DynamoDB item, e.g.:

```json
{
  "pk": "ITEM#123",
  "sk": "META",
  "itemId": "123",
  "title": "Example Item Title",
  "body": "This is the body content of the item",
  "updatedAt": 1736467200,
  "etag": "a7f8d9e1c2b3a4f5e6d7c8b9a0f1e2d3c4b5a6f7e8d9c0b1a2f3e4d5c6b7a8f9"
}
```
Results – 50 RPS (Establishing the Baseline)
| Access Pattern | p95 Latency (ms) | Avg Latency (ms) | Dropped Iterations | Notes |
|---|---|---|---|---|
| Lambda + DynamoDB (Baseline) | ~63 | ~48 | 0 | Fast, stable, no bottlenecks |
| Redis (warm‑up run) | ~68 | ~66 | 22 | Cache misses + write‑back cost |
| Redis (steady state) | ~63 | ~48 | 0 | Matches baseline, no latency win |
| DAX (single small node) | ~1040 | ~957 | 19 | Cache saturation, unusable |
Interpretation
- Baseline – At 50 RPS the Lambda + DynamoDB combo delivered a p95 latency of ~63 ms with zero drops. DynamoDB on‑demand was not under pressure.
- Redis – The first (warm‑up) run suffered cache misses, inflating latency. Once the cache warmed, latency matched the baseline but did not improve it.
- DAX – The undersized `dax.t3.small` node became CPU‑bound, causing request queuing and p95 latencies above 1 s. This demonstrates that a mis‑sized cache can degrade performance.
Conclusion (50 RPS)
The baseline performed excellently and required no additional caching. Adding Redis reduced DynamoDB load but did not yield a measurable latency benefit. DAX, when undersized, was detrimental.
Results – 200 RPS (The Expected Crossover That Didn’t Happen)
| Access Pattern | p95 Latency (ms) | Avg Latency (ms) | Dropped Iterations | Notes |
|---|---|---|---|---|
| Lambda + DynamoDB (Baseline) | ~63 | ~48 | 13 | Stable, scales linearly |
| Redis (warm‑up run) | ~64 | ~52 | 40 | Cache population under load |
| Redis (steady state) | ~70 | ~58 | 79 | Slightly worse than baseline |
| DAX (single small node) | ~1050 | ~968 | 5 399 | Cluster saturation |
Interpretation
- Baseline – Even at 200 RPS the Lambda + DynamoDB setup maintained ~63 ms p95 latency, confirming its scalability.
- Redis – Again showed two phases. The warm‑up run incurred miss latency; the steady‑state run was marginally slower than the baseline, likely due to added network hops and occasional cache misses under load.
- DAX – The single small node was completely saturated, resulting in massive latency and thousands of dropped iterations.
Conclusion (200 RPS)
The baseline remains the best‑performing, simplest solution. Redis does not provide a latency advantage at these loads, and an undersized DAX cluster harms performance dramatically.
Overall Takeaways
- Baseline Lambda + DynamoDB is often sufficient – On‑demand billing and automatic scaling keep latency low even at 200 RPS.
- Cache‑aside Redis can reduce DynamoDB read pressure but adds network hops; without a very high read‑to‑write ratio or stricter latency SLAs, it may not improve response times.
- DAX is not a plug‑and‑play speed boost – Proper capacity planning is essential; an undersized node can become a bottleneck.
- Caching makes sense when:
  - The read‑to‑write ratio is extremely high (e.g., > 100:1).
  - Latency requirements are sub‑10 ms and the baseline cannot meet them.
  - The workload exhibits strong temporal locality that justifies the extra operational complexity.
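A back‑of‑envelope model makes the trade‑off concrete. In cache‑aside, every request pays the cache hop, and misses additionally pay the database read; the expected per‑request latency is therefore `cache + (1 - hit_ratio) * db`. The numbers below are illustrative assumptions, not measurements from this benchmark:

```python
def expected_latency_ms(hit_ratio, cache_ms, db_ms):
    """Expected per-request latency for cache-aside reads: every
    request pays the cache hop; misses additionally pay the DB read."""
    return cache_ms + (1.0 - hit_ratio) * db_ms
```

With a ~2 ms in‑VPC cache hop, a ~10 ms DynamoDB read, and a 90% hit ratio, the expected read cost is 2 + 0.1 × 10 = 3 ms. But when the end‑to‑end path is dominated by Lambda and network overhead (tens of milliseconds here), that saving disappears into the noise at p95, which is exactly what the measurements show.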
For most startups operating at modest traffic levels (≤ 200 RPS), the simplest architecture—Lambda directly querying DynamoDB—delivers the best cost‑performance balance. Adding a cache should be driven by concrete latency or cost‑reduction goals, not by the assumption that “caching always helps.”
Cache Warm‑up and Redis Stabilization
Once the cache was warm, Redis stabilized but did not outperform the baseline. In steady state:
- p95 latency increased slightly to ~70 ms.
- The number of dropped iterations was higher than with DynamoDB alone.
DAX @ 200 RPS: Saturation Under Load
At 200 RPS the DAX configuration began to collapse:
- Effective throughput dropped well below the target rate.
- p95 latency exceeded one second.
- Thousands of iterations were dropped.
This behavior confirms that DAX is highly sensitive to sizing: an undersized instance does not merely fail to provide a benefit, it actively degrades performance.
Conclusion: 200 RPS
Even at 200 RPS, the dominant cost in this system was not database access but network and managed‑service overhead. Adding a cache did not remove that cost; it added to it, both in latency (an extra network hop on every request) and in money (you still pay for the on‑demand or serverless Redis instance, depending on your choice).
What These Results Actually Prove
- DynamoDB on‑demand scales extremely well for simple reads.
- Redis reduces pressure on the database, not latency.
- Cache warm‑up matters.
- A misconfigured DAX is worse than no cache.
- Latency optimization and scaling optimization are different problems.
Conclusion & Lessons Learned
The results from both the 50 RPS and 200 RPS benchmarks lead to a clear—and somewhat counter‑intuitive—conclusion: for this workload, Lambda backed by DynamoDB on‑demand was already fast enough that adding a cache did not improve user‑visible latency.
- At both load levels, DynamoDB was not the bottleneck.
- End‑to‑end latency was dominated by network distance and managed‑service overhead, not database access time.
- Introducing Redis added an extra network hop and client‑side overhead without removing the dominant cost in the request path.
Redis still served a purpose, but not the one initially expected. It reduced pressure on DynamoDB and flattened backend load, which can be valuable for cost control and future scaling. What it did not do—at these traffic levels—was make requests faster.
DAX Takeaways
- When undersized, DAX simply doesn’t help; it saturates quickly, causing increased latency and dropped requests.
- It requires careful capacity planning and a solid understanding of the workload.
Final Thought
The biggest lesson from this experiment is that measurement before optimization is essential. Even though caching can reduce backend load, it doesn’t automatically lower end‑to‑end latency.
If the read path is already cheap, caching benefits may be invisible. However, when you have a time‑costly operation whose results can be cached, caching is likely worth testing.
In short: Don’t cache because it feels right—cache because the data proves you need it.