Design High-Performing And Elastic Compute Solutions
Source: Dev.to
⚡ Domain 3: Design High‑Performing Architectures
📘 Task Statement 3.2
Designing High‑Performing and Elastic Compute Solutions is about choosing compute that:
- Performs well
- Scales automatically
- Avoids bottlenecks by decoupling components
Approach
- Pick the compute runtime first – EC2 vs containers vs serverless.
- Pick the scaling model – Auto Scaling vs event‑based scaling.
- Tune performance – instance family / memory / concurrency / batching.
Knowledge
1️⃣ AWS Compute Services With Appropriate Use Cases
| Service | Best‑Fit Scenarios |
|---|---|
| Amazon EC2 | • Full control over OS/runtime • Legacy apps, custom networking/agents, special drivers • Predictable long‑running services |
| AWS Lambda | • Event‑driven tasks (APIs, SQS processing, EventBridge, file processing) • Spiky or unpredictable traffic • Minimal operations |
| Amazon ECS / Amazon EKS (containers) | • Microservices and long‑running container workloads • Standardized packaging & predictable scaling • When Lambda constraints don’t fit (timeouts, runtime, dependencies) |
| AWS Fargate (serverless containers) | • Containers without managing EC2 instances • Common “high‑performing + elastic” answer for containerized apps |
| AWS Batch | • Batch jobs, large‑scale job queues, compute‑intensive processing • Automatically provisions compute (often EC2 / Spot) to run jobs |
| Amazon EMR | • Big‑data processing frameworks (Spark, Hadoop) • Distributed ETL / analytics workloads |

Mnemonic: Spark/Hadoop → EMR; "run 10,000 batch jobs" → AWS Batch.
2️⃣ Distributed Computing Concepts Supported By Global Infrastructure & Edge Services
- Multi‑AZ architectures reduce impact of an AZ failure and enable scale‑out.
- Edge services reduce latency for end users.
Common Edge & Global Services
- CloudFront – caching and edge delivery.
- Global Accelerator – static anycast IP addresses that route TCP/UDP traffic over the AWS global network.
3️⃣ Queuing & Messaging Concepts (Pub/Sub)
| Service | Role |
|---|---|
| SQS | Decouple producer / consumer; scale workers on queue depth. |
| SNS | Pub/Sub fan‑out. |
| EventBridge | Event routing and filtering (schedules, SaaS events, cross‑service integration). |
4️⃣ Scalability Capabilities
- EC2 Auto Scaling – scales EC2 instances in an Auto Scaling Group (ASG).
- AWS Auto Scaling / Application Auto Scaling – scales resources beyond EC2 (ECS services, DynamoDB throughput, Aurora replicas, etc.).
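Target tracking is the most common scaling policy: you set a target value for a metric and Auto Scaling adjusts capacity to keep the metric near it. A simplified model of the proportional calculation (the actual service also applies cooldowns and instance warm-up, which this sketch ignores):

```python
import math

def desired_capacity(current_capacity: int, metric_value: float,
                     target_value: float) -> int:
    """Simplified target-tracking math: scale capacity in proportion
    to how far the observed metric is from its target."""
    if current_capacity == 0:
        return 0
    return max(1, math.ceil(current_capacity * metric_value / target_value))

# 4 instances at 80% average CPU with a 50% target -> scale out to 7
print(desired_capacity(4, 80.0, 50.0))  # 7
# Same fleet at 20% average CPU -> scale in to 2
print(desired_capacity(4, 20.0, 50.0))  # 2
```

The `max(1, …)` floor stands in for an ASG minimum size; in practice you configure min/max bounds on the group itself.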
5️⃣ Serverless Technologies & Patterns
| Service | Scaling Mechanism |
|---|---|
| Lambda | Scales automatically with request concurrency (one execution environment per concurrent request); well suited to bursty, event‑driven load. |
| Fargate | Scales containers without managing servers; scaling driven by ECS/EKS configuration. |
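Lambda's concurrency needs follow directly from Little's law: steady-state concurrency is arrival rate times average execution duration. A quick back-of-the-envelope helper:

```python
def lambda_concurrency(requests_per_second: float,
                       avg_duration_seconds: float) -> float:
    """Estimate steady-state Lambda concurrency (Little's law):
    concurrency = arrival rate x average execution duration."""
    return requests_per_second * avg_duration_seconds

# 100 req/s at 0.5 s average duration needs ~50 concurrent executions
print(lambda_concurrency(100, 0.5))  # 50.0
```

Compare the result against your account's concurrency quota and any reserved concurrency on the function to anticipate throttling.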
6️⃣ Orchestration of Containers
Amazon ECS (AWS‑native) concepts:
- Cluster → Service → Tasks
- Task definition is the blueprint.
Amazon EKS (Kubernetes) concepts:
- Cluster → Deployments → Pods
If Kubernetes is not required, ECS is usually simpler.
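To make "task definition is the blueprint" concrete, here is an illustrative Fargate task definition fragment. The field names are real ECS task definition keys; the family name, image URI, and port are placeholder values:

```json
{
  "family": "web-api",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:latest",
      "portMappings": [{ "containerPort": 8080 }],
      "essential": true
    }
  ]
}
```

A service then runs N copies of this task behind a load balancer; scaling changes N, not the blueprint.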
Skills
A️⃣ Decouple Workloads So Components Can Scale Independently
Decoupling Patterns
- SQS between web tier and workers – buffers spikes, retries, DLQ.
- SNS fan‑out to multiple consumers.
- EventBridge for event‑driven integration.
- Step Functions for orchestration when coordination is needed.
Frontend scales with traffic; workers scale with queue depth.
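The pattern above can be sketched locally. This is not boto3/SQS code — it uses Python's in-process `queue.Queue` as a stand-in for SQS to show the shape of the decoupling: the producer returns immediately, the worker drains the backlog at its own pace.

```python
import queue
import threading

# Stand-in for SQS: the web tier only enqueues, so it stays fast during
# spikes; workers drain the backlog independently.
jobs: "queue.Queue" = queue.Queue()
results = []

def web_tier(order_id: str) -> None:
    jobs.put(order_id)          # producer: hand off and return immediately

def worker() -> None:
    while True:
        order_id = jobs.get()
        if order_id is None:    # shutdown sentinel
            break
        results.append(f"processed {order_id}")

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    web_tier(f"order-{i}")
jobs.put(None)
t.join()
print(results)  # ['processed order-0', 'processed order-1', 'processed order-2']
```

With real SQS you would also get retries via visibility timeout and a dead-letter queue for poison messages, which an in-memory queue cannot give you.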
B️⃣ Identify Metrics & Conditions to Perform Scaling Actions
Common Scaling Signals
- CPU / memory (EC2/ECS).
- Request count / target response time (ALB target metrics).
- Queue depth / age of oldest message (SQS‑based worker scaling).
- Lambda concurrency / duration / throttles.
- Custom CloudWatch metrics (business‑driven scaling, e.g., “jobs waiting”).
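For SQS-driven workers, the documented approach is to scale on backlog per worker: the acceptable backlog is your latency target divided by per-message processing time, and the fleet is sized so the actual backlog per worker stays at or below it. A minimal sketch of that arithmetic:

```python
import math

def workers_needed(queue_depth: int, avg_process_seconds: float,
                   target_latency_seconds: float) -> int:
    """Backlog-per-worker scaling: each worker can absorb
    (target latency / per-message time) queued messages and still
    meet the latency target; size the fleet accordingly."""
    acceptable_backlog_per_worker = target_latency_seconds / avg_process_seconds
    return math.ceil(queue_depth / acceptable_backlog_per_worker)

# 1,000 queued messages, 0.1 s each, 10 s latency target -> 10 workers
print(workers_needed(1000, 0.1, 10.0))  # 10
```

In production you would publish `queue_depth / running_workers` as a custom CloudWatch metric and attach a target-tracking policy to it.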
C️⃣ Select Appropriate Compute Options & Features to Meet Business Requirements
EC2 Instance Types
| Family | Typical Use |
|---|---|
| t, m | General purpose |
| c | Compute‑heavy |
| r, x | Memory‑heavy |
| i | Storage / IOPS‑heavy |
| p, g | GPU / ML / graphics |
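As a study aid, the table collapses to a small lookup. The family letters are real EC2 prefixes; the mapping itself is a mnemonic, not an AWS API:

```python
# Workload profile -> EC2 instance family prefixes (mirrors the table above).
FAMILY_BY_PROFILE = {
    "general": ("t", "m"),
    "compute": ("c",),
    "memory": ("r", "x"),
    "storage": ("i",),
    "gpu": ("p", "g"),
}

def pick_family(profile: str) -> tuple:
    """Return the candidate instance-family prefixes for a workload profile."""
    return FAMILY_BY_PROFILE[profile]

print(pick_family("memory"))  # ('r', 'x')
```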
Other EC2 Options
- Spot Instances – fault‑tolerant workloads (batch, stateless, flexible).
- Graviton (Arm) – price/performance advantage.
D️⃣ Select the Appropriate Resource Type & Size to Meet Business Requirements
Lambda Memory
- Memory also determines CPU allocation (CPU scales proportionally with configured memory).
- If execution is too slow, increasing memory often shortens duration because it adds CPU as well.
- Watch for throttles and concurrency limits.
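Because Lambda bills in GB-seconds, more memory is not automatically more expensive: if doubling memory roughly halves a CPU-bound duration, compute cost stays about the same while latency drops. A sketch of that trade-off (the per-GB-second price below is the published x86 rate at the time of writing; treat it as an assumption):

```python
def lambda_cost_usd(memory_mb: int, duration_ms: int, invocations: int,
                    price_per_gb_second: float = 0.0000166667) -> float:
    """GB-second billing model: memory (GB) x duration (s) x invocations x rate."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    return gb_seconds * price_per_gb_second

# Doubling memory that halves a CPU-bound duration leaves cost unchanged:
cost_small = lambda_cost_usd(512, 1000, 1_000_000)   # 512 MB, 1000 ms
cost_large = lambda_cost_usd(1024, 500, 1_000_000)   # 1024 MB, 500 ms
print(round(cost_small, 2), round(cost_large, 2))    # 8.33 8.33
```

This is why "increase memory" is a common answer for slow Lambda functions: you often buy latency for free.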
Container Memory
- Right‑size task/pod CPU and memory.
- Scale task count rather than over‑sizing a single task (where possible).
Cheat Sheet
| Requirement | Compute |
|---|---|
| Event‑driven, spiky traffic, minimal ops | Lambda |
| Run containers without managing servers | ECS on Fargate |
| Must use Kubernetes | EKS |
| Need OS control / legacy app | EC2 (+ Auto Scaling) |
| Run many batch jobs / job queue | AWS Batch |
| Spark/Hadoop big‑data processing | EMR |
| Scale workers based on backlog | SQS + autoscaled consumers |
| Need global performance improvement | CloudFront / Global Accelerator |
Recap Checklist ✅
- Workload is decoupled so components can scale independently (queues/events).
- Compute choice matches runtime needs (EC2 vs containers vs serverless).
- Scaling strategy is explicit (Auto Scaling, queue‑based, event‑based).
- Scaling metrics are chosen appropriately (CPU, requests, queue depth, concurrency).
- EC2 instances are selected by workload profile (compute/memory/storage/GPU).
- Lambda/container resources are right‑sized (memory, CPU, task count).
AWS Whitepapers and Official Documentation
These are the primary AWS documents behind Task Statement 3.2.
You do not need to memorize them; use them to understand how to design high‑performing and elastic compute solutions.
Key Areas
- Compute services
- Scaling
- Decoupling and messaging
- Edge and global performance
🚀