Cloud Architecture Patterns
Essential patterns for designing scalable, resilient, and cost-effective cloud-native applications.
TL;DR
Cloud architecture patterns solve recurring challenges in distributed, cloud-native systems: scalability, resilience, data management, and messaging. These patterns leverage cloud capabilities (elasticity, managed services, global infrastructure) while addressing cloud-specific challenges (network latency, eventual consistency, cost optimization).
Key Takeaways
- Design for failure: Assume everything fails; design to recover automatically
- Prefer managed services: Reduce operational burden, leverage provider expertise
- Embrace elasticity: Scale out rather than up; pay for what you use
- Decouple components: Use async messaging to reduce dependencies
- Optimize for cost: Cloud flexibility enables—and requires—cost awareness
Why This Matters
Cloud computing changes how we architect systems. Seemingly unlimited resources, pay-per-use pricing, and managed services enable patterns that are impractical on traditional infrastructure. But the cloud also introduces new challenges: network partitions, eventual consistency, and the complexity of distributed systems. Understanding cloud patterns helps you capture the benefits while avoiding the pitfalls.
Cloud-Native
Cloud-native doesn't mean "runs in the cloud." It means designed to exploit cloud characteristics: elasticity, automation, managed services, and geographic distribution.
Pattern Categories
CLOUD ARCHITECTURE PATTERNS
├── COMPUTE PATTERNS
│ ├── Serverless
│ ├── Containers
│ └── Auto-scaling
│
├── DATA PATTERNS
│ ├── Event Sourcing
│ ├── CQRS
│ └── Polyglot Persistence
│
├── MESSAGING PATTERNS
│ ├── Queue-Based Load Leveling
│ ├── Publisher-Subscriber
│ └── Event-Driven
│
├── RESILIENCE PATTERNS
│ ├── Retry
│ ├── Circuit Breaker
│ └── Bulkhead
│
└── DEPLOYMENT PATTERNS
  ├── Blue-Green
  ├── Canary
  └── Feature Flags
Compute Patterns
What It Is
Serverless: execute code without managing servers. The cloud provider handles infrastructure, scaling, and availability.
When to Use
| Good Fit | Poor Fit |
|---|---|
| Event-driven workloads | Long-running processes |
| Variable/unpredictable traffic | Consistent high throughput |
| Rapid prototyping | Complex stateful workflows |
| Scheduled tasks | Low-latency requirements (cold start) |
Architecture Pattern
SERVERLESS EVENT-DRIVEN
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│ API GW  │───▶│ Lambda  │───▶│ DynamoDB│    │   S3    │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
                    │                             │
                    └─────────────────────────────┘
                          Trigger on upload
EVENT SOURCES
├── HTTP (API Gateway)
├── Queue (SQS)
├── Stream (Kinesis, DynamoDB Streams)
├── Schedule (EventBridge, formerly CloudWatch Events)
├── Storage (S3 events)
└── Database (change streams)
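Each source delivers a differently shaped event, so a function wired to several of them has to tell them apart. The sketch below shows one way to do that, assuming the usual AWS payload fields (`Records[].eventSource`, `source: 'aws.events'`, `httpMethod`, `requestContext.http`). `detectEventSource` and its return labels are hypothetical names for illustration, not part of any SDK.

```javascript
// Hypothetical helper: classify a Lambda event by the shape of its payload.
function detectEventSource(event) {
  const first = event?.Records?.[0];
  if (first) {
    // Record-based sources tag every record with an eventSource string.
    switch (first.eventSource) {
      case 'aws:sqs':
        return 'queue';
      case 'aws:s3':
        return 'storage';
      case 'aws:kinesis':
      case 'aws:dynamodb':
        return 'stream';
      default:
        return 'unknown';
    }
  }
  // Scheduled invocations arrive as an EventBridge event.
  if (event?.source === 'aws.events') return 'schedule';
  // API Gateway: REST proxy events carry httpMethod, HTTP API v2 carries requestContext.http.
  if (event?.httpMethod || event?.requestContext?.http) return 'http';
  return 'unknown';
}

console.log(detectEventSource({ Records: [{ eventSource: 'aws:s3' }] })); // storage
```

A handler can then branch on the returned label, although giving each source its own function is often simpler to operate.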
Cost Considerations
SERVERLESS COST MODEL
CHARGED FOR:
├── Number of invocations
├── Duration (GB-seconds)
└── Memory allocated
NOT CHARGED FOR:
├── Idle time
├── Infrastructure management
└── Scaling infrastructure
OPTIMIZE BY:
├── Right-size memory allocation
├── Minimize cold starts (provisioned concurrency)
├── Optimize code execution time
└── Use efficient runtimes
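The cost model above can be turned into a rough estimate. This is a minimal sketch: the two rates are placeholder inputs rather than real prices, and `estimateMonthlyCost` is a hypothetical helper. Duration and memory combine into one billed quantity, GB-seconds, which is why right-sizing memory and shortening execution time both reduce the bill.

```javascript
// Estimate monthly cost from the two billed dimensions: requests and GB-seconds.
// Rates are inputs, not real prices; look them up for your provider and region.
function estimateMonthlyCost({
  invocations,
  avgDurationMs,
  memoryMb,
  pricePerMillionRequests,
  pricePerGbSecond,
}) {
  // GB-seconds = invocations x duration in seconds x memory in GB
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  const requestCost = (invocations / 1_000_000) * pricePerMillionRequests;
  const computeCost = gbSeconds * pricePerGbSecond;
  return { gbSeconds, requestCost, computeCost, total: requestCost + computeCost };
}

const estimate = estimateMonthlyCost({
  invocations: 2_000_000,
  avgDurationMs: 500,
  memoryMb: 512,
  pricePerMillionRequests: 0.2, // placeholder rate
  pricePerGbSecond: 0.00002, // placeholder rate
});
console.log(estimate.gbSeconds); // 500000
```

Halving either `avgDurationMs` or `memoryMb` halves the compute portion, while the request portion depends only on invocation count.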
Cold Starts
Functions not recently invoked require initialization (cold start), adding latency. For latency-sensitive applications, consider provisioned concurrency or keep-warm strategies.
Data Patterns
What It Is
Polyglot persistence: use different database types for different data needs within the same application, matching each store to the access pattern it serves.
Database Selection Guide
DATA STORE SELECTION
RELATIONAL (PostgreSQL, MySQL)
├── Structured data with relationships
├── ACID transactions required
├── Complex queries with joins
└── Strong consistency needed
DOCUMENT (MongoDB, DynamoDB)
├── Semi-structured data
├── Flexible schema
├── Hierarchical data
└── Scale-out requirements
KEY-VALUE (Redis, DynamoDB)
├── Simple lookup by key
├── Session storage
├── Caching
└── High throughput, low latency
GRAPH (Neo4j, Neptune)
├── Highly connected data
├── Relationship traversal
├── Recommendations, fraud detection
└── Social networks
TIME-SERIES (InfluxDB, Timescale)
├── Timestamped data
├── Metrics and monitoring
├── IoT sensor data
└── Time-based aggregations
SEARCH (Elasticsearch, OpenSearch)
├── Full-text search
├── Log analytics
├── Faceted search
└── Real-time indexing
Example Architecture
POLYGLOT PERSISTENCE EXAMPLE (E-commerce)
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
└─────────────────────────────────────────────────────────────┘
     │                │                │                │
     ▼                ▼                ▼                ▼
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│PostgreSQL│     │ MongoDB  │     │  Redis   │     │ Elastic- │
│          │     │          │     │          │     │ search   │
│ Orders   │     │ Product  │     │ Sessions │     │ Search   │
│ Users    │     │ Catalog  │     │ Cart     │     │ Index    │
│ Payments │     │          │     │ Cache    │     │          │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
Each store optimized for its access patterns
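To make the split concrete, the sketch below models three of the stores with in-memory `Map` objects. It is an illustration only: the store names, `addToCart`, `checkout`, and the product fields are hypothetical, and a real system would replace each `Map` with a client for the corresponding database.

```javascript
// In-memory stand-ins; in a real system each would be a client for the named store.
const stores = {
  relational: new Map(), // orders: durable, transactional (PostgreSQL in the diagram)
  document: new Map(), // product catalog: flexible schema (MongoDB)
  keyValue: new Map(), // carts and sessions: fast, disposable (Redis)
};

function addToCart(sessionId, productId) {
  if (!stores.document.has(productId)) throw new Error(`unknown product ${productId}`);
  const cart = stores.keyValue.get(sessionId) ?? [];
  stores.keyValue.set(sessionId, [...cart, productId]);
}

function checkout(sessionId, orderId) {
  const items = stores.keyValue.get(sessionId) ?? [];
  if (items.length === 0) throw new Error('cart is empty');
  // Prices come from the catalog; the order is the durable record.
  const totalCents = items.reduce((sum, id) => sum + stores.document.get(id).priceCents, 0);
  const order = { orderId, items, totalCents };
  stores.relational.set(orderId, order);
  stores.keyValue.delete(sessionId); // the cart is no longer needed
  return order;
}

stores.document.set('sku-1', { name: 'Mug', priceCents: 1200 });
addToCart('session-1', 'sku-1');
addToCart('session-1', 'sku-1');
console.log(checkout('session-1', 'order-1').totalCents); // 2400
```

The trade-off is that a single operation such as checkout now spans several stores, so consistency between them has to be handled by the application.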
Messaging Patterns
What It Is
Queue-based load leveling: place a queue between producer and consumer so that bursts of load are buffered instead of overwhelming the consumer.
Pattern Structure
WITHOUT QUEUE (Direct Coupling)
Producer ──────────────────▶ Consumer
High load = overwhelmed
WITH QUEUE (Load Leveling)
Producer ──▶ [ Queue ] ──▶ Consumer
Buffer absorbs spikes
Consumer processes at own pace
BEHAVIOR DURING SPIKE
[Chart: incoming load vs. time (top) and queue depth vs. time (bottom)]
(Queue absorbs, drains gradually)
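The spike behavior can be reproduced with a small simulation. This is a sketch under simple assumptions: time advances in discrete ticks, the consumer processes a fixed number of messages per tick, and `simulateQueueDepth` is a hypothetical helper.

```javascript
// Simulate queue depth per tick: arrivals are bursty, the consumer drains at a fixed rate.
function simulateQueueDepth(arrivalsPerTick, consumerRate) {
  let depth = 0;
  return arrivalsPerTick.map((arrivals) => {
    // Depth grows by arrivals, shrinks by what the consumer handled, and never goes below 0.
    depth = Math.max(0, depth + arrivals - consumerRate);
    return depth;
  });
}

const depths = simulateQueueDepth([2, 10, 10, 2, 0, 0, 0], 4);
console.log(depths); // [0, 6, 12, 10, 6, 2, 0]
```

The consumer never sees more than 4 messages per tick even though arrivals peak at 10; the cost is added latency while the backlog drains.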
Implementation
AWS SQS EXAMPLE
Producer (Lambda/EC2):
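const { SQSClient, SendMessageCommand } = require('@aws-sdk/client-sqs');
const sqs = new SQSClient({});
// Call from an async function; send() returns a promise that must be awaited.
await sqs.send(new SendMessageCommand({
  QueueUrl: 'https://sqs.../my-queue',
  MessageBody: JSON.stringify(event),
  MessageGroupId: 'orders' // required for FIFO queues
}));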
Consumer (Lambda trigger or polling):
exports.handler = async (event) => {
for (const record of event.Records) {
const message = JSON.parse(record.body);
await processMessage(message);
}
};
CONFIGURATION
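├── Visibility timeout: How long a received message stays hidden from other consumers; if it is not deleted in that time, it is delivered again
├── Message retention: How long unprocessed messages are kept
├── Dead letter queue: Where messages go after repeated failed processing attempts
└── Batch size: Messages processed per invocation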
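The first three settings map to SQS queue attributes. The sketch below builds the attribute map accepted by the SQS `CreateQueue` and `SetQueueAttributes` operations; `buildQueueAttributes` is a hypothetical helper and the ARN is a placeholder. Batch size is configured on the Lambda event source mapping rather than on the queue, so it is not included here.

```javascript
// Build the Attributes map for an SQS queue. SQS expects every value as a string.
function buildQueueAttributes({ visibilityTimeoutSec, retentionDays, dlqArn, maxReceiveCount }) {
  return {
    // Seconds a received message stays hidden before it can be delivered again.
    VisibilityTimeout: String(visibilityTimeoutSec),
    // Retention is expressed in seconds.
    MessageRetentionPeriod: String(retentionDays * 24 * 60 * 60),
    // After maxReceiveCount failed receives, the message moves to the dead letter queue.
    RedrivePolicy: JSON.stringify({ deadLetterTargetArn: dlqArn, maxReceiveCount }),
  };
}

const attributes = buildQueueAttributes({
  visibilityTimeoutSec: 60,
  retentionDays: 4,
  dlqArn: 'arn:aws:sqs:us-east-1:123456789012:my-queue-dlq', // placeholder ARN
  maxReceiveCount: 5,
});
console.log(attributes.MessageRetentionPeriod); // 345600
```

A common rule of thumb is to set the visibility timeout comfortably above the consumer's worst-case processing time, so a slow but healthy consumer does not cause duplicate deliveries.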
Resilience Patterns
What It Is
Retry with exponential backoff: when an operation fails, retry it with increasing delays so that a struggling service is not overwhelmed by repeated requests.
Implementation
EXPONENTIAL BACKOFF
Attempt 1: Immediate
Attempt 2: Wait 1 second
Attempt 3: Wait 2 seconds
Attempt 4: Wait 4 seconds
Attempt 5: Wait 8 seconds
→ Give up, return error
WITH JITTER (Recommended)
delay = base * 2^(retry - 1) + random(0, base), where retry = 1 for the first retry
Jitter prevents thundering herd when many clients
retry simultaneously after an outage.
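The schedule and jitter can be combined into a small generic wrapper. This is a minimal sketch, not a library API: `backoffDelay` and `retryWithBackoff` are hypothetical names, the delay before the n-th retry is `base * 2^(n - 1)` plus jitter to match the schedule above, and `sleep` and `random` are injectable only so the behavior can be tested without waiting.

```javascript
// Delay before retry number `retry` (1 = first retry): base * 2^(retry - 1) plus jitter in [0, base).
function backoffDelay(retry, baseMs, random = Math.random) {
  return baseMs * 2 ** (retry - 1) + random() * baseMs;
}

// Run `operation`, retrying on failure up to `maxAttempts` total attempts.
async function retryWithBackoff(operation, { maxAttempts = 5, baseMs = 1000, sleep, random } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      // Out of attempts: give up and surface the last error.
      if (attempt >= maxAttempts) throw error;
      await wait(backoffDelay(attempt, baseMs, random));
    }
  }
}

console.log(backoffDelay(3, 1000, () => 0.5)); // 4500
```

In practice the wrapper should only retry errors known to be transient (timeouts, throttling, 5xx responses); retrying a validation error just repeats the failure more slowly.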
Configuration
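// AWS SDK v3: custom retry configuration (overrides the SDK defaults).
// Assumption: StandardRetryStrategy with this constructor signature is the one
// exported by @aws-sdk/middleware-retry; verify against your SDK version.
const { S3Client } = require('@aws-sdk/client-s3');
const { StandardRetryStrategy } = require('@aws-sdk/middleware-retry');

const client = new S3Client({
  maxAttempts: 3,
  // Up to 3 attempts in total, including the initial call
  retryStrategy: new StandardRetryStrategy(async () => 3, {
    retryDecider: (error) => {
      // Retry on throttling and server-side (transient) errors only
      return Boolean(error.$retryable?.throttling) ||
        error.$fault === 'server';
    },
    delayDecider: (delayBase, attempts) => {
      // Exponential backoff: the delay doubles with each attempt
      return delayBase * Math.pow(2, attempts - 1);
    }
  })
});
Deployment Patterns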
What It Is
Blue-green deployment: maintain two identical production environments. Deploy the new version to the inactive environment, verify it, then switch traffic to it.
Process
BLUE-GREEN DEPLOYMENT
BEFORE DEPLOYMENT
               ┌─────────────────────┐
Traffic ──────▶│ Blue (v1.0)         │ ◀── Active
               └─────────────────────┘
               ┌─────────────────────┐
               │ Green (idle)        │ ◀── Inactive
               └─────────────────────┘
DEPLOY TO GREEN
               ┌─────────────────────┐
Traffic ──────▶│ Blue (v1.0)         │ ◀── Active
               └─────────────────────┘
               ┌─────────────────────┐
   Deploy ────▶│ Green (v1.1)        │ ◀── Deploy here
               └─────────────────────┘
SWITCH TRAFFIC
               ┌─────────────────────┐
               │ Blue (v1.0)         │ ◀── Standby
               └─────────────────────┘
               ┌─────────────────────┐
Traffic ──────▶│ Green (v1.1)        │ ◀── Active
               └─────────────────────┘
ROLLBACK = Switch traffic back to Blue
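The three stages reduce to a small state machine: two environments and one pointer that decides which of them receives traffic. The sketch below models only that logic; `BlueGreenRouter` is a hypothetical class, and in a real setup the pointer would be a load balancer target, DNS record, or similar.

```javascript
// Minimal model of a blue-green switch: two environments, one pointer for live traffic.
class BlueGreenRouter {
  constructor(liveVersion) {
    this.environments = { blue: liveVersion, green: null };
    this.active = 'blue';
  }

  get inactive() {
    return this.active === 'blue' ? 'green' : 'blue';
  }

  // Deploy to the inactive environment; live traffic is untouched.
  deploy(version) {
    this.environments[this.inactive] = version;
  }

  // Flip the pointer so the other environment receives traffic.
  switchTraffic() {
    if (this.environments[this.inactive] === null) {
      throw new Error('nothing deployed to the inactive environment');
    }
    this.active = this.inactive;
  }

  // Rollback is the same flip: the previous version is still running on standby.
  rollback() {
    this.switchTraffic();
  }

  route() {
    return this.environments[this.active];
  }
}

const router = new BlueGreenRouter('v1.0');
router.deploy('v1.1');
console.log(router.route()); // v1.0
router.switchTraffic();
console.log(router.route()); // v1.1
router.rollback();
console.log(router.route()); // v1.0
```

Because rollback is only a pointer flip, it is fast, but it does not undo database changes made by the new version, which is why migrations are listed as a complication of this pattern.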
Pros and Cons
| Pros | Cons |
|---|---|
| Instant rollback | Double infrastructure cost |
| Zero downtime | Database migrations complex |
| Full testing before switch | Session management needed |
| Clean cutover | Configuration must be in sync |
Quick Reference Card
┌─────────────────────────────────────────────────────────────┐
│ CLOUD ARCHITECTURE CHEAT SHEET │
├─────────────────────────────────────────────────────────────┤
│ │
│ CLOUD-NATIVE PRINCIPLES │
│ ───────────────────────────────────────────────────────── │
│ • Design for failure (everything fails eventually) │
│ • Prefer managed services (reduce operational burden) │
│ • Embrace elasticity (scale out, not up) │
│ • Decouple with messaging (async over sync) │
│ • Automate everything (infrastructure as code) │
│ │
│ COMPUTE SELECTION │
│ ───────────────────────────────────────────────────────── │
│ Serverless → Event-driven, variable load, < 15 min │
│ Containers → Full control, long-running, predictable │
│ VMs → Legacy apps, specific OS requirements │
│ │
│ DATA STORE SELECTION │
│ ───────────────────────────────────────────────────────── │
│ Relational → ACID, complex queries, relationships │
│ Document → Flexible schema, hierarchical data │
│ Key-Value → Simple lookup, caching, sessions │
│ Graph → Highly connected data, traversals │
│ │
│ RESILIENCE PATTERNS │
│ ───────────────────────────────────────────────────────── │
│ Retry → Transient failures, exponential backoff │
│ Circuit → Fail fast, prevent cascade │
│ Bulkhead → Isolate failures, limit blast radius │
│ Timeout → Don't wait forever, fail gracefully │
│ │
│ DEPLOYMENT PATTERNS │
│ ───────────────────────────────────────────────────────── │
│ Blue-Green → Instant rollback, double infrastructure │
│ Canary → Gradual rollout, metric-based progression │
│ Feature Flag→ Decouple deploy from release │
│ │
└─────────────────────────────────────────────────────────────┘