Cloud Patterns Cheat Sheet
Quick reference card for essential cloud architecture patterns across compute, data, messaging, and resilience.
Cloud-Native Principles
CLOUD-NATIVE FUNDAMENTALS
─────────────────────────────────────────────
1. Design for failure (everything fails)
2. Prefer managed services (reduce ops)
3. Embrace elasticity (scale out, not up)
4. Decouple with messaging (async over sync)
5. Automate everything (infrastructure as code)
6. Observe everything (metrics, logs, traces)
Compute Pattern Selection
COMPUTE DECISION TREE
─────────────────────────────────────────────
                    Start
                      │
        ┌─────────────┴─────────────┐
        │   Need full OS control?   │
        └─────────────┬─────────────┘
                  Yes/ \No
                    /   \
               ┌───▼───┐ ┌▼──────────────────┐
               │  VMs  │ │ Container or      │
               └───────┘ │ Serverless?       │
                         └─────────┬─────────┘
                              Long/ \Event-driven
                          running     \< 15 min
                               /       \
                          ┌───▼───┐  ┌──▼────────┐
                          │ K8s/  │  │ Serverless│
                          │ ECS   │  │ (Lambda)  │
                          └───────┘  └───────────┘
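The decision tree above can be read as a small function. This is a minimal sketch: `chooseCompute` and its option names are hypothetical, and the 15-minute cutoff simply mirrors the limit shown in the diagram.

```javascript
// Hypothetical helper mirroring the decision tree; names are illustrative.
function chooseCompute({ needsFullOsControl, eventDriven, maxRuntimeMinutes }) {
  // Full OS control is only available on virtual machines.
  if (needsFullOsControl) return 'VMs';
  // Short, event-driven work fits serverless (15 min is the limit in the diagram).
  if (eventDriven && maxRuntimeMinutes < 15) return 'Serverless (Lambda)';
  // Everything else is treated as a long-running workload.
  return 'Containers (K8s/ECS)';
}

console.log(chooseCompute({ needsFullOsControl: false, eventDriven: true, maxRuntimeMinutes: 2 }));
```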
COMPARISON
─────────────────────────────────────────────
              │ Serverless │ Containers │ VMs
──────────────┼────────────┼────────────┼───────────
Startup time  │ Cold: sec  │ Seconds    │ Minutes
Scaling       │ Auto       │ Configured │ Manual
Control       │ Limited    │ Medium     │ Full
Ops burden    │ Minimal    │ Medium     │ High
Cost model    │ Per use    │ Reserved   │ Reserved
Max runtime   │ 15 min     │ Unlimited  │ Unlimited
Data Store Selection
DATABASE SELECTION GUIDE
─────────────────────────────────────────────
RELATIONAL (PostgreSQL, MySQL, Aurora)
├── ACID transactions required
├── Complex queries with JOINs
├── Structured data with relationships
└── Strong consistency needed
DOCUMENT (MongoDB, DynamoDB, CosmosDB)
├── Flexible/evolving schema
├── Hierarchical data
├── Key-value access patterns
└── Horizontal scale required
KEY-VALUE (Redis, ElastiCache)
├── Simple lookup by key
├── Session storage
├── Caching layer
└── Sub-millisecond latency
GRAPH (Neo4j, Neptune)
├── Highly connected data
├── Relationship traversal
├── Recommendations
└── Fraud detection
TIME-SERIES (InfluxDB, Timestream)
├── Timestamped data
├── IoT sensor data
├── Metrics/monitoring
└── Time-based aggregations
SEARCH (Elasticsearch, OpenSearch)
├── Full-text search
├── Log analytics
├── Faceted search
└── Real-time indexing
Messaging Patterns
QUEUE-BASED LOAD LEVELING
─────────────────────────────────────────────
Producer ──▶ [ Queue ] ──▶ Consumer
• Absorbs traffic spikes
• Consumer processes at own pace
• Prevents overwhelming downstream
Config: Visibility timeout, dead-letter queue (DLQ), retention period
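A rough illustration of load leveling, using an in-memory array as a stand-in for a managed queue. `LevelingQueue`, the burst of 10 messages, and the batch size of 4 are assumptions made for the example.

```javascript
// In-memory stand-in for a managed queue; illustrative only.
class LevelingQueue {
  constructor() { this.messages = []; }
  enqueue(msg) { this.messages.push(msg); }
  // The consumer pulls at most `batchSize` messages per tick, at its own pace.
  drain(batchSize) { return this.messages.splice(0, batchSize); }
  get depth() { return this.messages.length; }
}

const queue = new LevelingQueue();
for (let i = 0; i < 10; i++) queue.enqueue({ id: i }); // traffic spike of 10 messages

const processedPerTick = [];
while (queue.depth > 0) {
  processedPerTick.push(queue.drain(4).length); // consumer handles up to 4 per tick
}
console.log(processedPerTick); // [ 4, 4, 2 ]
```

The spike is absorbed by the queue, and the consumer never sees more than its batch size at once.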
PUBLISH-SUBSCRIBE
─────────────────────────────────────────────
Publisher ──▶ [Topic] ──▶ Subscriber A
                      ──▶ Subscriber B
                      ──▶ Subscriber C
• Decouple producers from consumers
• Fan-out to multiple destinations
• Filter by message attributes
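Fan-out with attribute filtering might look like the sketch below. `Topic` is a hypothetical in-process class; in practice a managed service plays this role, and the message shape (`body`, `attributes`) is an assumption.

```javascript
// Minimal in-process topic; illustrative only.
class Topic {
  constructor() { this.subscribers = []; }
  // `filter` is an optional predicate over message attributes.
  subscribe(handler, filter = () => true) {
    this.subscribers.push({ handler, filter });
  }
  // Deliver the message to every subscriber whose filter accepts it.
  publish(message) {
    for (const { handler, filter } of this.subscribers) {
      if (filter(message.attributes)) handler(message);
    }
  }
}

const received = { a: [], b: [] };
const orders = new Topic();
orders.subscribe((m) => received.a.push(m.body)); // Subscriber A gets everything
orders.subscribe((m) => received.b.push(m.body), (attrs) => attrs.type === 'refund'); // B gets refunds only

orders.publish({ body: 'order-1', attributes: { type: 'purchase' } });
orders.publish({ body: 'order-2', attributes: { type: 'refund' } });
console.log(received); // { a: [ 'order-1', 'order-2' ], b: [ 'order-2' ] }
```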
EVENT-DRIVEN ARCHITECTURE
─────────────────────────────────────────────
            ┌─────────────┐
  Service ──│  Event Bus  │──▶ Service B
     A      └──────┬──────┘──▶ Service C
                   │       ──▶ Service D
                   │
• Loose coupling
• Eventually consistent
• Complex to debug
Resilience Patterns
RETRY WITH EXPONENTIAL BACKOFF
─────────────────────────────────────────────
Attempt 1: Immediate
Attempt 2: Wait 1s
Attempt 3: Wait 2s
Attempt 4: Wait 4s
Attempt 5: Wait 8s → Give up if it fails
Formula: delay = base × 2^(n − 1) + jitter   (n = retry number, base = 1s above)
Jitter (a small random offset) prevents thundering herd
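A minimal sketch of the delay calculation, counting retries from 1 so that the delays match the 1s, 2s, 4s, 8s schedule above. `backoffDelayMs` and its options (`baseMs`, `maxJitterMs`, `random`) are hypothetical names chosen for the example.

```javascript
// delay = base × 2^(retry − 1) + jitter, with retry = 1 for the first retry.
// `random` is injectable so the jitter can be made deterministic in tests.
function backoffDelayMs(retry, { baseMs = 1000, maxJitterMs = 250, random = Math.random } = {}) {
  const exponential = baseMs * 2 ** (retry - 1);
  const jitter = random() * maxJitterMs; // uniform jitter in [0, maxJitterMs)
  return exponential + jitter;
}

// With jitter disabled, the schedule matches the attempts listed above.
const schedule = [1, 2, 3, 4].map((r) => backoffDelayMs(r, { maxJitterMs: 0 }));
console.log(schedule); // [ 1000, 2000, 4000, 8000 ]
```

A caller would typically also cap the delay and the number of retries.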
CIRCUIT BREAKER
─────────────────────────────────────────────
                         ┌──────────────┐
                         │              │
                         ▼              │
┌───────┐  failures  ┌───────┐          │
│CLOSED │ ─────────▶ │ OPEN  │          │
└───────┘            └───┬───┘          │
    ▲                    │              │
    │            timeout ▼              │
    │               ┌──────────┐        │
    │    success    │HALF-OPEN │────────┘
    └───────────────└──────────┘ failure
CLOSED: Requests pass through
OPEN: Fail immediately (no backend call)
HALF-OPEN: Test if service recovered
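The three states can be sketched as a small synchronous state machine. `CircuitBreaker`, `failureThreshold`, `resetTimeoutMs`, and the injectable `now` clock are all assumptions for illustration; production libraries usually add async support and rolling failure windows.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 5000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, handy for tests
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
  }

  call(fn) {
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('circuit open'); // fail immediately, no backend call
      }
      this.state = 'HALF-OPEN'; // timeout elapsed: let one trial request through
    }
    try {
      const result = fn();
      this.state = 'CLOSED'; // any success closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial, or too many failures while closed, opens the circuit.
      if (this.state === 'HALF-OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```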
BULKHEAD
─────────────────────────────────────────────
WITHOUT                 WITH
┌──────────────┐        ┌──────┬──────┐
│ Shared Pool  │        │Pool A│Pool B│
│ ████████████ │        │  ██  │  ██  │
└──────────────┘        └──────┴──────┘
One failure             Failures
affects all             isolated
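One way to picture the isolation is a separate slot counter per dependency. The `Bulkhead` class, the pool names, and the limit of 2 are hypothetical; real implementations usually wrap thread pools, connection pools, or semaphores.

```javascript
// Each dependency gets its own fixed-size pool of slots.
class Bulkhead {
  constructor(limit) { this.limit = limit; this.inUse = 0; }
  tryAcquire() {
    if (this.inUse >= this.limit) return false; // pool exhausted: reject, do not queue
    this.inUse += 1;
    return true;
  }
  release() { if (this.inUse > 0) this.inUse -= 1; }
}

const pools = { payments: new Bulkhead(2), search: new Bulkhead(2) };

// A slow payments backend ties up every payments slot...
pools.payments.tryAcquire();
pools.payments.tryAcquire();
const paymentsRejected = !pools.payments.tryAcquire();
// ...but search still has capacity because its pool is separate.
const searchAccepted = pools.search.tryAcquire();
console.log({ paymentsRejected, searchAccepted }); // { paymentsRejected: true, searchAccepted: true }
```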
TIMEOUT
─────────────────────────────────────────────
• Always set timeouts on external calls
• Fail fast rather than wait forever
• Total timeout < user patience threshold
Deployment Patterns
BLUE-GREEN DEPLOYMENT
─────────────────────────────────────────────
Traffic ──▶ [Blue v1.0]   ← Active
            [Green v1.1]  ← Deploy here
Then switch:
            [Blue v1.0]   ← Standby
Traffic ──▶ [Green v1.1]  ← Active
Pros: Instant rollback, zero downtime
Cons: Double infrastructure cost
CANARY DEPLOYMENT
─────────────────────────────────────────────
Phase 1: [v1.0] ← 95% │ [v1.1] ← 5%
Phase 2: [v1.0] ← 75% │ [v1.1] ← 25%
Phase 3: [v1.0] ← 50% │ [v1.1] ← 50%
Complete: [v1.1] ← 100%
Pros: Gradual rollout, metric-based
Cons: Longer deployment, complexity
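The traffic split in each phase can be approximated by hashing a stable key, such as a user ID, into 100 buckets. `bucketOf` and `pickVersion` are hypothetical helpers, and the hash is a toy rather than a recommendation; load balancers and service meshes usually provide weighted routing directly.

```javascript
// Deterministic hash of a string into a bucket 0..99 (illustrative, not cryptographic).
function bucketOf(userId) {
  let hash = 0;
  for (const ch of userId) hash = (hash * 31 + ch.codePointAt(0)) % 100;
  return hash;
}

// Route to the canary when the user's bucket falls below the canary percentage.
function pickVersion(userId, canaryPercent) {
  return bucketOf(userId) < canaryPercent ? 'v1.1' : 'v1.0';
}

console.log(pickVersion('user-42', 5), pickVersion('user-42', 100));
```

Because the bucket is stable per user, anyone routed to v1.1 at 5% stays on v1.1 as the percentage grows.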
FEATURE FLAGS
─────────────────────────────────────────────
if (featureFlags.isEnabled('new-checkout')) {
  showNewCheckout();
} else {
  showOldCheckout();
}
• Decouple deploy from release
• Gradual rollout to user segments
• Quick kill switch for problems
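The `featureFlags.isEnabled` call above assumes some flag store exists. One possible shape is sketched below; the `flags` layout, the `allowedSegments` field, and the optional `user` argument are hypothetical, and real systems typically load flags from a configuration service.

```javascript
// Hypothetical in-memory flag store.
const featureFlags = {
  flags: { 'new-checkout': { enabled: true, allowedSegments: ['beta'] } },
  isEnabled(name, user = {}) {
    const flag = this.flags[name];
    if (!flag || !flag.enabled) return false; // unknown or killed flags are off
    if (!flag.allowedSegments) return true;   // no targeting: on for everyone
    return flag.allowedSegments.includes(user.segment); // segment-based rollout
  },
};

console.log(featureFlags.isEnabled('new-checkout', { segment: 'beta' }));    // true
console.log(featureFlags.isEnabled('new-checkout', { segment: 'general' })); // false
```

Setting `enabled` to `false` acts as the kill switch without a redeploy.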
Caching Patterns
CACHE-ASIDE (Lazy Loading)
─────────────────────────────────────────────
1. Check cache
2. If miss → load from DB
3. Store in cache
4. Return result
Best for: Read-heavy workloads that tolerate stale data
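The four steps above map directly onto a lookup function. In this sketch two `Map` objects stand in for the cache and the database; `getUser`, the key format, and the `dbReads` counter are assumptions for illustration.

```javascript
const cache = new Map();
const db = new Map([['user:1', { name: 'Ada' }]]); // stand-in for a real database
let dbReads = 0;

function getUser(key) {
  if (cache.has(key)) return cache.get(key);      // 1. check cache
  dbReads += 1;
  const value = db.get(key);                      // 2. miss: load from DB
  if (value !== undefined) cache.set(key, value); // 3. store in cache
  return value;                                   // 4. return result
}

getUser('user:1'); // miss, reads the DB
getUser('user:1'); // hit, served from cache
console.log(dbReads); // 1
```

A real cache would also set a TTL or invalidate on writes to bound staleness.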
WRITE-THROUGH
─────────────────────────────────────────────
1. Write to cache
2. Cache writes to DB (sync)
3. Return success
Best for: Consistency-critical data
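A compact sketch of the write-through steps, again with `Map` objects standing in for the cache and the database. `WriteThroughCache` and the variable names are hypothetical.

```javascript
class WriteThroughCache {
  constructor(db) { this.db = db; this.cache = new Map(); }
  set(key, value) {
    this.cache.set(key, value); // 1. write to cache
    this.db.set(key, value);    // 2. synchronous write to the DB
    return true;                // 3. report success only after both writes
  }
  get(key) { return this.cache.get(key); }
}

const throughDb = new Map();
const throughCache = new WriteThroughCache(throughDb);
throughCache.set('price:1', 42);
console.log(throughCache.get('price:1'), throughDb.get('price:1')); // 42 42
```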
WRITE-BEHIND
─────────────────────────────────────────────
1. Write to cache
2. Return success immediately
3. Cache writes to DB (async)
Best for: Write-heavy workloads that can tolerate losing recent writes
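Write-behind differs only in when the database write happens. In this sketch `flush()` is called by hand; a real system would run it on a timer or in batches. `WriteBehindCache` and the `dirty` set are assumptions for illustration.

```javascript
class WriteBehindCache {
  constructor(db) { this.db = db; this.cache = new Map(); this.dirty = new Set(); }
  set(key, value) {
    this.cache.set(key, value); // 1. write to cache
    this.dirty.add(key);        // remember what still needs persisting
    return true;                // 2. return success immediately
  }
  // 3. Persist pending writes later; anything unflushed is lost if the cache dies.
  flush() {
    for (const key of this.dirty) this.db.set(key, this.cache.get(key));
    const flushed = this.dirty.size;
    this.dirty.clear();
    return flushed;
  }
}

const behindDb = new Map();
const behindCache = new WriteBehindCache(behindDb);
behindCache.set('cart:7', ['book']);
const beforeFlush = behindDb.has('cart:7'); // false: the DB lags the cache
const flushedCount = behindCache.flush();
const afterFlush = behindDb.has('cart:7');  // true
```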
CACHE HIERARCHY
─────────────────────────────────────────────
[Browser Cache]       ← Fastest, limited
       │
[CDN / Edge Cache]    ← Static content
       │
[App Cache / Redis]   ← Dynamic data
       │
[Database]            ← Source of truth
Auto-Scaling Patterns
SCALING APPROACHES
─────────────────────────────────────────────
REACTIVE (Metrics-based)
├── CPU > 70% for 5 min → +2 instances
├── CPU < 30% for 10 min → -1 instance
└── Queue depth > 1000 → +3 instances
PREDICTIVE (Forecast-based)
├── Historical: traffic spikes at 9 AM
└── Pre-scale at 8:45 AM
SCHEDULED (Time-based)
├── Business hours: 10 instances
└── Off-hours: 2 instances
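The REACTIVE rules above can be expressed as a function that returns how many instances to add or remove. `scalingDelta` and its input fields are hypothetical, and checking queue depth first is an assumption; the card does not say which rule wins when several match.

```javascript
// Thresholds copied from the REACTIVE rules; durations are in minutes.
function scalingDelta({ cpuPercent, sustainedMinutes, queueDepth }) {
  if (queueDepth > 1000) return 3;                              // backlog: +3 instances
  if (cpuPercent > 70 && sustainedMinutes >= 5) return 2;       // sustained high CPU: +2
  if (cpuPercent < 30 && sustainedMinutes >= 10) return -1;     // sustained low CPU: -1
  return 0;                                                     // otherwise hold steady
}

console.log(scalingDelta({ cpuPercent: 85, sustainedMinutes: 6, queueDepth: 10 })); // 2
```

Scaling in more slowly than scaling out, as the rules do, helps avoid flapping.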
METRICS TO SCALE ON
─────────────────────────────────────────────
• CPU utilization
• Memory utilization
• Request count
• Queue depth
• Custom metrics (business KPIs)
Multi-Region Patterns
ACTIVE-PASSIVE
─────────────────────────────────────────────
Traffic ──▶ [Region A]  ← Primary
            [Region B]  ← Standby (warm)
RTO: Minutes to hours
RPO: Depends on replication lag
Cost: Lower (standby underutilized)
ACTIVE-ACTIVE
─────────────────────────────────────────────
        ┌──▶ [Region A]
Traffic─┤
        └──▶ [Region B]
RTO: Near-zero
RPO: Depends on data sync (replication lag, conflict resolution)
Cost: Higher (both regions fully utilized)
DATA REPLICATION
─────────────────────────────────────────────
SYNCHRONOUS: Strong consistency, higher latency
ASYNCHRONOUS: Eventual consistency, lower latency
Service Mesh Pattern
SERVICE MESH ARCHITECTURE
─────────────────────────────────────────────
┌─────────────────────────────────────────┐
│              Control Plane              │
│     (Config, Policy, Certificates)      │
└─────────────────────────────────────────┘
      │              │              │
 ┌────▼────┐    ┌────▼────┐    ┌────▼────┐
 │ Sidecar │    │ Sidecar │    │ Sidecar │
 │  Proxy  │←──▶│  Proxy  │←──▶│  Proxy  │
 │┌───────┐│    │┌───────┐│    │┌───────┐│
 ││Service││    ││Service││    ││Service││
 │└───────┘│    │└───────┘│    │└───────┘│
 └─────────┘    └─────────┘    └─────────┘
PROVIDES:
├── Traffic management
├── Service discovery
├── Load balancing
├── mTLS encryption
├── Observability
└── Retry/timeout policies
Cloud Services Quick Reference
AWS            AZURE             GCP
─────────────────────────────────────────────
COMPUTE
Lambda         Functions         Cloud Functions
ECS/EKS        AKS               GKE
EC2            VMs               Compute Engine
DATABASE
RDS            SQL Database      Cloud SQL
DynamoDB       CosmosDB          Firestore
ElastiCache    Cache for Redis   Memorystore
MESSAGING
SQS            Queue Storage     Pub/Sub
SNS            Event Grid        Pub/Sub
EventBridge    Event Grid        Eventarc
STORAGE
S3             Blob Storage      Cloud Storage
EBS            Managed Disks     Persistent Disk
NETWORKING
VPC            VNet              VPC
CloudFront     CDN               Cloud CDN
Route 53       DNS               Cloud DNS
Pattern Selection Summary
PATTERN SELECTION GUIDE
─────────────────────────────────────────────
SCENARIO                 PATTERN
─────────────────────────────────────────────
Variable load, spiky     Queue-based load leveling
Event-driven, fan-out    Pub/Sub
Failing dependency       Circuit breaker
Transient failures       Retry with backoff
Resource isolation       Bulkhead
Zero-downtime deploy     Blue-green
Gradual rollout          Canary
Read-heavy workload      Cache-aside
High availability        Multi-region active-active
Cost optimization        Spot instances, auto-scaling