Cloud Architecture Patterns

Essential patterns for designing scalable, resilient, and cost-effective cloud-native applications.

TL;DR

Cloud architecture patterns solve recurring challenges in distributed, cloud-native systems: scalability, resilience, data management, and messaging. These patterns leverage cloud capabilities (elasticity, managed services, global infrastructure) while addressing cloud-specific challenges (network latency, eventual consistency, cost optimization).

Key Takeaways

  • Design for failure: Assume everything fails; design to recover automatically
  • Prefer managed services: Reduce operational burden, leverage provider expertise
  • Embrace elasticity: Scale out rather than up; pay for what you use
  • Decouple components: Use async messaging to reduce dependencies
  • Optimize for cost: Cloud flexibility enables—and requires—cost awareness

Why This Matters

Cloud computing fundamentally changes how we architect systems. Seemingly unlimited resources, pay-per-use pricing, and managed services enable patterns that are impractical on traditional infrastructure. But the cloud also introduces new challenges: network partitions, eventual consistency, and the complexity of distributed systems. Understanding cloud patterns helps you capture the benefits while avoiding the pitfalls.

Cloud-Native

Cloud-native doesn't mean "runs in the cloud." It means designed to exploit cloud characteristics: elasticity, automation, managed services, and geographic distribution.


Pattern Categories

CLOUD ARCHITECTURE PATTERNS
├── COMPUTE PATTERNS
│   ├── Serverless
│   ├── Containers
│   └── Auto-scaling
│
├── DATA PATTERNS
│   ├── Event Sourcing
│   ├── CQRS
│   └── Polyglot Persistence
│
├── MESSAGING PATTERNS
│   ├── Queue-Based Load Leveling
│   ├── Publisher-Subscriber
│   └── Event-Driven
│
├── RESILIENCE PATTERNS
│   ├── Retry
│   ├── Circuit Breaker
│   └── Bulkhead
│
└── DEPLOYMENT PATTERNS
    ├── Blue-Green
    ├── Canary
    └── Feature Flags

Compute Patterns

What It Is

Serverless: execute code without managing servers. The cloud provider handles infrastructure, scaling, and availability.

When to Use

Good Fit                       | Poor Fit
-------------------------------|--------------------------------------
Event-driven workloads         | Long-running processes
Variable/unpredictable traffic | Consistent high throughput
Rapid prototyping              | Complex stateful workflows
Scheduled tasks                | Low-latency requirements (cold start)

Architecture Pattern

SERVERLESS EVENT-DRIVEN

┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│ API GW  │───▶│ Lambda  │───▶│ DynamoDB│    │   S3    │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
                    │                             │
                    └─────────────────────────────┘
                           Trigger on upload

EVENT SOURCES
├── HTTP (API Gateway)
├── Queue (SQS)
├── Stream (Kinesis, DynamoDB Streams)
├── Schedule (EventBridge, formerly CloudWatch Events)
├── Storage (S3 events)
└── Database (change streams)
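
To make the "Storage (S3 events)" source concrete, here is a minimal sketch of the parsing step a function might run when triggered on upload. It assumes the standard S3 event notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`); `extractUploads` and `handleUploadEvent` are hypothetical names chosen for illustration, not part of any SDK.

```javascript
// Sketch: pull bucket/key pairs out of an S3 "object created" event.
// extractUploads is a hypothetical helper, not an AWS API.
function extractUploads(event) {
  return (event.Records || [])
    .filter((record) => record.eventSource === 'aws:s3')
    .map((record) => ({
      bucket: record.s3.bucket.name,
      // Object keys arrive URL-encoded, with spaces sent as "+".
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
    }));
}

// In Lambda this function would be exported as the handler.
// Here it only logs each upload; real processing would replace the log call.
async function handleUploadEvent(event) {
  const uploads = extractUploads(event);
  for (const { bucket, key } of uploads) {
    console.log(`New object: s3://${bucket}/${key}`);
  }
  return uploads.length;
}
```

Keeping the parsing in a plain function makes it testable without deploying anything; the same idea applies to the other event sources, each of which has its own record shape.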

Cost Considerations

SERVERLESS COST MODEL

CHARGED FOR:
├── Number of invocations
├── Duration (GB-seconds)
└── Memory allocated

NOT CHARGED FOR:
├── Idle time
├── Infrastructure management
└── Scaling infrastructure

OPTIMIZE BY:
├── Right-size memory allocation
├── Minimize cold starts (provisioned concurrency)
├── Optimize code execution time
└── Use efficient runtimes
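
The cost model above reduces to simple arithmetic, sketched below. `estimateMonthlyCost` is a hypothetical helper, and both price parameters are illustrative placeholders rather than real rates; check your provider's current pricing before relying on any figure.

```javascript
// Sketch: estimate a monthly bill from the billed dimensions listed above.
// The price parameters are supplied by the caller and are NOT real rates.
function estimateMonthlyCost({
  invocations,
  avgDurationMs,
  memoryMb,
  pricePerMillionRequests,
  pricePerGbSecond,
}) {
  // GB-seconds = invocations x seconds per invocation x GB allocated.
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  const requestCost = (invocations / 1_000_000) * pricePerMillionRequests;
  const computeCost = gbSeconds * pricePerGbSecond;
  return { gbSeconds, requestCost, computeCost, total: requestCost + computeCost };
}

const estimate = estimateMonthlyCost({
  invocations: 1_000_000,
  avgDurationMs: 100,
  memoryMb: 512,
  pricePerMillionRequests: 0.2, // illustrative only
  pricePerGbSecond: 0.00001,    // illustrative only
});
console.log(estimate);
```

Because compute cost is proportional to both duration and memory, halving either one halves that part of the bill, which is why right-sizing memory and shortening execution time both appear under OPTIMIZE BY.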

Cold Starts

Functions that have not been invoked recently must be initialized before they run (a cold start), which adds latency. For latency-sensitive applications, consider provisioned concurrency (which is billed even while idle) or keep-warm strategies.


Data Patterns

What It Is

Polyglot persistence: use different database types for different data needs within the same application.

Database Selection Guide

DATA STORE SELECTION

RELATIONAL (PostgreSQL, MySQL)
├── Structured data with relationships
├── ACID transactions required
├── Complex queries with joins
└── Strong consistency needed

DOCUMENT (MongoDB, DynamoDB)
├── Semi-structured data
├── Flexible schema
├── Hierarchical data
└── Scale-out requirements

KEY-VALUE (Redis, DynamoDB)
├── Simple lookup by key
├── Session storage
├── Caching
└── High throughput, low latency

GRAPH (Neo4j, Neptune)
├── Highly connected data
├── Relationship traversal
├── Recommendations, fraud detection
└── Social networks

TIME-SERIES (InfluxDB, Timescale)
├── Timestamped data
├── Metrics and monitoring
├── IoT sensor data
└── Time-based aggregations

SEARCH (Elasticsearch, OpenSearch)
├── Full-text search
├── Log analytics
├── Faceted search
└── Real-time indexing

Example Architecture

POLYGLOT PERSISTENCE EXAMPLE (E-commerce)

┌─────────────────────────────────────────────────────────────┐
│                     Application Layer                        │
└─────────────────────────────────────────────────────────────┘
         │              │              │              │
         ▼              ▼              ▼              ▼
    ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
    │PostgreSQL│   │ MongoDB │   │  Redis  │   │Elastic- │
    │         │   │         │   │         │   │search   │
    │ Orders  │   │ Product │   │ Sessions│   │ Search  │
    │ Users   │   │ Catalog │   │ Cart    │   │ Index   │
    │ Payments│   │         │   │ Cache   │   │         │
    └─────────┘   └─────────┘   └─────────┘   └─────────┘

Each store is optimized for its own access patterns.
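
The routing implied by the diagram can be sketched as a small lookup in the application layer. Everything here is an assumption made for illustration: the `Map` objects stand in for real database clients, and `stores`, `storeFor`, `save`, and `saveProduct` are hypothetical names.

```javascript
// Sketch: route each kind of record to the store the diagram assigns it.
// Map objects stand in for real database clients.
const stores = {
  relational: new Map(), // PostgreSQL stand-in
  document: new Map(),   // MongoDB stand-in
  keyValue: new Map(),   // Redis stand-in
  search: new Map(),     // Elasticsearch stand-in
};

// Which store owns which kind of record (taken from the diagram).
const storeFor = {
  order: 'relational',
  user: 'relational',
  payment: 'relational',
  product: 'document',
  session: 'keyValue',
  cart: 'keyValue',
  searchDocument: 'search',
};

// Write a record to whichever store owns its kind; returns the store name.
function save(kind, id, value) {
  const storeName = storeFor[kind];
  if (!storeName) throw new Error(`No store configured for "${kind}"`);
  stores[storeName].set(`${kind}:${id}`, value);
  return storeName;
}

// A catalog write touches two stores: the document record and its search entry.
function saveProduct(product) {
  save('product', product.id, product);
  save('searchDocument', product.id, { id: product.id, text: product.name });
}
```

Note that `saveProduct` performs two independent writes. A real system would need a strategy for a failure between them, since the stores share no transaction.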

Messaging Patterns

What It Is

Queue-based load leveling: place a queue between producer and consumer so that variable load is buffered instead of overwhelming the consumer.

Pattern Structure

WITHOUT QUEUE (Direct Coupling)
Producer ──────────────────▶ Consumer
         High load = overwhelmed

WITH QUEUE (Load Leveling)
Producer ──▶ [  Queue  ] ──▶ Consumer
              Buffer absorbs spikes
              Consumer processes at own pace

BEHAVIOR DURING SPIKE
       Incoming  │ ████████████████████
         Load    │ ████████████
                 │ █████████
                 │ ██████████████
                 └─────────────────────▶ Time

       Queue     │     ████
       Depth     │  █████████
                 │ ██████████████
                 │ █████████████████
                 └─────────────────────▶ Time
                 (Queue absorbs, drains gradually)
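
The spike behavior charted above can be reproduced with a few lines of simulation. This is a simplified sketch that assumes discrete time ticks and a consumer with fixed capacity per tick; `simulateQueueDepth` is a hypothetical helper.

```javascript
// Sketch: track queue depth when arrivals vary but the consumer drains
// at most `consumerCapacity` messages per tick.
function simulateQueueDepth(arrivalsPerTick, consumerCapacity) {
  let depth = 0;
  return arrivalsPerTick.map((arrivals) => {
    // New messages join the queue, then the consumer takes what it can.
    depth = Math.max(0, depth + arrivals - consumerCapacity);
    return depth;
  });
}

// A spike of 10 arrives at tick 1; the consumer handles 4 per tick.
console.log(simulateQueueDepth([5, 10, 2, 0, 0], 4)); // [1, 7, 5, 1, 0]
```

The depth peaks just after the spike and drains over the following ticks, while the consumer never sees more than 4 messages per tick. A queue depth that keeps growing is the signal that the consumer needs to scale out.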

Implementation

AWS SQS EXAMPLE

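Producer (Lambda/EC2):
// AWS SDK for JavaScript v3
const { SQSClient, SendMessageCommand } = require('@aws-sdk/client-sqs');
const sqs = new SQSClient({});

async function enqueue(event) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: 'https://sqs.../my-queue',
    MessageBody: JSON.stringify(event),
    // FIFO queues only; remove for standard queues. FIFO also needs a
    // MessageDeduplicationId unless content-based deduplication is enabled.
    MessageGroupId: 'orders'
  }));
}

Consumer (Lambda trigger or polling):
// processMessage is application-specific and must be defined elsewhere.
exports.handler = async (event) => {
  for (const record of event.Records) {
    const message = JSON.parse(record.body);
    // A thrown error fails the invocation, so the whole batch is retried.
    await processMessage(message);
  }
};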

CONFIGURATION
├── Visibility timeout: Time to process before retry
├── Message retention: How long unprocessed messages kept
├── Dead letter queue: Where failed messages go
└── Batch size: Messages processed per invocation
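
Visibility timeout, dead letter queue, and batch size interact when a single message in a batch fails. One way to avoid reprocessing the whole batch is to report only the failed messages, sketched below. This assumes the Lambda event source mapping has `ReportBatchItemFailures` enabled; `createBatchHandler` and the `processMessage` argument are hypothetical names.

```javascript
// Sketch: report only failed messages so the rest of the batch is not retried.
// Assumes ReportBatchItemFailures is enabled on the event source mapping.
// processMessage is a hypothetical async function supplied by the caller.
function createBatchHandler(processMessage) {
  return async function handler(event) {
    const batchItemFailures = [];
    for (const record of event.Records) {
      try {
        await processMessage(JSON.parse(record.body));
      } catch (err) {
        // Reported messages become visible again after the visibility
        // timeout; repeated failures eventually reach the dead letter queue.
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }
    return { batchItemFailures };
  };
}
```

Returning an empty `batchItemFailures` array signals that the whole batch succeeded. Because a message can still be delivered more than once, `processMessage` should be idempotent.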

Resilience Patterns

What It Is

Retry with exponential backoff: when an operation fails, retry it with increasing delays to avoid overwhelming a struggling service.

Implementation

EXPONENTIAL BACKOFF

Attempt 1: Immediate
Attempt 2: Wait 1 second
Attempt 3: Wait 2 seconds
Attempt 4: Wait 4 seconds
Attempt 5: Wait 8 seconds
         → Give up, return error

WITH JITTER (Recommended)
delay = base * 2^(n - 1) + random(0, base)   (n = 1 for the first retry, matching the schedule above)

Jitter prevents thundering herd when many clients
retry simultaneously after an outage.
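
A minimal, SDK-independent sketch of this schedule follows. `retryWithBackoff` is a hypothetical helper; its `sleep` and `random` options are assumptions added so the timing can be controlled in tests, and the defaults follow the five-attempt schedule listed above (1, 2, 4, 8 seconds between attempts).

```javascript
// Sketch: retry an async operation with exponential backoff plus jitter.
// sleep and random are injectable so timing can be controlled in tests.
async function retryWithBackoff(operation, options = {}) {
  const {
    maxAttempts = 5,
    baseMs = 1000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    random = Math.random,
  } = options;

  for (let attempt = 1; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      // Out of attempts: give up and surface the last error.
      if (attempt >= maxAttempts) throw err;
      // Wait base * 2^(attempt - 1), plus up to one extra base of jitter.
      const delayMs = baseMs * 2 ** (attempt - 1) + random() * baseMs;
      await sleep(delayMs);
    }
  }
}
```

This sketch retries on every error. In practice only transient failures (timeouts, throttling, 5xx responses) are worth retrying, and the operation should be idempotent.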

Configuration

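// AWS SDK v3: custom retry strategy (this overrides the SDK default).
// StandardRetryStrategy with decider options comes from the legacy
// @aws-sdk/middleware-retry package; newer SDK releases may differ.
const { S3Client } = require('@aws-sdk/client-s3');
const { StandardRetryStrategy } = require('@aws-sdk/middleware-retry');

const client = new S3Client({
  maxAttempts: 3,
  // Up to 3 attempts in total (1 initial call + 2 retries)
  retryStrategy: new StandardRetryStrategy(async () => 3, {
    retryDecider: (error) => {
      // Retry on throttling and on server-side faults
      return Boolean(error.$retryable?.throttling) ||
             error.$fault === 'server';
    },
    delayDecider: (delayBase, attempts) => {
      // Exponential backoff plus jitter, in milliseconds
      return delayBase * Math.pow(2, attempts - 1) +
             Math.random() * delayBase;
    }
  })
});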

Deployment Patterns

What It Is

Blue-green deployment: maintain two identical production environments. Deploy to the inactive one, then switch traffic to it.

Process

BLUE-GREEN DEPLOYMENT

BEFORE DEPLOYMENT
                    ┌─────────────────────┐
     Traffic ──────▶│   Blue (v1.0)       │ ◀── Active
                    └─────────────────────┘
                    ┌─────────────────────┐
                    │   Green (idle)      │ ◀── Inactive
                    └─────────────────────┘

DEPLOY TO GREEN
                    ┌─────────────────────┐
     Traffic ──────▶│   Blue (v1.0)       │ ◀── Active
                    └─────────────────────┘
                    ┌─────────────────────┐
        Deploy ────▶│   Green (v1.1)      │ ◀── Deploy here
                    └─────────────────────┘

SWITCH TRAFFIC
                    ┌─────────────────────┐
                    │   Blue (v1.0)       │ ◀── Standby
                    └─────────────────────┘
                    ┌─────────────────────┐
     Traffic ──────▶│   Green (v1.1)      │ ◀── Active
                    └─────────────────────┘

ROLLBACK = Switch traffic back to Blue
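
The three stages above amount to a pointer swap between two environments, which the following sketch models in memory. `BlueGreenRouter` is a hypothetical class written for illustration; in a real deployment the switch would typically happen at a load balancer or DNS layer rather than in application code.

```javascript
// Sketch: model blue-green deployment as a pointer swap between environments.
class BlueGreenRouter {
  constructor() {
    this.environments = { blue: 'v1.0', green: null };
    this.active = 'blue';
  }

  // The environment that is not receiving traffic.
  get inactive() {
    return this.active === 'blue' ? 'green' : 'blue';
  }

  // Deployments always target the inactive environment.
  deploy(version) {
    this.environments[this.inactive] = version;
  }

  // Point traffic at the other environment; refuse if it is empty.
  switchTraffic() {
    if (this.environments[this.inactive] === null) {
      throw new Error('Inactive environment has no deployment');
    }
    this.active = this.inactive;
  }

  // Rollback is the same switch in the opposite direction.
  rollback() {
    this.switchTraffic();
  }

  // Version currently serving traffic.
  route() {
    return this.environments[this.active];
  }
}

const router = new BlueGreenRouter();
router.deploy('v1.1');  // Green now holds v1.1, Blue still serves traffic
router.switchTraffic(); // Green becomes active
console.log(router.route()); // v1.1
```

Rollback is fast because the previous version stays deployed on the standby environment; that is also why the pattern doubles infrastructure cost.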

Pros and Cons

Pros                       | Cons
---------------------------|------------------------------
Instant rollback           | Double infrastructure cost
Zero downtime              | Database migrations complex
Full testing before switch | Session management needed
Clean cutover              | Configuration must be in sync

Quick Reference Card

┌─────────────────────────────────────────────────────────────┐
│            CLOUD ARCHITECTURE CHEAT SHEET                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  CLOUD-NATIVE PRINCIPLES                                    │
│  ─────────────────────────────────────────────────────────  │
│  • Design for failure (everything fails eventually)         │
│  • Prefer managed services (reduce operational burden)      │
│  • Embrace elasticity (scale out, not up)                   │
│  • Decouple with messaging (async over sync)                │
│  • Automate everything (infrastructure as code)             │
│                                                             │
│  COMPUTE SELECTION                                          │
│  ─────────────────────────────────────────────────────────  │
│  Serverless → Event-driven, variable load, < 15 min         │
│  Containers → Full control, long-running, predictable       │
│  VMs        → Legacy apps, specific OS requirements         │
│                                                             │
│  DATA STORE SELECTION                                       │
│  ─────────────────────────────────────────────────────────  │
│  Relational  → ACID, complex queries, relationships         │
│  Document    → Flexible schema, hierarchical data           │
│  Key-Value   → Simple lookup, caching, sessions             │
│  Graph       → Highly connected data, traversals            │
│                                                             │
│  RESILIENCE PATTERNS                                        │
│  ─────────────────────────────────────────────────────────  │
│  Retry       → Transient failures, exponential backoff      │
│  Circuit     → Fail fast, prevent cascade                   │
│  Bulkhead    → Isolate failures, limit blast radius         │
│  Timeout     → Don't wait forever, fail gracefully          │
│                                                             │
│  DEPLOYMENT PATTERNS                                        │
│  ─────────────────────────────────────────────────────────  │
│  Blue-Green  → Instant rollback, double infrastructure      │
│  Canary      → Gradual rollout, metric-based progression    │
│  Feature Flag→ Decouple deploy from release                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

