Performance: Designing for Speed
Architectural tactics and patterns for building high-performance systems that meet latency, throughput, and resource utilization targets.
TL;DR
Performance is about time and resources: how fast the system responds (latency), how much work it handles (throughput), and how efficiently it uses resources (utilization). Measure first, then optimize. Most performance problems stem from inefficient data access, lack of caching, or synchronous processing of work that could be async.
Key Takeaways
- Measure before optimizing: Profile to find actual bottlenecks, don't guess
- Latency vs throughput: Optimizing one may hurt the other—know your priority
- Caching is powerful: But adds complexity and consistency challenges
- Async for long operations: Don't block threads waiting for slow operations
- Database is often the bottleneck: Optimize queries, indexes, and access patterns first
Why This Matters
Users expect fast systems. Amazon found that every 100ms of latency cost 1% in sales. Google found that a 500ms delay reduced search traffic by 20%. Performance directly impacts user experience, conversion rates, and operational costs. Poor performance also indicates architectural problems—systems that struggle under load often have deeper issues with coupling, data access, or resource management.
Premature Optimization
"Premature optimization is the root of all evil" - Donald Knuth. Measure first. The bottleneck is rarely where you think it is. Profile, identify the actual hot path, then optimize.
Performance Fundamentals
Key Metrics
PERFORMANCE METRICS
LATENCY (Response Time)
├── Time from request to response
├── Measured in ms or seconds
├── Report percentiles: p50, p95, p99
└── p99 matters more than average
THROUGHPUT
├── Work completed per time unit
├── Requests per second (RPS)
├── Transactions per second (TPS)
└── Messages per second (MPS)
UTILIZATION
├── Resource consumption percentage
├── CPU, Memory, Disk, Network
├── Target: 60-80% for headroom
└── >90% = capacity planning needed
SCALABILITY
├── How performance changes with load
├── Linear: 2x resources = 2x throughput
├── Sublinear: Diminishing returns
└── Superlinear: Contention issues
Latency Components
REQUEST LIFECYCLE
Client                                         Server
  │                                                │
  │─── Network latency (client → server) ────────▶│
  │                                                │─┐
  │                                                │ │ Processing
  │                                                │ │ time
  │                                                │─┘
  │◀── Network latency (server → client) ─────────│
  │                                                │

TOTAL LATENCY = Network (request)
              + Queue time
              + Processing time
              + Network (response)
PROCESSING BREAKDOWN
├── Application logic
├── Database queries
├── External service calls
├── Serialization/deserialization
└── I/O operations
Percentiles Matter
WHY PERCENTILES, NOT AVERAGES
Scenario: 100 requests
├── 95 requests: 50ms
└── 5 requests: 1000ms
Average: (95×50 + 5×1000) / 100 = 97.5ms
p50 (median): 50ms
p95: 50ms
p99: 1000ms
The average hides the 5% of requests with a terrible experience.
p99 reveals the tail latency that a user making many requests per session will almost certainly encounter.
REPORTING STANDARD
├── p50: Typical experience
├── p95: Most users' worst case
├── p99: Edge case (still matters at scale)
└── p99.9: Long tail (important for critical paths)
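The worked scenario above can be checked directly. A minimal sketch using the nearest-rank percentile definition (the helper function is illustrative; production systems typically get these numbers from their metrics backend):

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ranked = sorted(samples)
    rank = -(-p * len(ranked) // 100)   # ceil(p/100 * N) using integer math
    return ranked[max(rank - 1, 0)]

# Reproduce the scenario above: 95 requests at 50 ms, 5 at 1000 ms.
latencies = [50] * 95 + [1000] * 5
average = sum(latencies) / len(latencies)   # 97.5 ms
p50 = percentile(latencies, 50)             # 50 ms
p95 = percentile(latencies, 95)             # 50 ms
p99 = percentile(latencies, 99)             # 1000 ms
```

The average (97.5 ms) looks nearly twice as bad as the median yet still completely hides that 5% of requests take 20x longer.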
Performance Tactics
Reduce Demand
Goal: decrease the amount of work the system must do.
Tactics
| Tactic | Description | Implementation |
|---|---|---|
| Caching | Store computed results | Redis, CDN, application cache |
| Compression | Reduce data size | gzip, Brotli for HTTP |
| Pagination | Limit result sets | Cursor-based pagination |
| Lazy Loading | Load on demand | Defer non-critical resources |
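The compression tactic is usually enabled at the web server or proxy, but its effect is easy to demonstrate. A minimal sketch using the standard library, with an illustrative repetitive payload of the kind list-style API responses tend to produce:

```python
import gzip
import json

# A repetitive JSON payload, typical of list-style API responses (illustrative data).
records = [{"id": i, "status": "active", "region": "us-east-1"} for i in range(500)]
payload = json.dumps(records).encode("utf-8")

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)   # well under 1.0 for repetitive JSON

# The receiver reverses it transparently, as browsers do for Content-Encoding: gzip.
restored = gzip.decompress(compressed)
```

Repetitive structured data often compresses by an order of magnitude, cutting both transfer time and bandwidth cost.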
Caching Strategies
CACHE HIERARCHY
┌──────────────────────────────────────────────┐
│                    Client                    │
│   ┌─────────────┐                            │
│   │Browser Cache│ ← Fastest, limited size    │
│   └─────────────┘                            │
└──────────────────────────────────────────────┘
                       │
┌──────────────────────────────────────────────┐
│                     CDN                      │
│   ┌─────────────┐                            │
│   │ Edge Cache  │ ← Static content,          │
│   └─────────────┘   geographic               │
└──────────────────────────────────────────────┘
                       │
┌──────────────────────────────────────────────┐
│                 Application                  │
│   ┌─────────────┐   ┌─────────────────┐      │
│   │  App Cache  │   │ Redis/Memcached │      │
│   │ (in-memory) │   │  (distributed)  │      │
│   └─────────────┘   └─────────────────┘      │
└──────────────────────────────────────────────┘
                       │
┌──────────────────────────────────────────────┐
│                   Database                   │
│   ┌─────────────┐                            │
│   │ Query Cache │ ← DB-level caching         │
│   └─────────────┘                            │
└──────────────────────────────────────────────┘
Cache Patterns
CACHE-ASIDE (Lazy Loading)
1. Check cache
2. If miss, load from DB
3. Store in cache
4. Return result
Best for: Read-heavy, tolerance for stale data
WRITE-THROUGH
1. Write to cache
2. Cache writes to DB (sync)
3. Return success
Best for: Data consistency critical
WRITE-BEHIND (Write-Back)
1. Write to cache
2. Return success immediately
3. Cache writes to DB (async)
Best for: Write-heavy, can tolerate some loss
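The cache-aside steps above can be sketched in a few lines. A minimal in-process sketch with a TTL; the class and the stubbed `load_user` loader are illustrative (real deployments would back this with Redis or Memcached):

```python
import time

class CacheAside:
    """Cache-aside with a TTL; names and structure are illustrative."""

    def __init__(self, loader, ttl_seconds=60.0):
        self._loader = loader      # e.g. a function wrapping a database query
        self._ttl = ttl_seconds
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                                         # 1. cache hit
        value = self._loader(key)                                   # 2. miss: load from DB
        self._entries[key] = (value, time.monotonic() + self._ttl)  # 3. store in cache
        return value                                                # 4. return result

    def invalidate(self, key):
        # Call after writes so the next read reloads fresh data.
        self._entries.pop(key, None)

# Usage with a stub standing in for the database.
db_calls = []
def load_user(user_id):
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

cache = CacheAside(load_user, ttl_seconds=30)
cache.get(42)   # miss: hits the loader
cache.get(42)   # hit: served from cache, no extra DB call
```

The TTL bounds staleness; explicit `invalidate` on writes tightens it further, which is exactly where the invalidation strategy below comes in.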
Cache Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Define your invalidation strategy before implementing caching.
Performance Testing
Test Types
| Type | Purpose | Duration |
|---|---|---|
| Load Test | Expected load behavior | Minutes to hours |
| Stress Test | Breaking point | Until failure |
| Soak Test | Stability over time | Hours to days |
| Spike Test | Sudden load changes | Brief intervals |
Performance Test Process
PERFORMANCE TESTING WORKFLOW
1. BASELINE
└── Measure current performance
2. DEFINE TARGETS
├── Latency: p99 < 200ms
├── Throughput: 1000 RPS
└── Error rate: < 0.1%
3. IDENTIFY SCENARIOS
├── Common user journeys
├── Peak load patterns
└── Edge cases
4. EXECUTE TESTS
├── Gradual load increase
├── Sustained load
└── Spike scenarios
5. ANALYZE RESULTS
├── Identify bottlenecks
├── Resource utilization
└── Error patterns
6. OPTIMIZE & REPEAT
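The execute-and-analyze steps above can be sketched with a tiny closed-loop load generator. This is an illustrative stdlib-only sketch (real tests would use one of the tools below against an actual HTTP endpoint; `fake_request` is a stub):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_test(request_fn, total_requests=200, concurrency=10):
    """Closed-loop load generator: run requests concurrently, record latencies."""
    def timed_call(i):
        start = time.perf_counter()
        request_fn(i)
        return (time.perf_counter() - start) * 1000.0   # per-request latency, ms

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    return {
        "throughput_rps": total_requests / elapsed,
        "p50_ms": latencies[len(latencies) // 2],
        "p99_ms": latencies[int(len(latencies) * 0.99) - 1],
    }

# Illustrative target: a stub standing in for a real HTTP call.
def fake_request(i):
    time.sleep(0.002)   # simulate ~2 ms of server-side work

report = run_load_test(fake_request)
```

Even this toy version reports the numbers that matter: throughput and tail latency, not averages.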
Tools
| Tool | Type | Use Case |
|---|---|---|
| k6 | Load testing | Developer-friendly, JavaScript |
| JMeter | Load testing | Enterprise, comprehensive |
| Gatling | Load testing | Scala-based, CI/CD friendly |
| Locust | Load testing | Python, distributed |
| wrk | Benchmarking | Simple HTTP benchmarking |
Common Performance Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| N+1 Queries | 100 items trigger 101 queries | Eager loading, joins, batch queries |
| No Caching | Repeated expensive operations | Cache at appropriate levels |
| Synchronous Everything | Threads blocked on I/O | Async for long operations |
| Missing Indexes | Full table scans | Index frequently queried columns |
| SELECT * | Transfers unnecessary data | Select only needed columns |
| Unbounded Queries | Memory exhaustion | Always use LIMIT/pagination |
| Logging Everything | I/O overhead | Log sampling, appropriate levels |
| Chatty APIs | Network round-trip overhead | Aggregate endpoints, GraphQL |
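The N+1 row is worth seeing concretely. A minimal sketch with an in-memory stand-in for a database that counts the queries issued (the `CountingDB` class and its methods are hypothetical, modeling `WHERE id = ?` vs. `WHERE id IN (...)` access):

```python
class CountingDB:
    """In-memory stand-in for a database that counts queries issued (hypothetical API)."""
    def __init__(self, customers):
        self._customers = customers
        self.queries = 0

    def get_customer(self, customer_id):
        self.queries += 1                     # models: SELECT ... WHERE id = ?
        return self._customers[customer_id]

    def get_customers(self, customer_ids):
        self.queries += 1                     # models: SELECT ... WHERE id IN (...)
        return {c: self._customers[c] for c in customer_ids}

def attach_customers_n_plus_one(db, orders):
    # Anti-pattern: one lookup per order -> N extra queries.
    for order in orders:
        order["customer"] = db.get_customer(order["customer_id"])

def attach_customers_batched(db, orders):
    # Fix: deduplicate IDs and fetch them in a single batched query.
    ids = {o["customer_id"] for o in orders}
    customers = db.get_customers(ids)
    for order in orders:
        order["customer"] = customers[order["customer_id"]]

customers = {i: {"id": i, "name": f"customer-{i}"} for i in range(3)}
orders = [{"id": i, "customer_id": i % 3} for i in range(100)]

slow_db = CountingDB(customers)
attach_customers_n_plus_one(slow_db, [dict(o) for o in orders])   # 100 queries

fast_db = CountingDB(customers)
attach_customers_batched(fast_db, [dict(o) for o in orders])      # 1 query
```

ORMs hide this pattern behind lazy relationship loading, which is why eager loading (e.g. join or batch fetch options) is the usual fix.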
Quick Reference Card
┌───────────────────────────────────────────────────────────┐
│                  PERFORMANCE CHEAT SHEET                  │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  METRICS                                                  │
│  ───────────────────────────────────────────────────      │
│  Latency     → Time to respond (report p50, p95, p99)     │
│  Throughput  → Requests per second (RPS)                  │
│  Utilization → Resource consumption (target 60-80%)       │
│                                                           │
│  TACTICS                                                  │
│  ───────────────────────────────────────────────────      │
│  REDUCE DEMAND                                            │
│  • Cache at every layer (browser, CDN, app, DB)           │
│  • Compress responses (gzip, Brotli)                      │
│  • Paginate large result sets                             │
│                                                           │
│  MANAGE RESOURCES                                         │
│  • Pool connections and threads                           │
│  • Use async for I/O-bound operations                     │
│  • Set timeouts on all external calls                     │
│                                                           │
│  OPTIMIZE DATA ACCESS                                     │
│  • Index WHERE and JOIN columns                           │
│  • Avoid N+1 queries (use eager loading)                  │
│  • Select only needed columns                             │
│                                                           │
│  SCALE HORIZONTALLY                                       │
│  • Design stateless services                              │
│  • Use load balancing                                     │
│  • Shard data when necessary                              │
│                                                           │
│  QUICK WINS                                               │
│  ───────────────────────────────────────────────────      │
│  1. Add caching (biggest impact for read-heavy)           │
│  2. Add database indexes                                  │
│  3. Fix N+1 queries                                       │
│  4. Enable compression                                    │
│  5. Use async processing                                  │
│                                                           │
└───────────────────────────────────────────────────────────┘
Related Topics
- Quality Attributes Overview - All quality attributes
- Availability - Availability and performance trade-offs
- Cloud Architecture - Cloud performance patterns
Sources
- High Performance Browser Networking - Ilya Grigorik
- Systems Performance - Brendan Gregg
- AWS Performance Efficiency Pillar
- Database Indexing Guide - Markus Winand