Performance: Designing for Speed
Architectural tactics and patterns for building high-performance systems that meet latency, throughput, and resource utilization targets.
TL;DR
Performance is about time and resources: how fast the system responds (latency), how much work it handles (throughput), and how efficiently it uses resources (utilization). Measure first, then optimize. Most performance problems stem from inefficient data access, lack of caching, or synchronous processing of work that could be async.
Key Takeaways
- Measure before optimizing: Profile to find actual bottlenecks, don't guess
- Latency vs throughput: Optimizing one may hurt the other—know your priority
- Caching is powerful: But adds complexity and consistency challenges
- Async for long operations: Don't block threads waiting for slow operations
- Database is often the bottleneck: Optimize queries, indexes, and access patterns first
Why This Matters
Users expect fast systems. Amazon found that every 100ms of latency cost 1% in sales. Google found that a 500ms delay reduced search traffic by 20%. Performance directly impacts user experience, conversion rates, and operational costs. Poor performance also indicates architectural problems—systems that struggle under load often have deeper issues with coupling, data access, or resource management.
Premature Optimization
"Premature optimization is the root of all evil" - Donald Knuth. Measure first. The bottleneck is rarely where you think it is. Profile, identify the actual hot path, then optimize.
Performance Fundamentals
Key Metrics
PERFORMANCE METRICS
LATENCY (Response Time)
├── Time from request to response
├── Measured in ms or seconds
├── Report percentiles: p50, p95, p99
└── p99 matters more than average
THROUGHPUT
├── Work completed per time unit
├── Requests per second (RPS)
├── Transactions per second (TPS)
└── Messages per second (MPS)
UTILIZATION
├── Resource consumption percentage
├── CPU, Memory, Disk, Network
├── Target: 60-80% for headroom
└── >90% = capacity planning needed
SCALABILITY
├── How performance changes with load
├── Linear: 2x resources = 2x throughput
├── Sublinear: Diminishing returns
└── Superlinear: Contention issues
Latency Components
REQUEST LIFECYCLE
Client                                         Server
  │                                                │
  │─── Network latency (client → server) ────────▶│
  │                                                │─┐
  │                                                │ │ Processing
  │                                                │ │ time
  │                                                │─┘
  │◀── Network latency (server → client) ─────────│
  │                                                │

TOTAL LATENCY = Network (request)
              + Queue time
              + Processing time
              + Network (response)
PROCESSING BREAKDOWN
├── Application logic
├── Database queries
├── External service calls
├── Serialization/deserialization
└── I/O operations
Percentiles Matter
WHY PERCENTILES, NOT AVERAGES
Scenario: 100 requests
├── 95 requests: 50ms
└── 5 requests: 1000ms
Average: (95×50 + 5×1000) / 100 = 97.5ms
p50 (median): 50ms
p95: 50ms
p99: 1000ms
The average hides the 5% of requests with a terrible experience.
p99 reveals the tail latency that a user making many requests per session will almost certainly encounter.
REPORTING STANDARD
├── p50: Typical experience
├── p95: Most users' worst case
├── p99: Edge case (still matters at scale)
└── p99.9: Long tail (important for critical paths)
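The worked scenario above can be checked directly. A minimal sketch using the nearest-rank percentile definition (the helper function is illustrative; production systems typically get these numbers from their metrics backend):

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ranked = sorted(samples)
    rank = -(-p * len(ranked) // 100)   # ceil(p/100 * N) using integer math
    return ranked[max(rank - 1, 0)]

# Reproduce the scenario above: 95 requests at 50 ms, 5 at 1000 ms.
latencies = [50] * 95 + [1000] * 5
average = sum(latencies) / len(latencies)   # 97.5 ms
p50 = percentile(latencies, 50)             # 50 ms
p95 = percentile(latencies, 95)             # 50 ms
p99 = percentile(latencies, 99)             # 1000 ms
```

The average (97.5 ms) looks nearly twice as bad as the median yet still completely hides that 5% of requests take 20x longer.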
Performance Tactics
Reduce Demand
Goal: decrease the amount of work the system must do.
Tactics
| Tactic | Description | Implementation |
|---|---|---|
| Caching | Store computed results | Redis, CDN, application cache |
| Compression | Reduce data size | gzip, Brotli for HTTP |
| Pagination | Limit result sets | Cursor-based pagination |
| Lazy Loading | Load on demand | Defer non-critical resources |
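The compression tactic is usually enabled at the web server or proxy, but its effect is easy to demonstrate. A minimal sketch using the standard library, with an illustrative repetitive payload of the kind list-style API responses tend to produce:

```python
import gzip
import json

# A repetitive JSON payload, typical of list-style API responses (illustrative data).
records = [{"id": i, "status": "active", "region": "us-east-1"} for i in range(500)]
payload = json.dumps(records).encode("utf-8")

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)   # well under 1.0 for repetitive JSON

# The receiver reverses it transparently, as browsers do for Content-Encoding: gzip.
restored = gzip.decompress(compressed)
```

Repetitive structured data often compresses by an order of magnitude, cutting both transfer time and bandwidth cost.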
Caching Strategies
CACHE HIERARCHY
┌──────────────────────────────────────────────┐
│                    Client                    │
│   ┌─────────────┐                            │
│   │Browser Cache│ ← Fastest, limited size    │
│   └─────────────┘                            │
└──────────────────────────────────────────────┘
                       │
┌──────────────────────────────────────────────┐
│                     CDN                      │
│   ┌─────────────┐                            │
│   │ Edge Cache  │ ← Static content,          │
│   └─────────────┘   geographic               │
└──────────────────────────────────────────────┘
                       │
┌──────────────────────────────────────────────┐
│                 Application                  │
│   ┌─────────────┐   ┌─────────────────┐      │
│   │  App Cache  │   │ Redis/Memcached │      │
│   │ (in-memory) │   │  (distributed)  │      │
│   └─────────────┘   └─────────────────┘      │
└──────────────────────────────────────────────┘
                       │
┌──────────────────────────────────────────────┐
│                   Database                   │
│   ┌─────────────┐                            │
│   │ Query Cache │ ← DB-level caching         │
│   └─────────────┘                            │
└──────────────────────────────────────────────┘
Cache Patterns
CACHE-ASIDE (Lazy Loading)
1. Check cache
2. If miss, load from DB
3. Store in cache
4. Return result
Best for: Read-heavy, tolerance for stale data
WRITE-THROUGH
1. Write to cache
2. Cache writes to DB (sync)
3. Return success
Best for: Data consistency critical
WRITE-BEHIND (Write-Back)
1. Write to cache
2. Return success immediately
3. Cache writes to DB (async)
Best for: Write-heavy, can tolerate some loss
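The cache-aside steps above can be sketched in a few lines. A minimal in-process sketch with a TTL; the class and the stubbed `load_user` loader are illustrative (real deployments would back this with Redis or Memcached):

```python
import time

class CacheAside:
    """Cache-aside with a TTL; names and structure are illustrative."""

    def __init__(self, loader, ttl_seconds=60.0):
        self._loader = loader      # e.g. a function wrapping a database query
        self._ttl = ttl_seconds
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                                         # 1. cache hit
        value = self._loader(key)                                   # 2. miss: load from DB
        self._entries[key] = (value, time.monotonic() + self._ttl)  # 3. store in cache
        return value                                                # 4. return result

    def invalidate(self, key):
        # Call after writes so the next read reloads fresh data.
        self._entries.pop(key, None)

# Usage with a stub standing in for the database.
db_calls = []
def load_user(user_id):
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

cache = CacheAside(load_user, ttl_seconds=30)
cache.get(42)   # miss: hits the loader
cache.get(42)   # hit: served from cache, no extra DB call
```

The TTL bounds staleness; explicit `invalidate` on writes tightens it further, which is exactly where the invalidation strategy below comes in.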
Cache Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Define your invalidation strategy before implementing caching.
Performance Testing
Test Types
| Type | Purpose | Duration |
|---|---|---|
| Load Test | Expected load behavior | Minutes to hours |
| Stress Test | Breaking point | Until failure |
| Soak Test | Stability over time | Hours to days |
| Spike Test | Sudden load changes | Brief intervals |
Performance Test Process
PERFORMANCE TESTING WORKFLOW
1. BASELINE
└── Measure current performance
2. DEFINE TARGETS
├── Latency: p99 < 200ms
├── Throughput: 1000 RPS
└── Error rate: < 0.1%
3. IDENTIFY SCENARIOS
├── Common user journeys
├── Peak load patterns
└── Edge cases
4. EXECUTE TESTS
├── Gradual load increase
├── Sustained load
└── Spike scenarios
5. ANALYZE RESULTS
├── Identify bottlenecks
├── Resource utilization
└── Error patterns
6. OPTIMIZE & REPEAT
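The execute-and-analyze steps above can be sketched with a tiny closed-loop load generator. This is an illustrative stdlib-only sketch (real tests would use one of the tools below against an actual HTTP endpoint; `fake_request` is a stub):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_test(request_fn, total_requests=200, concurrency=10):
    """Closed-loop load generator: run requests concurrently, record latencies."""
    def timed_call(i):
        start = time.perf_counter()
        request_fn(i)
        return (time.perf_counter() - start) * 1000.0   # per-request latency, ms

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    return {
        "throughput_rps": total_requests / elapsed,
        "p50_ms": latencies[len(latencies) // 2],
        "p99_ms": latencies[int(len(latencies) * 0.99) - 1],
    }

# Illustrative target: a stub standing in for a real HTTP call.
def fake_request(i):
    time.sleep(0.002)   # simulate ~2 ms of server-side work

report = run_load_test(fake_request)
```

Even this toy version reports the numbers that matter: throughput and tail latency, not averages.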
Tools
| Tool | Type | Use Case |
|---|---|---|
| k6 | Load testing | Developer-friendly, JavaScript |
| JMeter | Load testing | Enterprise, comprehensive |
| Gatling | Load testing | Scala-based, CI/CD friendly |
| Locust | Load testing | Python, distributed |
| wrk | Benchmarking | Simple HTTP benchmarking |
Common Performance Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| N+1 Queries | 100 items trigger 101 queries | Eager loading, joins, batch queries |
| No Caching | Repeated expensive operations | Cache at appropriate levels |
| Synchronous Everything | Threads blocked on I/O | Async for long operations |
| Missing Indexes | Full table scans | Index frequently queried columns |
| SELECT * | Transfers unnecessary data | Select only needed columns |
| Unbounded Queries | Memory exhaustion | Always use LIMIT/pagination |
| Logging Everything | I/O overhead | Log sampling, appropriate levels |
| Chatty APIs | Network round-trip overhead | Aggregate endpoints, GraphQL |
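The N+1 row is worth seeing concretely. A minimal sketch with an in-memory stand-in for a database that counts the queries issued (the `CountingDB` class and its methods are hypothetical, modeling `WHERE id = ?` vs. `WHERE id IN (...)` access):

```python
class CountingDB:
    """In-memory stand-in for a database that counts queries issued (hypothetical API)."""
    def __init__(self, customers):
        self._customers = customers
        self.queries = 0

    def get_customer(self, customer_id):
        self.queries += 1                     # models: SELECT ... WHERE id = ?
        return self._customers[customer_id]

    def get_customers(self, customer_ids):
        self.queries += 1                     # models: SELECT ... WHERE id IN (...)
        return {c: self._customers[c] for c in customer_ids}

def attach_customers_n_plus_one(db, orders):
    # Anti-pattern: one lookup per order -> N extra queries.
    for order in orders:
        order["customer"] = db.get_customer(order["customer_id"])

def attach_customers_batched(db, orders):
    # Fix: deduplicate IDs and fetch them in a single batched query.
    ids = {o["customer_id"] for o in orders}
    customers = db.get_customers(ids)
    for order in orders:
        order["customer"] = customers[order["customer_id"]]

customers = {i: {"id": i, "name": f"customer-{i}"} for i in range(3)}
orders = [{"id": i, "customer_id": i % 3} for i in range(100)]

slow_db = CountingDB(customers)
attach_customers_n_plus_one(slow_db, [dict(o) for o in orders])   # 100 queries

fast_db = CountingDB(customers)
attach_customers_batched(fast_db, [dict(o) for o in orders])      # 1 query
```

ORMs hide this pattern behind lazy relationship loading, which is why eager loading (e.g. join or batch fetch options) is the usual fix.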
Quick Reference Card
┌───────────────────────────────────────────────────────────┐
│                  PERFORMANCE CHEAT SHEET                  │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  METRICS                                                  │
│  ───────────────────────────────────────────────────      │
│  Latency     → Time to respond (report p50, p95, p99)     │
│  Throughput  → Requests per second (RPS)                  │
│  Utilization → Resource consumption (target 60-80%)       │
│                                                           │
│  TACTICS                                                  │
│  ───────────────────────────────────────────────────      │
│  REDUCE DEMAND                                            │
│  • Cache at every layer (browser, CDN, app, DB)           │
│  • Compress responses (gzip, Brotli)                      │
│  • Paginate large result sets                             │
│                                                           │
│  MANAGE RESOURCES                                         │
│  • Pool connections and threads                           │
│  • Use async for I/O-bound operations                     │
│  • Set timeouts on all external calls                     │
│                                                           │
│  OPTIMIZE DATA ACCESS                                     │
│  • Index WHERE and JOIN columns                           │
│  • Avoid N+1 queries (use eager loading)                  │
│  • Select only needed columns                             │
│                                                           │
│  SCALE HORIZONTALLY                                       │
│  • Design stateless services                              │
│  • Use load balancing                                     │
│  • Shard data when necessary                              │
│                                                           │
│  QUICK WINS                                               │
│  ───────────────────────────────────────────────────      │
│  1. Add caching (biggest impact for read-heavy)           │
│  2. Add database indexes                                  │
│  3. Fix N+1 queries                                       │
│  4. Enable compression                                    │
│  5. Use async processing                                  │
│                                                           │
└───────────────────────────────────────────────────────────┘
Related Topics
- Quality Attributes Overview - All quality attributes
- Availability - Availability and performance trade-offs
- Cloud Architecture - Cloud performance patterns
Sources
- High Performance Browser Networking - Ilya Grigorik
- Systems Performance - Brendan Gregg
- AWS Performance Efficiency Pillar
- Database Indexing Guide - Markus Winand