Performance: Designing for Speed

Architectural tactics and patterns for building high-performance systems that meet latency, throughput, and resource utilization targets.

TL;DR

Performance is about time and resources: how fast the system responds (latency), how much work it handles (throughput), and how efficiently it uses resources (utilization). Measure first, then optimize. Most performance problems stem from inefficient data access, lack of caching, or synchronous processing of work that could be async.

Key Takeaways

  • Measure before optimizing: Profile to find actual bottlenecks, don't guess
  • Latency vs throughput: Optimizing one may hurt the other—know your priority
  • Caching is powerful: But adds complexity and consistency challenges
  • Async for long operations: Don't block threads waiting for slow operations
  • Database is often the bottleneck: Optimize queries, indexes, and access patterns first

Why This Matters

Users expect fast systems. Amazon found that every 100ms of latency cost 1% in sales. Google found that a 500ms delay reduced search traffic by 20%. Performance directly impacts user experience, conversion rates, and operational costs. Poor performance also indicates architectural problems—systems that struggle under load often have deeper issues with coupling, data access, or resource management.

Premature Optimization

"Premature optimization is the root of all evil" - Donald Knuth. Measure first. The bottleneck is rarely where you think it is. Profile, identify the actual hot path, then optimize.


Performance Fundamentals

Key Metrics

PERFORMANCE METRICS

LATENCY (Response Time)
├── Time from request to response
├── Measured in ms or seconds
├── Report percentiles: p50, p95, p99
└── p99 matters more than average

THROUGHPUT
├── Work completed per time unit
├── Requests per second (RPS)
├── Transactions per second (TPS)
└── Messages per second (MPS)

UTILIZATION
├── Resource consumption percentage
├── CPU, Memory, Disk, Network
├── Target: 60-80% for headroom
└── >90% = capacity planning needed

SCALABILITY
├── How performance changes with load
├── Linear: 2x resources = 2x throughput
├── Sublinear: Diminishing returns (contention)
└── Superlinear: Rare; usually caching effects

Latency Components

REQUEST LIFECYCLE

Client                                           Server
  │                                                │
  │─── Network latency (client → server) ─────────▶│
  │                                                │─┐
  │                                                │ │ Processing
  │                                                │ │ time
  │                                                │─┘
  │◀── Network latency (server → client) ─────────│
  │                                                │

TOTAL LATENCY = Network (request)
              + Queue time
              + Processing time
              + Network (response)

PROCESSING BREAKDOWN
├── Application logic
├── Database queries
├── External service calls
├── Serialization/deserialization
└── I/O operations

Percentiles Matter

WHY PERCENTILES, NOT AVERAGES

Scenario: 100 requests
├── 95 requests: 50ms
└── 5 requests: 1000ms

Average: (95×50 + 5×1000) / 100 = 97.5ms
p50 (median): 50ms
p95: 50ms
p99: 1000ms

The average hides the 5% of requests with a terrible experience.
p99 exposes that tail; with many requests per session, most users
will eventually hit it.

REPORTING STANDARD
├── p50: Typical experience
├── p95: Most users' worst case
├── p99: Edge case (still matters at scale)
└── p99.9: Long tail (important for critical paths)
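The scenario above can be reproduced in a few lines of Python using the nearest-rank percentile method (a sketch for illustration, not a production metrics library):

```python
# Nearest-rank percentile: shows why averages hide tail latency.
import math

def percentile(samples, p):
    """Return the nearest-rank p-th percentile of samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# The scenario above: 95 fast requests (50 ms), 5 slow ones (1000 ms).
latencies = [50] * 95 + [1000] * 5

average = sum(latencies) / len(latencies)
print(average)                      # 97.5 -- looks fine
print(percentile(latencies, 50))    # 50
print(percentile(latencies, 95))    # 50
print(percentile(latencies, 99))    # 1000 -- the tail the average hid
```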

Performance Tactics

Goal

Decrease the amount of work the system must do.

Tactics

Tactic        Description              Implementation
─────────────────────────────────────────────────────────────────
Caching       Store computed results   Redis, CDN, application cache
Compression   Reduce data size         gzip, Brotli for HTTP
Pagination    Limit result sets        Cursor-based pagination
Lazy Loading  Load on demand           Defer non-critical resources
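Cursor-based pagination keeps each page an index seek instead of an ever-slower OFFSET scan. A minimal sketch with Python's built-in sqlite3 (the `items` table and columns are illustrative, not from the article):

```python
# Cursor-based pagination sketch: page by "last seen id", not OFFSET.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(1, 11)])

def fetch_page(conn, after_id, limit):
    """Return up to `limit` rows with id greater than the cursor `after_id`."""
    rows = conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()
    next_cursor = rows[-1][0] if rows else None  # opaque cursor for the client
    return rows, next_cursor

page1, cursor = fetch_page(conn, 0, 4)       # ids 1-4
page2, cursor = fetch_page(conn, cursor, 4)  # ids 5-8
```

The client echoes `next_cursor` back on the next request; the database never has to skip rows it already served.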

Caching Strategies

CACHE HIERARCHY

┌─────────────────────────────────────────────────────┐
│                    Client                           │
│  ┌─────────────┐                                    │
│  │Browser Cache│ ← Fastest, limited size            │
│  └─────────────┘                                    │
└─────────────────────────────────────────────────────┘
                        │
┌─────────────────────────────────────────────────────┐
│                     CDN                             │
│  ┌─────────────┐                                    │
│  │ Edge Cache  │ ← Static content, geographic       │
│  └─────────────┘                                    │
└─────────────────────────────────────────────────────┘
                        │
┌─────────────────────────────────────────────────────┐
│                 Application                         │
│  ┌─────────────┐   ┌────────────────┐               │
│  │  App Cache  │   │Redis/Memcached │               │
│  │ (in-memory) │   │ (distributed)  │               │
│  └─────────────┘   └────────────────┘               │
└─────────────────────────────────────────────────────┘
                        │
┌─────────────────────────────────────────────────────┐
│                   Database                          │
│  ┌─────────────┐                                    │
│  │ Query Cache │ ← DB-level caching                 │
│  └─────────────┘                                    │
└─────────────────────────────────────────────────────┘

Cache Patterns

CACHE-ASIDE (Lazy Loading)
1. Check cache
2. If miss, load from DB
3. Store in cache
4. Return result

Best for: Read-heavy, tolerance for stale data
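The four steps above map directly onto code. A minimal cache-aside sketch, using a plain dict where production systems would use Redis or Memcached with a TTL (`slow_lookup` is a hypothetical stand-in for a database call):

```python
# Cache-aside sketch: the application manages the cache explicitly.
cache = {}
db_calls = 0

def slow_lookup(key):
    """Stand-in for an expensive database query."""
    global db_calls
    db_calls += 1
    return f"value-for-{key}"

def get(key):
    if key in cache:              # 1. check cache
        return cache[key]
    value = slow_lookup(key)      # 2. on miss, load from the source
    cache[key] = value            # 3. populate cache
    return value                  # 4. return result

get("user:42")   # miss -> hits the "database"
get("user:42")   # hit  -> served from cache, no second DB call
```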

WRITE-THROUGH
1. Write to cache
2. Cache writes to DB (sync)
3. Return success

Best for: Data consistency critical

WRITE-BEHIND (Write-Back)
1. Write to cache
2. Return success immediately
3. Cache writes to DB (async)

Best for: Write-heavy, can tolerate some loss
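Write-behind can be sketched with a queue and a background flusher. This toy version (dict-as-store, names invented for the example) makes the trade-off concrete: the caller sees success before the durable write happens, so a crash loses queued writes:

```python
# Write-behind sketch: ack against the cache, flush to the store async.
import queue
import threading

cache = {}
store = {}                  # stands in for the database
pending = queue.Queue()

def writer():
    """Background worker: performs the slow, durable writes."""
    while True:
        key, value = pending.get()
        store[key] = value
        pending.task_done()

threading.Thread(target=writer, daemon=True).start()

def put(key, value):
    cache[key] = value          # 1. write to cache
    pending.put((key, value))   # 3. DB write is queued, not awaited
    return True                 # 2. caller sees success immediately

put("order:1", "shipped")
pending.join()  # demo only: a real caller would never wait for the flush
```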

Cache Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Define your invalidation strategy before implementing caching.


Performance Testing

Test Types

Type         Purpose                  Duration
────────────────────────────────────────────────
Load Test    Expected load behavior   Minutes to hours
Stress Test  Breaking point           Until failure
Soak Test    Stability over time      Hours to days
Spike Test   Sudden load changes      Brief intervals

Performance Test Process

PERFORMANCE TESTING WORKFLOW

1. BASELINE
   └── Measure current performance

2. DEFINE TARGETS
   ├── Latency: p99 < 200ms
   ├── Throughput: 1000 RPS
   └── Error rate: < 0.1%

3. IDENTIFY SCENARIOS
   ├── Common user journeys
   ├── Peak load patterns
   └── Edge cases

4. EXECUTE TESTS
   ├── Gradual load increase
   ├── Sustained load
   └── Spike scenarios

5. ANALYZE RESULTS
   ├── Identify bottlenecks
   ├── Resource utilization
   └── Error patterns

6. OPTIMIZE & REPEAT
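The execute-and-analyze steps can be sketched in miniature: drive a target function concurrently, collect per-call latencies, and check the p99 against the target from step 2. Real tests would use k6 or Locust against HTTP endpoints; `handler` here is a hypothetical stand-in for the system under test:

```python
# Minimal load-test sketch: concurrent calls, then percentile analysis.
import math
import time
from concurrent.futures import ThreadPoolExecutor

def handler():
    time.sleep(0.005)  # stand-in for the system under test (~5 ms of work)
    return "ok"

def timed_call(_):
    start = time.perf_counter()
    handler()
    return (time.perf_counter() - start) * 1000  # latency in ms

with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = sorted(pool.map(timed_call, range(200)))

p99 = latencies[math.ceil(0.99 * len(latencies)) - 1]
print(f"p99 = {p99:.1f} ms")  # compare against the p99 < 200 ms target
```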

Tools

Tool     Type          Use Case
───────────────────────────────────────────────
k6       Load testing  Developer-friendly, JavaScript
JMeter   Load testing  Enterprise, comprehensive
Gatling  Load testing  Scala-based, CI/CD friendly
Locust   Load testing  Python, distributed
wrk      Benchmarking  Simple HTTP benchmarking

Common Performance Anti-Patterns

Anti-Pattern            Problem                        Solution
──────────────────────────────────────────────────────────────────────────────
N+1 Queries             100 items = 101 queries        Eager loading, joins, batch queries
No Caching              Repeated expensive operations  Cache at appropriate levels
Synchronous Everything  Threads blocked on I/O         Async for long operations
Missing Indexes         Full table scans               Index frequently queried columns
SELECT *                Transfers unnecessary data     Select only needed columns
Unbounded Queries       Memory exhaustion              Always use LIMIT/pagination
Logging Everything      I/O overhead                   Log sampling, appropriate levels
Chatty APIs             Network round-trip overhead    Aggregate endpoints, GraphQL
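The N+1 anti-pattern is easy to demonstrate with Python's built-in sqlite3 (the books/authors schema is illustrative, not from the article): fetching the author of 100 books one lookup at a time issues 101 queries, while a join does it in one.

```python
# N+1 illustration: one query per item vs. a single joined query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, "
             "author_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author-{i}") for i in range(1, 6)])
conn.executemany("INSERT INTO books VALUES (?, ?, ?)",
                 [(i, (i % 5) + 1, f"book-{i}") for i in range(1, 101)])

def authors_n_plus_one(conn):
    """1 query for the books + 1 query per book = 101 round trips."""
    names = []
    for (author_id,) in conn.execute("SELECT author_id FROM books ORDER BY id"):
        row = conn.execute("SELECT name FROM authors WHERE id = ?",
                           (author_id,)).fetchone()
        names.append(row[0])
    return names

def authors_joined(conn):
    """A single query: the join replaces 100 point lookups."""
    return [row[0] for row in conn.execute(
        "SELECT a.name FROM books b "
        "JOIN authors a ON a.id = b.author_id ORDER BY b.id")]
```

Both return identical results; the difference only shows up as round-trip latency, which is exactly why N+1 hides in local testing and surfaces in production.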

Quick Reference Card

┌─────────────────────────────────────────────────────────────┐
│               PERFORMANCE CHEAT SHEET                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  METRICS                                                    │
│  ─────────────────────────────────────────────────────────  │
│  Latency     → Time to respond (report p50, p95, p99)       │
│  Throughput  → Requests per second (RPS)                    │
│  Utilization → Resource consumption (target 60-80%)         │
│                                                             │
│  TACTICS                                                    │
│  ─────────────────────────────────────────────────────────  │
│  REDUCE DEMAND                                              │
│  • Cache at every layer (browser, CDN, app, DB)             │
│  • Compress responses (gzip, Brotli)                        │
│  • Paginate large result sets                               │
│                                                             │
│  MANAGE RESOURCES                                           │
│  • Pool connections and threads                             │
│  • Use async for I/O-bound operations                       │
│  • Set timeouts on all external calls                       │
│                                                             │
│  OPTIMIZE DATA ACCESS                                       │
│  • Index WHERE and JOIN columns                             │
│  • Avoid N+1 queries (use eager loading)                    │
│  • Select only needed columns                               │
│                                                             │
│  SCALE HORIZONTALLY                                         │
│  • Design stateless services                                │
│  • Use load balancing                                       │
│  • Shard data when necessary                                │
│                                                             │
│  QUICK WINS                                                 │
│  ─────────────────────────────────────────────────────────  │
│  1. Add caching (biggest impact for read-heavy)             │
│  2. Add database indexes                                    │
│  3. Fix N+1 queries                                         │
│  4. Enable compression                                      │
│  5. Use async processing                                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
