name: afrexai-performance-engineering
description: Complete performance engineering system - profiling, optimization, load testing, capacity planning, and performance culture. Use when diagnosing slow applications, optimizing code/queries/infrastructure, load testing before launch, planning capacity, or building performance into CI/CD. Covers Node.js, Python, Go, Java, databases, APIs, and frontend.
metadata:
  openclaw:
    os: [linux, darwin, win32]
Performance Engineering System
From "it's slow" to "here's why and here's the fix" โ a complete methodology for measuring, diagnosing, optimizing, and preventing performance problems.
Phase 1: Performance Investigation Brief
Before touching anything, define the problem.
# performance-brief.yaml
investigation:
  reported_by: ""
  reported_date: ""
  system: ""       # service/app name
  environment: ""  # production, staging, dev

problem_statement:
  symptom: ""         # "API response time increased 3x"
  impact: ""          # "15% of users seeing timeouts"
  since_when: ""      # "After deploy v2.14 on Feb 20"
  affected_scope: ""  # "All endpoints" | "Only /search" | "Users in EU"

baselines:
  target_p50: ""  # e.g., "200ms"
  target_p95: ""  # e.g., "500ms"
  target_p99: ""  # e.g., "1000ms"
  current_p50: ""
  current_p95: ""
  current_p99: ""
  throughput_target: ""  # e.g., "1000 rps"
  error_rate_target: ""  # e.g., "<0.1%"

constraints:
  budget: ""          # time/money for optimization
  risk_tolerance: ""  # "Can we change the schema?" "Can we add caching?"
  deadline: ""        # "Must fix before Black Friday"

hypothesis:
  primary: ""    # "N+1 queries in the new recommendation engine"
  secondary: ""  # "Connection pool exhaustion under load"
  evidence: ""   # "Slow query log shows 200+ queries per request"
Performance Budget Framework
Set budgets BEFORE building, not after complaints:
| Metric | Web App | API | Mobile | Batch Job |
|---|---|---|---|---|
| P50 response | <200ms | <100ms | <300ms | N/A |
| P95 response | <500ms | <250ms | <800ms | N/A |
| P99 response | <1s | <500ms | <1.5s | N/A |
| Error rate | <0.1% | <0.01% | <0.5% | <0.001% |
| Time to Interactive | <3s | N/A | <2s | N/A |
| Memory per request | <50MB | <20MB | <100MB | <1GB |
| CPU per request | <100ms | <50ms | <200ms | N/A |
| Throughput | 100+ rps | 500+ rps | N/A | items/min |
Phase 2: Measurement & Profiling
The Golden Rule
Never optimize without measuring first. Never measure without a hypothesis.
Profiling Decision Tree
Is it slow?
├── YES → Where is time spent?
│   ├── CPU-bound → Profile CPU (flame graph)
│   │   ├── Hot function found → Optimize algorithm/data structure
│   │   └── Spread evenly → Architecture problem (too many layers)
│   ├── I/O-bound → Profile I/O
│   │   ├── Database → Query analysis (Phase 4)
│   │   ├── Network → Connection profiling
│   │   ├── Disk → I/O scheduler + buffering
│   │   └── External API → Caching + async + circuit breaker
│   ├── Memory-bound → Profile allocations
│   │   ├── GC pressure → Reduce allocations, pool objects
│   │   ├── Memory leak → Heap snapshot comparison
│   │   └── Cache thrashing → Resize or eviction policy
│   └── Concurrency-bound → Profile locks/contention
│       ├── Lock contention → Reduce critical section, lock-free structures
│       ├── Thread starvation → Pool sizing
│       └── Deadlock → Lock ordering analysis
└── NO → Define "fast enough" (see budgets above)
CPU Profiling by Language
Node.js
# Built-in profiler (V8)
node --prof app.js
node --prof-process isolate-*.log > profile.txt
# Inspector-based (connect Chrome DevTools)
node --inspect app.js
# Open chrome://inspect → Profiler → Start
# Clinic.js (best overall Node.js profiler)
npx clinic doctor -- node app.js
npx clinic flame -- node app.js # Flame graph
npx clinic bubbleprof -- node app.js # Async bottlenecks
# 0x (flame graphs)
npx 0x app.js
Python
# cProfile (built-in)
import cProfile
import pstats
profiler = cProfile.Profile()
profiler.enable()
# ... code to profile ...
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20) # Top 20
# Line profiler (pip install line-profiler)
# Add @profile decorator, then:
# kernprof -l -v script.py
# py-spy (sampling profiler, no code changes)
# pip install py-spy
# py-spy top --pid <PID>
# py-spy record -o profile.svg --pid <PID> # Flame graph
# Scalene (CPU + memory + GPU)
# pip install scalene
# scalene script.py
Go
// Built-in pprof
import (
    "net/http"
    _ "net/http/pprof" // blank import registers the /debug/pprof handlers
)
// HTTP server (add to existing server)
// Access: http://localhost:6060/debug/pprof/
go func() { http.ListenAndServe(":6060", nil) }()
// CLI analysis
// go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
// go tool pprof -http=:8080 profile.out # Web UI
Java
# async-profiler (best for JVM)
# https://github.com/async-profiler/async-profiler
./asprof -d 30 -f profile.html <PID>
# JFR (built-in since JDK 11)
java -XX:StartFlightRecording=duration=60s,filename=rec.jfr MyApp
jfr print --events CPULoad rec.jfr
# jstack (thread dump)
jstack <PID> > threads.txt
Memory Profiling
Leak Detection Pattern (any language)
1. Take heap snapshot at T0
2. Run suspected operation N times
3. Force GC
4. Take heap snapshot at T1
5. Compare: objects that grew = potential leak
6. Check: are they reachable? From where? (retention path)
Node.js Memory
// Heap snapshot
const v8 = require('v8');

function takeSnapshot(label) {
  // v8.writeHeapSnapshot returns the path of the file it wrote
  const filename = v8.writeHeapSnapshot(`${label}-${Date.now()}.heapsnapshot`);
  console.log(`Heap snapshot written to ${filename}`);
}
// Process memory monitoring
setInterval(() => {
  const mem = process.memoryUsage();
  console.log({
    rss_mb: (mem.rss / 1048576).toFixed(1),
    heap_used_mb: (mem.heapUsed / 1048576).toFixed(1),
    heap_total_mb: (mem.heapTotal / 1048576).toFixed(1),
    external_mb: (mem.external / 1048576).toFixed(1),
  });
}, 10000);
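A sketch of the six-step leak pattern above in Node.js, assuming the process is started with --expose-gc so global.gc() is available and the suspected operation is a local async function:
// leak-check.js (run: node --expose-gc leak-check.js)
const v8 = require('v8');

async function leakCheck(operation, iterations = 100) {
  global.gc();                                            // settle the heap first (step 1)
  const before = v8.writeHeapSnapshot('t0.heapsnapshot');
  for (let i = 0; i < iterations; i++) {
    await operation();                                    // step 2: run the suspected operation N times
  }
  global.gc();                                            // step 3: force GC
  const after = v8.writeHeapSnapshot('t1.heapsnapshot');  // step 4
  console.log(`Compare ${before} and ${after}`);          // step 5: diff in Chrome DevTools > Memory > Comparison view
}
Objects whose count grows between the two snapshots are leak suspects; follow their retention paths (step 6) in DevTools.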
Python Memory
# tracemalloc (built-in)
import tracemalloc
tracemalloc.start()
# ... code ...
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics('lineno')
for stat in top[:10]:
    print(stat)
# objgraph (pip install objgraph)
import objgraph
objgraph.show_most_common_types(limit=20)
objgraph.show_growth(limit=10) # Call twice to see what's growing
Flame Graph Interpretation
Reading a flame graph:
┌───────────────────────────────────────────────┐
│ main()                                        │ ← Entry point (bottom)
├───────────────────────┬───────────────────────┤
│ processData()         │ renderOutput()        │ ← Width = time spent
├───────────┬───────────┤                       │
│ parseCSV  │ validate  │                       │ ← Tall = deep call stack
├───────────┤           │                       │
│ readline  │           │                       │ ← Top = where CPU burns
└───────────┴───────────┴───────────────────────┘

WHAT TO LOOK FOR:
1. Wide plateaus at top → CPU-intensive leaf function (optimize this!)
2. Many thin towers → excessive function calls (batch or reduce)
3. Recursive patterns → potential stack overflow risk
4. Unexpected width → function taking more time than expected
5. GC/runtime frames → memory pressure

ACTION RULES:
- Plateau >20% width → must investigate
- Plateau >40% width → almost certainly the bottleneck
- If top 3 functions = 80% of time → focused optimization will work
- If evenly distributed → architectural change needed
Phase 3: Common Optimization Patterns
Algorithm & Data Structure Optimizations
| Problem | Bad O() | Fix | Good O() |
|---|---|---|---|
| Search unsorted array | O(n) | Sort + binary search, or use Set/Map | O(log n) or O(1) |
| Nested loop matching | O(n²) | Hash map lookup | O(n) |
| Repeated string concat | O(n²) | StringBuilder/join array | O(n) |
| Sorting already-sorted data | O(n log n) | Check if sorted first | O(n) |
| Finding duplicates | O(n²) | Set-based detection | O(n) |
| Frequent min/max of changing data | O(n) per query | Heap/priority queue | O(log n) |
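For example, the nested-loop row of the table in JavaScript: matching orders to users drops from O(n²) to O(n) by building a Map once (user/order shapes are hypothetical):
// BAD (O(n²)): users.find scans the whole array for every order
const enriched = orders.map(order => ({
  ...order,
  user: users.find(u => u.id === order.userId),
}));

// GOOD (O(n)): build the index once, then each lookup is O(1)
const usersById = new Map(users.map(u => [u.id, u]));
const enrichedFast = orders.map(order => ({
  ...order,
  user: usersById.get(order.userId),
}));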
Caching Strategy Decision Matrix
Should you cache this?
├── Does the same input always produce the same output?
│   ├── YES → Cache candidate ✅
│   └── NO → Can you define a valid TTL?
│       ├── YES → Cache with TTL ✅
│       └── NO → Don't cache ❌
├── Is it called frequently?
│   ├── <10x/min → Probably not worth caching
│   └── >10x/min → Cache ✅
├── Is the source data expensive to compute/fetch?
│   ├── <10ms → Probably not worth caching
│   └── >10ms → Cache ✅
└── Does staleness cause problems?
    ├── Critical (financial, auth) → Short TTL or cache-aside with invalidation
    ├── Important (user data) → 1-5 min TTL with invalidation
    └── Tolerant (content, search) → 5-60 min TTL

CACHE LAYERS (use in order):
1. In-process (Map/LRU) → <1μs, limited by memory, per-instance
2. Shared cache (Redis/Memcached) → <1ms, shared across instances
3. CDN/edge cache → <10ms, geographic distribution
4. Browser cache → 0ms for user, stale risk

INVALIDATION STRATEGIES:
- TTL-based: simplest, best for read-heavy + staleness-tolerant
- Event-based: publish cache-invalidate on write, best for consistency
- Write-through: update cache on every write, best for write-read patterns
- Cache-aside: app manages cache explicitly, most flexible
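A minimal in-process cache-aside sketch with TTL, combining layer 1 with the cache-aside strategy above (fetchFromSource and the TTL are illustrative; a production version should also bound the map size, e.g., with an LRU):
const cache = new Map(); // key → { value, expiresAt }

async function getCached(key, ttlMs, fetchFromSource) {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // fresh hit
  const value = await fetchFromSource(key);                // miss or stale: go to source
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Usage inside an async handler, e.g. tolerating 5 min staleness:
//   const profile = await getCached(`user:${id}`, 5 * 60_000, loadUserProfile);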
Connection Pooling
# Sizing formula
pool_size: min(available_cores * 2 + effective_spindle_count, max_connections / num_instances)
# Rules of thumb:
# - PostgreSQL: connections = cores * 2 + 1 (common rule of thumb)
# - MySQL: keep total connections < 150 for most workloads
# - HTTP clients: match to concurrent request volume
# - Redis: usually 5-10 per instance is enough
# Warning signs of pool problems:
# - "connection timeout" errors under load
# - Response time spikes at regular intervals
# - Idle connections holding resources
# - Connection count hitting max_connections
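Applying these rules with node-postgres (a sketch; the exact max depends on your core count, instance count, and max_connections):
const { Pool } = require('pg');

const pool = new Pool({
  max: 10,                        // per-instance cap: cores * 2 + spindles, bounded by max_connections / instances
  idleTimeoutMillis: 30_000,      // reclaim idle connections
  connectionTimeoutMillis: 2_000, // fail fast instead of queueing forever
});

// pool.query() checks out a client and releases it automatically;
// if you use pool.connect() directly, always call client.release().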
Async & Concurrency Patterns
// BAD: Sequential when independent
const user = await getUser(id);
const orders = await getOrders(id);
const prefs = await getPreferences(id);
// Total: user_time + orders_time + prefs_time

// GOOD: Parallel when independent
const [user, orders, prefs] = await Promise.all([
  getUser(id),
  getOrders(id),
  getPreferences(id),
]);
// Total: max(user_time, orders_time, prefs_time)

// GOOD: Controlled concurrency for many items
// (npm: p-limit, p-map, or manual semaphore)
import pLimit from 'p-limit';

const limit = pLimit(10); // Max 10 concurrent
const results = await Promise.all(
  items.map(item => limit(() => processItem(item)))
);
# Python: asyncio for I/O-bound
import asyncio

async def fetch_all(ids):
    # Launch all fetches concurrently
    tasks = [fetch_one(id) for id in ids]
    return await asyncio.gather(*tasks)

# Python: ProcessPoolExecutor for CPU-bound
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cpu_intensive_fn, items))
N+1 Query Detection & Fix
SYMPTOM: Response time scales linearly with result count
DETECTION: Enable query logging, count queries per request
# Bad: N+1
users = db.query("SELECT * FROM users LIMIT 100")
for user in users:
orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
# Result: 1 + 100 = 101 queries
# Fix 1: JOIN
SELECT u.*, o.* FROM users u
LEFT JOIN orders o ON o.user_id = u.id
LIMIT 100
# Fix 2: Batch load (better for large datasets)
users = db.query("SELECT * FROM users LIMIT 100")
user_ids = [str(u.id) for u in users]
# Interpolation shown for brevity; use a parameterized IN/ANY clause in real code
orders = db.query(f"SELECT * FROM orders WHERE user_id IN ({','.join(user_ids)})")
# Result: 2 queries regardless of count
# Fix 3: ORM eager loading
# Drizzle: .with(users.orders)
# SQLAlchemy: joinedload(User.orders)
# Prisma: include: { orders: true }
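For the detection step, one lightweight approach is to wrap the db client in tests and assert on the count per request; a sketch (db, handleRequest, and the threshold are illustrative):
// Test helper: count queries issued through a db client wrapper
function countQueries(db) {
  let count = 0;
  const original = db.query.bind(db);
  db.query = (...args) => {
    count += 1;              // every query through the wrapper is counted
    return original(...args);
  };
  return () => count;
}

// In a test (assumed handler and threshold):
//   const getCount = countQueries(db);
//   await handleRequest('/users?limit=100');
//   assert(getCount() <= 5, `N+1 suspected: ${getCount()} queries`);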
Phase 4: Database Performance
Query Optimization Checklist
For every slow query:
☐ Run EXPLAIN ANALYZE (not just EXPLAIN)
☐ Check: is it doing a sequential scan on a large table?
☐ Check: is the row estimate accurate? (bad stats = bad plan)
☐ Check: are there implicit type casts preventing index use?
☐ Check: is it sorting more data than needed? (add LIMIT earlier)
☐ Check: is it joining in the right order?
☐ Check: can a covering index eliminate table lookups?
☐ Check: is the query running during peak hours? (schedule if batch)
EXPLAIN ANALYZE Interpretation
-- PostgreSQL EXPLAIN output reading guide:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
-- Key metrics to check:
-- 1. Actual time vs estimated time (large gap = stale stats → run ANALYZE)
-- 2. Rows actual vs estimated (>10x off = bad stats)
-- 3. Seq Scan on large table (>10K rows) = needs index
-- 4. Sort with external merge = needs more work_mem or index
-- 5. Nested Loop with large outer = consider hash/merge join
-- 6. Buffers shared hit vs read (low hit ratio = needs more shared_buffers)
Index Strategy Guide
WHEN TO ADD AN INDEX:
✅ WHERE clause column (equality or range)
✅ JOIN condition column
✅ ORDER BY column (if query is index-only scan candidate)
✅ Foreign key column (prevents table lock on parent delete)
✅ Column in a unique constraint

WHEN NOT TO ADD AN INDEX:
❌ Table has <1000 rows (seq scan is fine)
❌ Column has very low cardinality (boolean, status with 3 values)
❌ Write-heavy table where reads are rare
❌ You already have 8+ indexes on the table (diminishing returns, write penalty)

INDEX TYPES:
- B-tree (default): equality, range, sorting, LIKE 'prefix%'
- Hash: equality only (rarely better than B-tree)
- GIN: arrays, JSONB, full-text search
- GiST: geometry, range types, full-text
- BRIN: large tables with natural ordering (timestamps, sequential IDs)

COMPOSITE INDEX RULES:
1. Equality columns first, then range columns
2. Most selective column first (if all equality)
3. Index on (a, b) works for WHERE a=1 AND b=2 AND for WHERE a=1 alone
4. Index on (a, b) does NOT work for WHERE b=2 alone
Phase 5: Load Testing
Load Test Design
# load-test-plan.yaml
test_name: ""
target: ""  # URL/endpoint
date: ""

scenarios:
  - name: "Baseline"
    description: "Normal traffic pattern"
    vus: 50             # Virtual users
    duration: "5m"
    ramp_up: "30s"
    think_time: "1-3s"  # Pause between requests
  - name: "Peak"
    description: "2x normal traffic (expected peak)"
    vus: 100
    duration: "10m"
    ramp_up: "1m"
  - name: "Stress"
    description: "Find the breaking point"
    vus_start: 50
    vus_end: 500
    step_duration: "2m"  # Add users every 2 min
    step_size: 50
  - name: "Soak"
    description: "Memory leaks, connection exhaustion"
    vus: 50
    duration: "2h"

pass_criteria:
  p95_response_ms: 500
  error_rate_pct: 0.1
  throughput_rps: 200
k6 Load Test Template
// load-test.js (run: k6 run load-test.js)
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';
const errorRate = new Rate('errors');
const responseTime = new Trend('response_time');
export const options = {
  stages: [
    { duration: '30s', target: 20 }, // Ramp up
    { duration: '3m', target: 20 },  // Steady
    { duration: '30s', target: 50 }, // Peak
    { duration: '3m', target: 50 },  // Steady peak
    { duration: '30s', target: 0 },  // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% under 500ms
    errors: ['rate<0.01'],            // <1% error rate
  },
};

export default function () {
  const res = http.get('https://api.example.com/endpoint');
  check(res, {
    'status 200': (r) => r.status === 200,
    'response < 500ms': (r) => r.timings.duration < 500,
  });
  errorRate.add(res.status !== 200);
  responseTime.add(res.timings.duration);
  sleep(Math.random() * 2 + 1); // 1-3s think time
}
Load Test Results Analysis
READING RESULTS:

| Metric | Healthy | Warning | Bad | What it means |
|---|---|---|---|---|
| p50/p95 ratio | <2x | 2-5x | >5x | High ratio = tail latency problem |
| p95/p99 ratio | <2x | 2-3x | >3x | Outliers affecting some users |
| Error rate | <0.1% | 0.1-1% | >1% | Above 1% = user-visible |
| Throughput drop | <5% | 5-20% | >20% | System under stress |
| CPU at peak | <70% | 70-85% | >85% | No headroom |
| Memory at peak | <75% | 75-90% | >90% | Risk of OOM |
| GC pause time | <50ms | 50-200ms | >200ms | GC storm |

BOTTLENECK IDENTIFICATION:
- Throughput plateaus but CPU is low → I/O bound (DB, network, disk)
- Throughput plateaus and CPU is high → CPU bound (optimize hot path)
- Response time climbs linearly → Queue building (capacity limit)
- Response time climbs exponentially → Resource exhaustion (connection pool, memory)
- Errors spike at specific VU count → Hard limit hit (max connections, file descriptors)
Phase 6: Frontend Performance
Core Web Vitals Optimization
| Metric | Good | Needs work | Poor | How to fix |
|---|---|---|---|---|
| LCP | <2.5s | 2.5-4s | >4s | Optimize largest image/text |
| FID/INP | <100ms | 100-300ms | >300ms | Break up long tasks, defer JS |
| CLS | <0.1 | 0.1-0.25 | >0.25 | Set dimensions, font-display |
LCP FIXES (in priority order):
1. Preload the LCP image: <link rel="preload" as="image" href="...">
2. Use responsive images: srcset with correct sizes
3. Serve WebP/AVIF (30-50% smaller)
4. Remove render-blocking CSS/JS from <head>
5. Use CDN for static assets
6. Server-side render the above-fold content
INP FIXES:
1. Break long tasks (>50ms) with requestIdleCallback or setTimeout(0)
2. Use web workers for CPU-intensive work
3. Debounce/throttle event handlers
4. Defer non-critical JS: <script defer> or dynamic import()
5. Avoid layout thrashing (batch DOM reads, then batch writes)
CLS FIXES:
1. Always set width/height on <img> and <video>
2. Use aspect-ratio CSS for dynamic content
3. Reserve space for ads/embeds
4. Use font-display: swap with size-adjusted fallback
5. Never insert content above existing content
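To make the INP fixes concrete, here is a minimal sketch of fix #1: chunking a long task and yielding back to the main thread between chunks so input events can be handled (browser-side JavaScript; processChunk is a hypothetical work function):
// BAD: one long synchronous loop blocks every click/keypress until it finishes
// items.forEach(processChunk);

// GOOD: process in chunks and yield between them so input events can run
async function processInChunks(items, chunkSize = 100) {
  for (let i = 0; i < items.length; i += chunkSize) {
    items.slice(i, i + chunkSize).forEach(processChunk);
    await new Promise(resolve => setTimeout(resolve, 0)); // yield to the main thread
  }
}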
Bundle Optimization
ANALYSIS:
- Webpack: npx webpack-bundle-analyzer stats.json
- Vite: npx vite-bundle-visualizer
- Next.js: @next/bundle-analyzer
REDUCTION STRATEGIES (in order of impact):
1. Code splitting: dynamic import() for routes and heavy components
2. Tree shaking: use ESM imports, avoid barrel files (index.ts re-exports)
3. Replace heavy libraries:
- moment.js (330KB) → date-fns (tree-shakeable) or dayjs (2KB)
- lodash (530KB) → lodash-es (tree-shakeable) or native JS
- chart.js → lightweight alternative for simple charts
4. Lazy load below-fold components
5. Externalize large deps to CDN (React, etc.)
6. Compress: Brotli > gzip (15-20% smaller)
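Reduction strategy #1 in practice: a small sketch of on-demand loading, so the heavy module is split into its own chunk and fetched only on first use (./heavy-chart is a hypothetical module path):
// Before: heavy dependency in the main bundle for every visitor
// import { renderChart } from './heavy-chart';

// After: fetched on demand; bundlers split dynamic imports into separate chunks
async function showChart(el, data) {
  const { renderChart } = await import('./heavy-chart'); // loaded on first use
  renderChart(el, data);
}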
Phase 7: Infrastructure & Scaling
Scaling Decision Framework
VERTICAL SCALING (scale up):
✅ Quick fix, no code changes
✅ Database servers (often best first move)
✅ Memory-bound workloads
❌ Diminishing returns past 8-16 cores
❌ Single point of failure
❌ Expensive at high end

HORIZONTAL SCALING (scale out):
✅ Stateless services (APIs, workers)
✅ Read-heavy workloads (read replicas)
✅ Geographic distribution
❌ Requires stateless design
❌ Adds complexity (load balancing, session management)
❌ Not all workloads parallelize

SCALING CHECKLIST:
☐ Can we optimize the code first? (cheapest option)
☐ Can we add caching? (often 10-100x improvement)
☐ Can we add a read replica? (if read-heavy)
☐ Can we queue and process async? (if latency-tolerant)
☐ Can we scale vertically? (if CPU/memory bound)
☐ Do we need horizontal scaling? (if all above exhausted)
Auto-scaling Configuration
# Kubernetes HPA example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Scale at 70% CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60  # Wait 1m before scaling up
      policies:
        - type: Percent
          value: 50  # Max 50% increase per step
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5m before scaling down
      policies:
        - type: Percent
          value: 25  # Max 25% decrease per step
          periodSeconds: 120
Phase 8: Capacity Planning
Capacity Model Template
# capacity-model.yaml
service: ""
last_updated: ""

current_state:
  daily_requests: 0
  peak_rps: 0
  avg_response_ms: 0
  instances: 0
  cpu_peak_pct: 0
  memory_peak_pct: 0
  db_connections_peak: 0
  storage_used_gb: 0

growth_model:
  request_growth_monthly_pct: 0  # e.g., 15%
  storage_growth_monthly_gb: 0
  seasonal_peak_multiplier: 0    # e.g., 3x for Black Friday

projections:
  # Formula: current * (1 + growth_rate)^months * seasonal_multiplier
  3_month:
    daily_requests: 0
    peak_rps: 0
    instances_needed: 0
    storage_gb: 0
    estimated_cost: ""
  6_month:
    daily_requests: 0
    peak_rps: 0
    instances_needed: 0
    storage_gb: 0
    estimated_cost: ""
  12_month:
    daily_requests: 0
    peak_rps: 0
    instances_needed: 0
    storage_gb: 0
    estimated_cost: ""

headroom_rules:
  cpu: "Scale when sustained >70% for 5m"
  memory: "Scale when >80%"
  storage: "Alert when >75%, expand when >85%"
  db_connections: "Alert when >80% of max"
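The projection formula in the comment above, as a runnable sketch (numbers are illustrative):
// current * (1 + growth_rate)^months * seasonal_multiplier
function projectPeakRps(currentRps, monthlyGrowthPct, months, seasonalMultiplier = 1) {
  return currentRps * Math.pow(1 + monthlyGrowthPct / 100, months) * seasonalMultiplier;
}

// e.g. 200 rps today, 15%/month growth, 3x seasonal spike, 6 months out:
console.log(projectPeakRps(200, 15, 6, 3).toFixed(0)); // ≈ 1388 rps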
Cost-Performance Tradeoff Analysis
For every optimization, calculate:
ROI = (time_saved_per_month × cost_per_hour) / implementation_cost

EXAMPLE:
- P95 latency: 800ms → 200ms after optimization
- Requests/month: 10M
- Time saved: 600ms × 10M = 1,667 hours of compute
- Compute cost: $0.05/hour = $83/month savings
- Implementation: 16 hours × $150/hr = $2,400
- Payback: 29 months → NOT WORTH IT for cost alone

BUT ALSO CONSIDER:
- User experience improvement → conversion rate
- Reduced infrastructure needs → fewer instances
- Headroom for growth → delayed scaling investment
- Developer productivity → faster local dev cycles
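The worked example above as a quick calculation (a sketch; plug in your own numbers):
function paybackMonths({ msSavedPerRequest, requestsPerMonth, costPerComputeHour, implHours, hourlyRate }) {
  const hoursSaved = (msSavedPerRequest * requestsPerMonth) / 3_600_000; // ms to hours
  const monthlySavings = hoursSaved * costPerComputeHour;
  return (implHours * hourlyRate) / monthlySavings;
}

// 600ms saved x 10M req/month at $0.05/hr; 16h x $150/hr to implement:
console.log(paybackMonths({
  msSavedPerRequest: 600,
  requestsPerMonth: 10_000_000,
  costPerComputeHour: 0.05,
  implHours: 16,
  hourlyRate: 150,
}).toFixed(0)); // ≈ 29 months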
Phase 9: Performance in CI/CD
Automated Performance Gates
# .github/workflows/perf-gate.yml
name: Performance Gate
on: pull_request

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run benchmarks
        run: |
          # Run your benchmark suite
          npm run benchmark -- --json > bench-results.json
      - name: Compare with baseline
        run: |
          # Compare against main branch baseline
          node scripts/compare-benchmarks.js \
            --baseline benchmarks/baseline.json \
            --current bench-results.json \
            --threshold 10  # Fail if >10% regression
      - name: Load test (on staging)
        if: github.base_ref == 'main'
        run: |
          k6 run --out json=load-results.json tests/load-test.js
          # Check thresholds automatically via k6
      - name: Bundle size check
        run: |
          npm run build
          node scripts/check-bundle-size.js \
            --max-size 250KB \
            --max-increase 5%
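The workflow above references scripts/compare-benchmarks.js, which is not shown; a minimal sketch of what such a script could do, assuming both JSON files map benchmark names to mean times in milliseconds (flag parsing omitted for brevity):
// scripts/compare-benchmarks.js: fail CI when any benchmark regresses past the threshold
const fs = require('fs');

const baseline = JSON.parse(fs.readFileSync('benchmarks/baseline.json', 'utf8'));
const current = JSON.parse(fs.readFileSync('bench-results.json', 'utf8'));
const thresholdPct = 10;

let failed = false;
for (const [name, baseMs] of Object.entries(baseline)) {
  const currMs = current[name];
  if (currMs === undefined) continue; // benchmark removed or renamed
  const deltaPct = ((currMs - baseMs) / baseMs) * 100;
  if (deltaPct > thresholdPct) {
    console.error(`REGRESSION ${name}: ${baseMs}ms -> ${currMs}ms (+${deltaPct.toFixed(1)}%)`);
    failed = true;
  }
}
process.exit(failed ? 1 : 0);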
Performance Regression Detection
AUTOMATED CHECKS (run on every PR):
☐ Unit benchmarks: critical path functions < threshold
☐ Bundle size: total and per-chunk limits
☐ Lighthouse CI: Core Web Vitals pass
☐ Query count: no N+1 regressions (count queries per test)
☐ Memory: no leak patterns in test suite

WEEKLY CHECKS (cron job):
☐ Production p50/p95/p99 trends (compare to 4-week average)
☐ Error rate trends
☐ Database slow query log review
☐ Infrastructure cost vs traffic ratio
☐ Cache hit rates

MONTHLY REVIEW:
☐ Capacity model update
☐ Performance budget review
☐ Top 10 slowest endpoints → optimization candidates
☐ Cost-performance analysis
☐ Load test full suite against staging
Phase 10: Performance Culture
Performance Review Checklist
Score your system (0-100):
MEASUREMENT (25 points):
☐ (5) Performance budgets defined for all key metrics
☐ (5) Real User Monitoring (RUM) in production
☐ (5) Alerting on p95 degradation
☐ (5) Dashboards visible to team
☐ (5) Regular load testing

PREVENTION (25 points):
☐ (5) Performance gates in CI/CD
☐ (5) Bundle size limits enforced
☐ (5) Query count checks in tests
☐ (5) Code review includes perf review
☐ (5) Capacity planning model maintained

OPTIMIZATION (25 points):
☐ (5) Caching strategy documented
☐ (5) Database indexes reviewed quarterly
☐ (5) No known N+1 queries
☐ (5) Connection pools properly sized
☐ (5) Async patterns used for I/O

OPERATIONS (25 points):
☐ (5) Auto-scaling configured and tested
☐ (5) Slow query logging enabled
☐ (5) Memory leak monitoring
☐ (5) Performance incident runbook exists
☐ (5) Monthly performance review
Common Anti-Patterns
1. PREMATURE OPTIMIZATION
Problem: Optimizing before measuring
Fix: Profile first, optimize the measured bottleneck
2. MICRO-BENCHMARKING IN ISOLATION
Problem: Function is fast alone but slow in context (cache, contention)
Fix: Always benchmark in realistic conditions with realistic data
3. OPTIMIZING THE WRONG LAYER
Problem: Tuning app code when the DB is the bottleneck
Fix: Use distributed tracing to find the actual bottleneck
4. CACHING EVERYTHING
Problem: Cache invalidation bugs, stale data, memory pressure
Fix: Cache selectively using the decision matrix (Phase 3)
5. PREMATURE HORIZONTAL SCALING
Problem: Adding instances when single instance is underoptimized
Fix: Vertical optimization first, scale second
6. IGNORING TAIL LATENCY
Problem: p50 is fine but p99 is terrible
Fix: Investigate outliers; they're often the most important users
7. LOAD TESTING IN DEV
Problem: Dev environment doesn't match production
Fix: Load test against staging with production-like data
8. OPTIMIZING COLD PATHS
Problem: Spending time on rarely-executed code
Fix: Profile in production to find actual hot paths
Quick Reference: Tool Selection
| Task | Recommended Tool | Alternative |
|---|---|---|
| HTTP benchmarking | k6 | wrk, ab, hey |
| CPU profiling (Node) | clinic flame | 0x, --prof |
| CPU profiling (Python) | py-spy | Scalene, cProfile |
| CPU profiling (Go) | pprof | go tool trace |
| CPU profiling (Java) | async-profiler | JFR, VisualVM |
| Memory profiling | language-specific (see Phase 2) | |
| CLI benchmarking | hyperfine | time |
| Bundle analysis | webpack-bundle-analyzer | source-map-explorer |
| Web performance | Lighthouse | WebPageTest |
| DB query analysis | EXPLAIN ANALYZE | pgMustard, pganalyze |
| Distributed tracing | Jaeger, Zipkin | OpenTelemetry |
| APM | Datadog, New Relic | Grafana + Prometheus |
| Continuous profiling | Pyroscope | Parca |
Natural Language Commands
"Profile this function" โ CPU profiling with flame graph
"Why is this endpoint slow" โ Full investigation brief + profiling
"Load test the API" โ k6 test design and execution
"Check for memory leaks" โ Heap snapshot comparison workflow
"Optimize this query" โ EXPLAIN ANALYZE + index recommendations
"Review frontend perf" โ Core Web Vitals audit + bundle analysis
"Plan capacity for 10x" โ Capacity model with projections
"Set up perf monitoring" โ CI/CD gates + dashboards + alerts
"Find the bottleneck" โ Profiling decision tree walkthrough
"Score our performance" โ Performance review checklist (0-100)
"Compare before and after" โ Benchmark comparison methodology
"Reduce bundle size" โ Bundle analysis + reduction strategies