performance-engineering

Performance Engineering

Purpose

Performance engineering encompasses load testing, profiling, and optimization to deliver reliable, scalable systems. This skill provides frameworks for choosing the right performance testing approach (load, stress, soak, spike), profiling techniques to identify bottlenecks (CPU, memory, I/O), and optimization strategies for backend APIs, databases, and frontend applications.
Use this skill to validate system capacity before launch, detect performance regressions in CI/CD pipelines, identify and resolve bottlenecks through profiling, and optimize application responsiveness across the stack.

When to Use This Skill

Common Triggers:
  • "Validate API can handle expected traffic"
  • "Find maximum capacity and breaking points"
  • "Identify why the application is slow"
  • "Detect memory leaks or resource exhaustion"
  • "Optimize Core Web Vitals for SEO"
  • "Set up performance testing in CI/CD"
  • "Reduce cloud infrastructure costs"
Use Cases:
  • Pre-launch capacity planning and load validation
  • Post-refactor performance regression testing
  • Investigating slow response times or high latency
  • Detecting memory leaks in long-running services
  • Optimizing database query performance
  • Validating auto-scaling configuration
  • Establishing performance SLOs and budgets

Performance Testing Types

Load Testing

Validate system behavior under expected traffic levels.
When to use: Pre-launch capacity planning, regression testing after refactors, validating auto-scaling.

Stress Testing

Find system capacity limits and failure modes.
When to use: Capacity planning, understanding failure behavior, infrastructure sizing decisions.

Soak Testing

Identify memory leaks, resource exhaustion, and degradation over time.
When to use: Detecting memory leaks, validating connection pool cleanup, testing long-running batch jobs.
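For a Python service, one way to confirm a suspected leak during a soak run is to compare two `tracemalloc` snapshots taken some time apart and look for allocation sites that keep growing. The sketch below assumes the leak can be reproduced in-process. `leaky_handler` and its module-level `_CACHE` are hypothetical stand-ins for a real request handler that retains data it should release.

```python
import tracemalloc

_CACHE = []  # hypothetical: an unbounded module-level list that is never trimmed


def leaky_handler(request_id: int) -> None:
    """Hypothetical request handler that 'leaks' by retaining every payload."""
    _CACHE.append(bytes(1024) + str(request_id).encode())


def top_growth(handler, iterations: int = 10_000, limit: int = 3):
    """Run handler repeatedly and return the allocation sites whose size grew the most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for i in range(iterations):
        handler(i)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to() returns StatisticDiff objects, largest absolute size change first
    stats = after.compare_to(before, "lineno")
    return stats[:limit]


if __name__ == "__main__":
    for stat in top_growth(leaky_handler):
        print(stat)  # file:line, size and count deltas for each allocation site
```

In a real soak test the two snapshots would be taken minutes or hours apart under load. A site whose `size_diff` keeps rising from one interval to the next is the leak candidate.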

Spike Testing

Validate system response to sudden traffic spikes.
When to use: Validating auto-scaling, preparing for event-driven traffic bursts (e.g., product launches), ensuring rate limiting works.

Quick Decision Framework

Which test type to use?
What am I trying to learn?
├─ Can my system handle expected traffic? → LOAD TEST
├─ What's the maximum capacity? → STRESS TEST
├─ Will it stay stable over time? → SOAK TEST
└─ Can it handle traffic spikes? → SPIKE TEST
For detailed testing patterns, load scenarios, and guidance on interpreting results, see references/testing-types.md.
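The four test types differ mainly in the shape of the load over time. The sketch below encodes each shape as a list of `(duration_seconds, target_vus)` stages, loosely mirroring the `stages` array in the k6 script that follows. It also includes a helper that interpolates the virtual-user count at a given second. The durations and VU counts are illustrative assumptions, not recommendations. The interpolation assumes a linear ramp starting from 0 VUs, which approximates how k6 moves between stage targets.

```python
# Illustrative load shapes as (duration_seconds, target_vus) stages.
# The numbers are assumptions chosen only to make each shape visible.
PROFILES = {
    # Load: ramp to expected traffic, hold, ramp down
    "load": [(60, 50), (600, 50), (60, 0)],
    # Stress: keep stepping past expected traffic to find the breaking point
    "stress": [(120, 50), (120, 100), (120, 200), (120, 400), (120, 0)],
    # Soak: moderate load held for hours to surface leaks and slow degradation
    "soak": [(300, 50), (4 * 3600, 50), (300, 0)],
    # Spike: jump abruptly to a multiple of normal traffic, then drop back
    "spike": [(60, 20), (10, 500), (120, 500), (10, 20), (60, 20), (30, 0)],
}


def vus_at(stages, t):
    """Virtual users at second t, ramping linearly within each stage (starting from 0)."""
    start_vus = 0
    elapsed = 0
    for duration, target in stages:
        if t <= elapsed + duration:
            fraction = (t - elapsed) / duration
            return start_vus + (target - start_vus) * fraction
        elapsed += duration
        start_vus = target
    return 0  # past the final stage: the test has ended


def total_duration(stages):
    """Total test length in seconds."""
    return sum(duration for duration, _ in stages)


if __name__ == "__main__":
    for name, stages in PROFILES.items():
        peak = max(target for _, target in stages)
        print(f"{name:>6}: {total_duration(stages) / 60:.0f} min, peak {peak} VUs")
```

For example, the spike profile is at 260 VUs five seconds into its 10-second jump from 20 to 500. The load profile is at 25 VUs halfway through its first ramp.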

Load Testing Quick Starts

k6 (JavaScript)

Installation:
```bash
brew install k6  # macOS
sudo apt-get install k6  # Debian/Ubuntu, after adding the k6 apt repository
```
Basic Load Test:
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 20 }, // ramp up to 20 virtual users
    { duration: '1m', target: 20 },  // hold 20 VUs for one minute
    { duration: '30s', target: 0 },  // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests must finish under 500 ms
    http_req_failed: ['rate<0.01'],   // fewer than 1% of requests may fail
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(1); // think time between iterations
}
```
Run:
```bash
k6 run script.js
```
For stress, soak, and spike testing examples, see examples/k6/.
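The `thresholds` block turns the run into a pass/fail check. `p(95)<500` requires the 95th-percentile request duration to stay under 500 ms, and `rate<0.01` requires fewer than 1% of requests to fail. The sketch below re-implements that logic over a list of hypothetical `(duration_ms, failed)` samples to make the semantics concrete. It uses a nearest-rank percentile, which may differ slightly from k6's own percentile calculation.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_thresholds(samples, p95_limit_ms=500.0, max_fail_rate=0.01):
    """Evaluate k6-style thresholds over (duration_ms, failed) samples.

    Returns a dict of threshold name -> (observed value, passed).
    """
    durations = [duration for duration, _ in samples]
    failures = sum(1 for _, failed in samples if failed)
    p95 = percentile(durations, 95)
    fail_rate = failures / len(samples)
    return {
        "http_req_duration p(95)<500": (p95, p95 < p95_limit_ms),
        "http_req_failed rate<0.01": (fail_rate, fail_rate < max_fail_rate),
    }


if __name__ == "__main__":
    # Hypothetical run: 100 requests taking 10..1000 ms, two of which failed
    run = [(10.0 * i, i in (50, 75)) for i in range(1, 101)]
    for name, (value, passed) in evaluate_thresholds(run).items():
        print(f"{name}: {value} -> {'PASS' if passed else 'FAIL'}")
```

In this hypothetical run both thresholds fail: p95 is 950 ms and the failure rate is 2%. k6 would report the equivalent failures at the end of the test.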

Locust (Python)

Installation:
```bash
pip install locust
```
Basic Load Test:
```python
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)  # wait 1-3 s between tasks
    host = "https://api.example.com"

    @task(3)  # weight 3: picked about three times as often as the task below
    def view_products(self):
        self.client.get("/products")

    @task(1)
    def view_product_detail(self):
        self.client.get("/products/123")
```
Run (100 users, spawned at 10 per second, for 10 minutes, without the web UI):
```bash
locust -f locustfile.py --headless -u 100 -r 10 --run-time 10m
```
For REST API testing and data-driven testing, see examples/locust/.
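In the locustfile above, `@task(3)` and `@task(1)` are relative weights. About 75% of task executions hit `/products` and 25% hit `/products/123`, and `between(1, 3)` waits a uniformly random 1 to 3 seconds after each task. That gives a quick back-of-the-envelope request rate: users divided by the average time per loop. The sketch below assumes each task issues exactly one request and uses a hypothetical fixed response time. Real throughput also depends on how the server behaves under load.

```python
def task_shares(weights):
    """Convert Locust-style task weights into the fraction of executions per task."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def estimated_rps(users, min_wait_s, max_wait_s, response_time_s=0.0):
    """Approximate steady-state requests/second, assuming one request per task.

    Each user loops: run a task (response_time_s), then wait a uniform random
    time between min_wait_s and max_wait_s (mean = midpoint).
    """
    mean_cycle_s = response_time_s + (min_wait_s + max_wait_s) / 2
    return users / mean_cycle_s


def ramp_up_seconds(users, spawn_rate):
    """Time to reach the full user count given -u (users) and -r (spawn rate)."""
    return users / spawn_rate


if __name__ == "__main__":
    print(task_shares({"view_products": 3, "view_product_detail": 1}))
    # -u 100 -r 10 with between(1, 3), assuming a hypothetical 100 ms response time
    print(round(estimated_rps(100, 1, 3, response_time_s=0.1), 1))  # 47.6
    print(ramp_up_seconds(100, 10))  # 10.0
```

So the command above should settle at a little under 50 requests per second once all 100 users have spawned, about 10 seconds in. If the observed rate is much lower, the server's response time is dominating the loop.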

Profiling Quick Starts

When to Profile

| Symptom | Profiling Type | Tool |
|---|---|---|
| High CPU (>70%) | CPU profiling | py-spy, pprof, DevTools |
| Memory growing | Memory profiling | memory_profiler, pprof heap |
| Slow response, low CPU | I/O profiling | Query logs, pprof block |

Python Profiling

py-spy (Production-Safe):
```bash
pip install py-spy

# Profile a running process for 30 seconds and write a flame graph
py-spy record -o profile.svg --pid <PID> --duration 30

# Top-like live view
py-spy top --pid <PID>
```

Memory Profiling:
```python
from memory_profiler import profile

@profile
def my_function():
    a = [1] * (10 ** 6)  # allocate a list of one million references
    return a

if __name__ == "__main__":
    my_function()  # memory_profiler prints line-by-line memory usage for this call
```
Run:
```bash
python -m memory_profiler script.py
```
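`cProfile`, the deterministic profiler in Python's standard library and listed under Tool Recommendations, is useful locally or in staging. It records every function call, so its overhead is higher than a sampling profiler such as py-spy. The minimal sketch below profiles one call and returns a text report sorted by cumulative time. `slow_sum` and `fast_sum` are hypothetical workloads used only to have something to measure.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: builds a throwaway list on every iteration."""
    total = 0
    for i in range(n):
        total += sum([i, i + 1, i + 2])
    return total


def fast_sum(n):
    """Hypothetical optimized version computing the same value."""
    return sum(3 * i + 3 for i in range(n))


def profile_call(fn, *args, top=5):
    """Run fn(*args) under cProfile; return (result, report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum, 200_000)
    print(report)
    # Before/after check: the optimized version must return the same answer
    assert fast_sum(200_000) == result
```

Running `profile_call` on both versions and comparing the reports is a simple form of the before/after validation described in the profiling workflow.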

Go Profiling

pprof (Built-in):
```go
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

// startApp is a placeholder for your application's real work.
func startApp() {
    select {} // block forever so the pprof endpoint stays available
}

func main() {
    // Serve pprof on a localhost-only port in the background
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    startApp()
}
```
Capture profile:
```bash
# CPU profile (30 seconds)
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Interactive analysis (commands typed at the pprof prompt)
(pprof) top
(pprof) web
```

TypeScript/JavaScript Profiling

Chrome DevTools (Browser/Node.js):
Node.js:
```bash
node --inspect app.js
# Open chrome://inspect
# Performance tab → Record
```

clinic.js (Node.js):
```bash
npm install -g clinic
clinic doctor -- node app.js
```
For detailed profiling workflows and analysis, see references/profiling-guide.md and examples/profiling/.

Optimization Strategies

Caching

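When to cache:
  • Data that is queried frequently (e.g., >100 req/min)
  • Data that can tolerate some staleness (e.g., values more than 1 minute old are acceptable)
Redis example:
```python
import json

import redis

r = redis.Redis()  # connects to localhost:6379 by default

def get_cached_data(key, fn, ttl=300):
    """Cache-aside: return the cached value, or compute it with fn and cache it for ttl seconds."""
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit
    data = fn()  # cache miss: load from the source of truth
    r.setex(key, ttl, json.dumps(data))  # store with an expiry
    return data
```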
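The Redis helper implements the cache-aside pattern: read the cache, fall back to the source on a miss, then write the result back with a TTL. The same logic can be exercised without a Redis server by swapping in an in-memory store. The sketch below assumes a single process. `InMemoryTTLStore` is a hypothetical stand-in that mimics only the two calls the helper uses, `get` and `setex`, and its injectable clock makes expiry easy to test.

```python
import json
import time


class InMemoryTTLStore:
    """Hypothetical stand-in for the two Redis calls used above: get() and setex()."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def setex(self, key, ttl, value):
        self._data[key] = (self._clock() + ttl, value)


def get_cached_data(store, key, fn, ttl=300):
    """Cache-aside read through any store exposing get/setex."""
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)
    data = fn()
    store.setex(key, ttl, json.dumps(data))
    return data


if __name__ == "__main__":
    now = [0.0]  # fake clock, in seconds
    store = InMemoryTTLStore(clock=lambda: now[0])
    calls = []

    def load_products():
        calls.append(1)  # count trips to the "database"
        return [{"id": 1, "name": "widget"}]

    get_cached_data(store, "products", load_products)  # miss: loads
    get_cached_data(store, "products", load_products)  # hit: served from cache
    now[0] = 301.0  # advance past the 300 s TTL
    get_cached_data(store, "products", load_products)  # expired: loads again
    print(len(calls))  # 2
```

The TTL is the staleness bound: within 300 seconds every read is served from the cache, even if the underlying data has changed.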

Database Query Optimization

N+1 prevention:
```python
from sqlalchemy.orm import joinedload

# Assumes a Flask-SQLAlchemy `User` model with a lazy-loaded `orders` relationship

# Bad: N+1 queries
users = User.query.all()
for user in users:
    print(user.orders)  # Separate query per user

# Good: Eager loading
users = User.query.options(joinedload(User.orders)).all()
```
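The ORM snippet relies on a Flask-SQLAlchemy-style `User.query` API. To see the difference without an ORM, the stdlib sketch below builds hypothetical `users` and `orders` tables in an in-memory SQLite database. It counts the SELECT statements each approach issues via `sqlite3.Connection.set_trace_callback`. With 50 users, the per-user loop issues 51 queries while the single JOIN issues one, and both return the same data.

```python
import sqlite3


def make_db(num_users=50, orders_per_user=3):
    """Build an in-memory database with hypothetical users/orders tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
        CREATE INDEX idx_orders_user_id ON orders(user_id);
        """
    )
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(u, f"user{u}") for u in range(1, num_users + 1)],
    )
    conn.executemany(
        "INSERT INTO orders (user_id, total) VALUES (?, ?)",
        [(u, 10.0 * k) for u in range(1, num_users + 1) for k in range(1, orders_per_user + 1)],
    )
    conn.commit()
    return conn


def count_selects(conn, fn):
    """Run fn(conn) and return (result, number of SELECT statements executed)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


def orders_n_plus_one(conn):
    """Bad: one query for the users, then one more query per user."""
    result = {}
    for (user_id,) in conn.execute("SELECT id FROM users ORDER BY id").fetchall():
        rows = conn.execute(
            "SELECT total FROM orders WHERE user_id = ? ORDER BY id", (user_id,)
        ).fetchall()
        result[user_id] = [total for (total,) in rows]
    return result


def orders_single_join(conn):
    """Good: fetch users and their orders in one round trip with a JOIN."""
    result = {}
    rows = conn.execute(
        "SELECT u.id, o.total FROM users u "
        "LEFT JOIN orders o ON o.user_id = u.id ORDER BY u.id, o.id"
    ).fetchall()
    for user_id, total in rows:
        result.setdefault(user_id, [])
        if total is not None:  # users without orders yield a NULL total
            result[user_id].append(total)
    return result


if __name__ == "__main__":
    conn = make_db()
    slow, slow_queries = count_selects(conn, orders_n_plus_one)
    fast, fast_queries = count_selects(conn, orders_single_join)
    assert slow == fast
    print(slow_queries, fast_queries)  # 51 1
```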

Indexing:
```sql
-- Lets lookups such as WHERE email = ? use the index instead of scanning the table
CREATE INDEX idx_users_email ON users(email);
```
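To confirm that an index is actually used, inspect the query plan before and after creating it. The minimal sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module on a hypothetical `users` table. Other databases expose the same idea through their own `EXPLAIN` commands with different output formats. The exact wording of SQLite's plan text varies between versions, so the check looks only for `SCAN` and the index name.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the readable detail


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
    )
    lookup = "SELECT id, name FROM users WHERE email = ?"
    before = query_plan(conn, lookup, ("user42@example.com",))
    conn.execute("CREATE INDEX idx_users_email ON users(email)")
    after = query_plan(conn, lookup, ("user42@example.com",))
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)  # a full-table SCAN of users
    print("after: ", after)   # a SEARCH using idx_users_email
```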

API Performance

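Cursor-based pagination:
```typescript
app.get('/api/products', async (req, res) => {
  // Query-string values arrive as strings: parse them and clamp the page size
  // (the 1..100 bounds are an illustrative choice)
  const cursor = Number(req.query.cursor) || 0;
  const limit = Math.min(Math.max(Number(req.query.limit) || 20, 1), 100);

  // Seek past the last id the client saw instead of using OFFSET
  const products = await db.query(
    'SELECT * FROM products WHERE id > ? ORDER BY id LIMIT ?',
    [cursor, limit]
  );

  res.json({
    data: products,
    // Id of the last row on this page; the client sends it back as ?cursor=
    next_cursor: products[products.length - 1]?.id ?? null,
  });
});
```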
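The endpoint above uses keyset (cursor) pagination. `OFFSET n` forces the database to walk and discard the first n rows. A keyset query instead seeks directly past the last `id` the client saw, so deep pages cost about the same as the first one. The stdlib sketch below runs the same query against a hypothetical in-memory SQLite `products` table. It also includes a loop that follows the cursor to the end. Here `fetch_page` returns `None` as the cursor once a page comes back short, which is one possible end-of-data convention.

```python
import sqlite3


def make_products(n):
    """In-memory products table with n hypothetical rows (ids 1..n)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (name) VALUES (?)", [(f"product {i}",) for i in range(n)]
    )
    return conn


def fetch_page(conn, cursor=0, limit=20):
    """Keyset pagination: seek past the last seen id instead of using OFFSET."""
    rows = conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (cursor, limit),
    ).fetchall()
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


def fetch_all(conn, limit=20):
    """Follow next_cursor until the last page; every row is returned exactly once."""
    everything, cursor = [], 0
    while True:
        rows, cursor = fetch_page(conn, cursor, limit)
        everything.extend(rows)
        if cursor is None:
            return everything


if __name__ == "__main__":
    conn = make_products(45)
    page, nxt = fetch_page(conn)
    print(len(page), nxt)  # 20 20
    print(len(fetch_all(conn)))  # 45
```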

Frontend Performance (Core Web Vitals)

Key metrics ("good" thresholds, assessed at the 75th percentile of page loads):
  • LCP (Largest Contentful Paint): ≤ 2.5s
  • INP (Interaction to Next Paint): ≤ 200ms
  • CLS (Cumulative Layout Shift): ≤ 0.1
Optimization techniques:
  • Code splitting (lazy loading)
  • Image optimization (WebP, responsive images, lazy loading)
  • Preload critical resources
  • Minimize render-blocking resources
For detailed optimization strategies, see references/optimization-strategies.md and references/frontend-performance.md.
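Each metric is judged at the 75th percentile of real page loads, not the average, so a few slow loads can tip a page out of "good". The sketch below rates a list of hypothetical field samples the same way. The "poor" boundaries come from Google's Core Web Vitals guidance: LCP above 4 s, INP above 500 ms, CLS above 0.25. The nearest-rank percentile is a simplifying assumption.

```python
import math

# (good_max, poor_above) per metric: values <= good_max rate "good",
# values > poor_above rate "poor", anything in between "needs improvement".
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout-shift score
}


def p75(samples):
    """Nearest-rank 75th percentile of field measurements."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(75 * len(ordered) / 100))
    return ordered[rank - 1]


def rate(metric, samples):
    """Return (p75 value, rating) for one Core Web Vitals metric."""
    good_max, poor_above = THRESHOLDS[metric]
    value = p75(samples)
    if value <= good_max:
        return value, "good"
    if value <= poor_above:
        return value, "needs improvement"
    return value, "poor"


if __name__ == "__main__":
    # Hypothetical field data for one page
    print(rate("LCP", [1.8, 2.1, 2.4, 3.9]))   # (2.4, 'good')
    print(rate("INP", [120, 180, 350, 600]))   # (350, 'needs improvement')
    print(rate("CLS", [0.05, 0.3, 0.4, 0.5]))  # (0.4, 'poor')
```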

Performance SLOs

Recommended SLOs by Service Type

| Service Type | p95 Latency | p99 Latency | Availability |
|---|---|---|---|
| User-Facing API | < 200ms | < 500ms | 99.9% |
| Internal API | < 100ms | < 300ms | 99.5% |
| Database Query | < 50ms | < 100ms | 99.99% |
| Background Job | < 5s | < 10s | 99% |
| Real-time API | < 50ms | < 100ms | 99.95% |
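Availability targets are easier to reason about as an error budget: the amount of unavailability the target still permits. The minimal sketch below assumes a 30-day window, a common convention that the table itself does not specify. It also includes a request-based variant for a hypothetical monthly request count.

```python
def allowed_downtime_minutes(availability_pct, window_days=30):
    """Minutes of full unavailability an availability target permits per window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - availability_pct / 100)


def allowed_failed_requests(availability_pct, total_requests):
    """Request-based budget: how many requests may fail while still meeting the target."""
    return total_requests * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.5, 99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days -> {minutes:.1f} min of downtime")
    # Hypothetical volume: 10 million requests per month at 99.9%
    print(round(allowed_failed_requests(99.9, 10_000_000)))  # 10000
```

For a 30-day window this works out to 432, 216, 43.2, 21.6 and about 4.3 minutes for the 99%, 99.5%, 99.9%, 99.95% and 99.99% targets in the table.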

SLO Selection Process

  1. Measure baseline performance
  2. Identify user expectations
  3. Set achievable targets (10-20% better than baseline)
  4. Iterate as system matures
For the detailed SLO framework and performance budgets, see references/slo-framework.md.
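Steps 1 and 3 can be made concrete: measure the baseline percentile from recent latency samples, then set the target 10-20% tighter. The small sketch below uses hypothetical samples and a nearest-rank percentile.

```python
import math


def nearest_rank(values, p):
    """Nearest-rank p-th percentile (p in 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def propose_targets(latencies_ms, p=95, improvement=(0.10, 0.20)):
    """Return (baseline, conservative_target, ambitious_target) for the p-th percentile.

    The targets sit 10% and 20% below the measured baseline (step 3).
    """
    baseline = nearest_rank(latencies_ms, p)
    conservative, ambitious = improvement
    return baseline, baseline * (1 - conservative), baseline * (1 - ambitious)


if __name__ == "__main__":
    # Hypothetical latency samples in ms: mostly fast, with a slow tail
    samples = [120] * 90 + [180] * 4 + [250] * 6
    print(propose_targets(samples))  # (250, 225.0, 200.0)
```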

CI/CD Integration

Performance Testing in Pipelines

GitHub Actions example:
```yaml
name: Performance Tests

on:
  pull_request:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install k6
        run: |
          curl https://github.com/grafana/k6/releases/download/v0.48.0/k6-v0.48.0-linux-amd64.tar.gz -L | tar xvz
          sudo mv k6-v0.48.0-linux-amd64/k6 /usr/local/bin/

      - name: Run load test
        run: k6 run tests/load/api-test.js
```
Performance budgets:
```javascript
// k6 test with thresholds: k6 exits with a non-zero code when a threshold
// is violated, which fails the CI step
export const options = {
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};
```
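k6 thresholds enforce an absolute budget. To also catch relative regressions, a pipeline step can compare the current run against a stored baseline and fail when a metric degrades by more than a tolerance. The minimal sketch below assumes both results have already been reduced to a `{metric: value}` mapping. How that mapping is extracted from k6 output is left open here. The 10% tolerance and the metric names are hypothetical.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {metric: (baseline, current)} for metrics that got worse by more than tolerance.

    Both arguments map metric names (e.g. "p95_ms") to values where lower is better.
    """
    regressions = {}
    for metric, base in baseline.items():
        if metric not in current or base <= 0:
            continue  # not measured in this run, or no meaningful baseline
        now = current[metric]
        if (now - base) / base > tolerance:
            regressions[metric] = (base, now)
    return regressions


def report(baseline, current, tolerance=0.10):
    """Print any regressions and return a process exit code (1 = fail the CI step)."""
    regressions = find_regressions(baseline, current, tolerance)
    for metric, (base, now) in regressions.items():
        print(f"REGRESSION {metric}: {base} -> {now} (+{(now - base) / base:.0%})")
    return 1 if regressions else 0
```

For example, with a stored baseline of `{"p95_ms": 320.0}` and a current run of `{"p95_ms": 365.0}`, `report` prints a 14% regression and returns 1. A CI step would pass that value to `sys.exit` so the job fails.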

Profiling Workflow

Standard process:
  1. Observe symptoms (high CPU, memory growth, slow response)
  2. Hypothesize bottleneck (CPU? Memory? I/O?)
  3. Choose profiling type based on hypothesis
  4. Run profiler under realistic load
  5. Analyze profile (flamegraph, call tree)
  6. Identify hot spots (typically ~20% of functions account for ~80% of resource use)
  7. Optimize bottlenecks
  8. Re-profile to validate improvement
Best practices:
  • Profile under realistic load (not idle systems)
  • Use sampling profilers (py-spy, pprof) in production, since their overhead is low
  • Focus on hot paths (optimize biggest bottlenecks first)
  • Validate optimizations with before/after comparisons
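The 80/20 heuristic in step 6 can be applied mechanically to a flat profile: sort functions by self time and keep the smallest prefix that covers 80% of the total. The sketch below assumes per-function self times have already been exported into a dict, for example from `pstats` or a pprof `top` listing. The function names and numbers in the usage example are hypothetical.

```python
def hot_spots(self_times, coverage=0.80):
    """Return the fewest functions whose combined self time reaches `coverage` of the total.

    self_times maps function name -> self time (seconds or samples).
    """
    total = sum(self_times.values())
    if total == 0:
        return []
    ranked = sorted(self_times.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0.0
    for name, t in ranked:
        selected.append(name)
        covered += t
        if covered / total >= coverage:
            break
    return selected


if __name__ == "__main__":
    profile = {
        "serialize_json": 4.1,
        "db_fetch": 3.2,
        "render_template": 0.9,
        "validate_input": 0.5,
        "log_request": 0.2,
        "parse_headers": 0.1,
    }
    print(hot_spots(profile))  # ['serialize_json', 'db_fetch']
```

In this hypothetical profile, two of six functions account for about 81% of self time, so they are where optimization effort should go first.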

Tool Recommendations

Load Testing

Primary: k6 (JavaScript-based, Grafana-backed)
  • Modern architecture, cloud-native
  • JavaScript DSL (ES6+)
  • Grafana/Prometheus integration
  • Multi-protocol (HTTP/1.1, HTTP/2, WebSocket, gRPC)
When to use: Modern APIs, microservices, CI/CD integration.
Alternative: Locust (Python-based)
  • Python-native (write tests in Python)
  • Web UI for real-time monitoring
  • Flexible for complex user scenarios
When to use: Python-heavy teams, complex user flows.

Profiling

Python:
  • py-spy (sampling, production-safe)
  • cProfile (deterministic, detailed)
  • memory_profiler (memory leak detection)
Go:
  • pprof (built-in, CPU/heap/goroutine/block profiling)
TypeScript/JavaScript:
  • Chrome DevTools (browser/Node.js)
  • clinic.js (Node.js performance suite)
For detailed tool comparisons, see references/testing-types.md and references/profiling-guide.md.

Reference Documentation

Detailed Guides:
  • references/testing-types.md
    - Load, stress, soak, spike testing patterns
  • references/profiling-guide.md
    - CPU, memory, I/O profiling across languages
  • references/optimization-strategies.md
    - Caching, database, API optimization
  • references/frontend-performance.md
    - Core Web Vitals, bundle optimization
  • references/slo-framework.md
    - Setting SLOs, performance budgets
  • references/benchmarking.md
    - Benchmarking best practices
Examples:
  • examples/k6/
    - Load, stress, soak, spike tests
  • examples/locust/
    - Python-based load testing
  • examples/profiling/
    - Profiling examples (Python, Go, TypeScript)
  • examples/optimization/
    - Caching, query, API optimization

Related Skills

For comprehensive testing strategies, see the testing-strategies skill.
For CI/CD integration patterns, see the building-ci-pipelines skill.
For infrastructure sizing based on load tests, see the infrastructure-as-code skill.
For Kubernetes performance testing, see the kubernetes-operations skill.