codspeed-setup-harness

Setup Harness


You are a performance engineer helping set up benchmarks and CodSpeed integration for a project. Your goal is to create useful, representative benchmarks and wire them up so CodSpeed can measure and track performance.

Step 1: Analyze the project


Before writing any benchmark code, understand what you're working with:
  1. Detect the language and build system: Look at the project structure, package files (`Cargo.toml`, `package.json`, `pyproject.toml`, `go.mod`, `CMakeLists.txt`), and source files.
  2. Identify existing benchmarks: Check for benchmark files, `codspeed.yml`, and CI workflows mentioning CodSpeed or benchmarks.
  3. Identify hot paths: Look at the codebase to understand what the performance-critical code is. Public API functions, data processing pipelines, I/O-heavy operations, and algorithmic code are good candidates.
  4. Check CodSpeed auth: Ensure `codspeed auth login` has been run.

Step 2: Choose the right approach


Based on the language and what the user wants to benchmark, pick the right harness:

Language-specific harnesses (recommended when available)


These integrate deeply with CodSpeed and provide per-benchmark flamegraphs, fine-grained comparison, and simulation mode support.

| Language | Framework | How to set up |
| --- | --- | --- |
| Rust | divan (recommended), criterion, bencher | Add `codspeed-<framework>-compat` as a dependency using `cargo add --rename` |
| Python | pytest-benchmark | Install `pytest-codspeed`, use `@pytest.mark.benchmark` or the `benchmark` fixture |
| Node.js | vitest (recommended), tinybench v5, benchmark.js | Install `@codspeed/<framework>-plugin`, configure in the vitest/test config |
| Go | `go test -bench` | No packages needed — CodSpeed instruments `go test -bench` directly |
| C/C++ | Google Benchmark | Build with CMake; CodSpeed instruments via valgrind-codspeed |

Exec harness (universal)


For any language or when you want to benchmark a whole program (not individual functions):
  • Use `codspeed exec -m <mode> -- <command>` for one-off benchmarks
  • Or create a `codspeed.yml` with benchmark definitions for repeatable setups
The exec harness requires no code changes — it instruments the binary externally. This is ideal for:
  • Languages without a dedicated CodSpeed integration
  • End-to-end benchmarks (full program execution)
  • Quick setup when you just want to track a command's performance

Choosing simulation vs walltime mode


  • Simulation (default for Rust, Python, Node.js, C/C++): Deterministic CPU simulation, <1% variance, automatic flamegraphs. Best for CPU-bound code. Does not measure system calls or I/O.
  • Walltime (default for Go): Measures real execution time including I/O, threading, system calls. Best for I/O-heavy or multi-threaded code. Requires consistent hardware (use CodSpeed Macro Runners in CI).
  • Memory: Tracks heap allocations. Best for reducing memory usage. Supported for Rust, C/C++ with libc/jemalloc/mimalloc.
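A minimal illustration of the distinction (both functions are hypothetical stand-ins): the first is pure computation, which simulation mode measures deterministically; the second spends its time blocked, which only walltime mode would capture.

```python
import time

def cpu_bound(n=10_000):
    # Pure computation: deterministic, a good fit for simulation mode
    return sum(i * i for i in range(n))

def io_bound(delay=0.01):
    # Time spent blocked (sleep, disk, network) is invisible to CPU
    # simulation; benchmark code like this in walltime mode
    time.sleep(delay)
    return True
```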

Step 3: Set up the harness


Rust with divan (recommended)


  1. Add the dependency (the compat crate is added under the name `divan`, so benchmark code uses the normal `divan` API):

```bash
cargo add codspeed-divan-compat --rename divan --dev
```

  2. Create a benchmark file in `benches/`:

```rust
// benches/my_bench.rs
use divan;

fn main() {
    divan::main();
}

#[divan::bench]
fn bench_my_function() {
    // Call the function you want to benchmark
    // Use divan::black_box() to prevent compiler optimization
    divan::black_box(my_crate::my_function());
}
```

  3. Add to `Cargo.toml`:

```toml
[[bench]]
name = "my_bench"
harness = false
```

  4. Build and run:

```bash
cargo codspeed build -m simulation --bench my_bench
codspeed run -m simulation -- cargo codspeed run --bench my_bench
```

Rust with criterion


  1. Add the dependency:

```bash
cargo add codspeed-criterion-compat --rename criterion --dev
```

  2. Create a benchmark in `benches/`:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_my_function(c: &mut Criterion) {
    c.bench_function("my_function", |b| {
        b.iter(|| my_crate::my_function())
    });
}

criterion_group!(benches, bench_my_function);
criterion_main!(benches);
```

  3. Add to `Cargo.toml` and build/run the same as with divan.

Python with pytest-codspeed

  1. Install:

```bash
pip install pytest-codspeed
```

or

```bash
uv add --dev pytest-codspeed
```

  2. Create benchmark tests:

```python
# tests/test_benchmarks.py
import pytest

def test_my_function(benchmark):
    result = benchmark(my_module.my_function, arg1, arg2)
    # You can still assert on the result
    assert result is not None
```

Or using the pedantic API for setup/teardown:

```python
def test_with_setup(benchmark):
    data = prepare_data()
    benchmark.pedantic(my_module.process, args=(data,), rounds=100)
```

  3. Run:

```bash
codspeed run -m simulation -- pytest --codspeed
```

Node.js with vitest (recommended)

  1. Install:

```bash
npm install -D @codspeed/vitest-plugin
```

or

```bash
pnpm add -D @codspeed/vitest-plugin
```

  2. Configure vitest (`vitest.config.ts`):

```typescript
import { defineConfig } from "vitest/config";
import codspeed from "@codspeed/vitest-plugin";

export default defineConfig({
  plugins: [codspeed()],
});
```

  3. Create a benchmark file:

```typescript
// bench/my.bench.ts
import { bench, describe } from "vitest";

describe("my module", () => {
  bench("my function", () => {
    myFunction();
  });
});
```

  4. Run:

```bash
codspeed run -m simulation -- npx vitest bench
```

Go

No packages needed — CodSpeed instruments `go test -bench` directly.
  1. Create benchmark tests:

```go
// my_test.go
func BenchmarkMyFunction(b *testing.B) {
    for i := 0; i < b.N; i++ {
        MyFunction()
    }
}
```

  2. Run (walltime is the default for Go):

```bash
codspeed run -m walltime -- go test -bench . ./...
```

C/C++ with Google Benchmark


  1. Install Google Benchmark (via CMake FetchContent or a system package).
  2. Create a benchmark:

```cpp
#include <benchmark/benchmark.h>

static void BM_MyFunction(benchmark::State& state) {
    for (auto _ : state) {
        MyFunction();
    }
}
BENCHMARK(BM_MyFunction);

BENCHMARK_MAIN();
```

  3. Build and run with CodSpeed:

```bash
cmake -B build && cmake --build build
codspeed run -m simulation -- ./build/my_benchmark
```

Exec harness (any language)


For benchmarking whole programs without code changes:
  1. Create `codspeed.yml`:

```yaml
$schema: https://raw.githubusercontent.com/CodSpeedHQ/codspeed/refs/heads/main/schemas/codspeed.schema.json

options:
  warmup-time: "1s"
  max-time: 5s

benchmarks:
  - name: "My program - small input"
    exec: ./my_binary --input small.txt

  - name: "My program - large input"
    exec: ./my_binary --input large.txt
    options:
      max-time: 30s
```

  2. Run:

```bash
codspeed run -m walltime
```

Or for a one-off:

```bash
codspeed exec -m walltime -- ./my_binary --input data.txt
```

Step 4: Write good benchmarks


Good benchmarks are representative, isolated, and stable. Here are guidelines:
  • Benchmark real workloads: Use realistic input data and sizes. A sort benchmark on 10 elements tells you nothing about how 10 million elements will perform.
  • Avoid benchmarking setup: Use the framework's setup/teardown mechanisms to exclude initialization from measurements.
  • Prevent dead code elimination: Use `black_box()` (Rust), `benchmark.pedantic()` (Python), or an equivalent to ensure the compiler/runtime doesn't optimize away the work you're measuring.
  • Cover the critical path: Benchmark the functions that matter most to your users — the ones called frequently or on the hot path.
  • Test multiple scenarios: Different input sizes, different data distributions, edge cases. Performance characteristics often change with scale.
  • Keep benchmarks fast: Individual benchmarks should complete in milliseconds to low seconds. CodSpeed handles warmup and repetition — you provide the single iteration.
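A framework-agnostic sketch of two of the guidelines above, keeping setup outside the measured region and covering multiple input sizes; `process` and the manual timing loop are illustrative only, since CodSpeed's harnesses handle warmup, repetition, and timing for you:

```python
import time

def process(data):
    # Hypothetical workload under test
    return sorted(data)

def measure(func, data, rounds=5):
    """Time only the call itself; data preparation stays outside."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        result = func(data)
        best = min(best, time.perf_counter() - start)
    assert result is not None  # keep the result live (dead-code guard)
    return best

# Test multiple scenarios: performance characteristics change with scale
for size in (100, 10_000):
    data = list(range(size, 0, -1))  # setup, outside the timed region
    print(f"n={size}: {measure(process, data):.6f}s")
```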

Step 5: Verify and run


After setting up:
  1. Run the benchmarks locally to verify they work:

For language-specific harnesses


```bash
cargo codspeed build -m simulation && codspeed run -m simulation -- cargo codspeed run
```

or

```bash
codspeed run -m simulation -- pytest --codspeed
```

or

```bash
codspeed run -m simulation -- npx vitest bench
```

etc.

For exec harness


```bash
codspeed run -m walltime
```

2. **Check the output**: You should see a results table and a link to the CodSpeed report.

3. **Verify flamegraphs**: For simulation mode, check that flamegraphs are generated by visiting the report link or using the `query_flamegraph` MCP tool.

4. **Tell the user** what was set up, show the first results, and suggest next steps (e.g., adding CI integration, running the `optimize` skill).