golang-benchmark

Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run — statistical rigor and controlled conditions are prerequisites before any optimization decision.
Thinking mode: Use ultrathink for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.

Go Benchmarking & Performance Measurement


There is no performance improvement without measurement: if you can measure it, you can improve it.
This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, see the samber/cc-skills-golang@golang-performance skill. For pprof setup on running services, see the samber/cc-skills-golang@golang-troubleshooting skill.

Writing Benchmarks


b.Loop() (Go 1.24+) — preferred


`b.Loop()` prevents the compiler from optimizing away the code under test — without it, the compiler can detect dead results and eliminate them, producing misleadingly fast numbers. It also automatically excludes setup code before the loop from timing.
```go
func BenchmarkParse(b *testing.B) {
    data := loadFixture("large.json") // setup — excluded from timing
    for b.Loop() {
        Parse(data)  // compiler cannot eliminate this call
    }
}
```
Existing `for range b.N` benchmarks still work but should migrate to `b.Loop()` — the old pattern requires a manual `b.ResetTimer()` call and a package-level sink variable to prevent dead-code elimination.

Memory tracking


```go
func BenchmarkAlloc(b *testing.B) {
    b.ReportAllocs() // or run with -benchmem flag
    for b.Loop() {
        _ = make([]byte, 1024)
    }
}
```
`b.ReportMetric()` adds custom metrics (e.g., throughput):

```go
b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
```

Sub-benchmarks and table-driven


```go
func BenchmarkEncode(b *testing.B) {
    for _, size := range []int{64, 256, 4096} {
        b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
            data := make([]byte, size)
            for b.Loop() {
                Encode(data)
            }
        })
    }
}
```

Running Benchmarks


```bash
go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt
```
| Flag | Purpose |
| --- | --- |
| `-bench=.` | Run all benchmarks (regexp filter) |
| `-benchmem` | Report allocations (B/op, allocs/op) |
| `-count=10` | Run 10 times for statistical significance |
| `-benchtime=3s` | Minimum time per benchmark (default 1s) |
| `-cpu=1,2,4` | Run with different GOMAXPROCS values |
| `-cpuprofile=cpu.prof` | Write CPU profile |
| `-memprofile=mem.prof` | Write memory profile |
| `-trace=trace.out` | Write execution trace |
Output format:

```
BenchmarkEncode/size=64-8  5000000  230.5 ns/op  128 B/op  2 allocs/op
```

The `-8` suffix is GOMAXPROCS, `ns/op` is time per operation, `B/op` is bytes allocated per op, and `allocs/op` is the heap allocation count per op.

Profiling from Benchmarks


Generate profiles directly from benchmark runs — no HTTP server needed:

CPU profile


```bash
go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser
go tool pprof cpu.prof
```

Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)


```bash
go test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser
go tool pprof -alloc_objects mem.prof
```

Execution trace


```bash
go test -bench=BenchmarkParse -trace=trace.out ./pkg/parser
go tool trace trace.out
```

For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see [pprof Reference](./references/pprof.md). For execution trace interpretation, see [Trace Reference](./references/trace.md). For statistical comparison, see [benchstat Reference](./references/benchstat.md).


Reference Files


  • pprof Reference — Interactive and non-interactive analysis of CPU, memory, and goroutine profiles. Full CLI commands, profile types (CPU vs alloc_objects vs inuse_space), web UI navigation, and interpretation patterns. Use this to dive deep into where time and memory are being spent in your code.
  • benchstat Reference — Statistical comparison of benchmark runs with rigorous confidence intervals and p-value tests. Covers output reading, filtering old benchmarks, interleaving results for visual clarity, and regression detection. Use this when you need to prove a change made a meaningful performance difference, not just a lucky run.
  • Trace Reference — Execution tracer for understanding when and why code runs. Visualizes goroutine scheduling, garbage collection phases, network blocking, and custom span annotations. Use this when pprof (which shows where CPU goes) isn't enough — you need to see the timeline of what happened.
  • Diagnostic Tools — Quick reference for ancillary tools: fieldalignment (struct padding waste), GODEBUG (runtime logging flags), fgprof (wall-clock flame graph profiles), race detector (concurrency bugs), and others. Use this when you have a specific symptom and need a focused diagnostic — don't reach for pprof if a simpler tool already answers your question.
  • Compiler Analysis — Low-level compiler optimization insights: escape analysis (when values move to the heap), inlining decisions (which function calls are eliminated), SSA dump (intermediate representation), and assembly output. Use this when benchmarks show allocations you didn't expect, or when you want to verify the compiler did what you intended.
  • CI Regression Detection — Automated performance regression gating in CI pipelines. Covers three tools (benchdiff for quick PR comparisons, cob for strict threshold-based gating, gobenchdata for long-term trend dashboards), noisy neighbor mitigation strategies (why cloud CI benchmarks vary 5-10% even on quiet machines), and self-hosted runner tuning to make benchmarks reproducible. Use this when you want to ensure pull requests don't silently slow down your codebase — detecting regressions early prevents shipping performance debt.
  • Investigation Session — Production performance troubleshooting workflow combining Prometheus runtime metrics (heap size, GC frequency, goroutine counts), PromQL queries to correlate metrics with code changes, runtime configuration flags (GODEBUG env vars to enable GC logging), and cost warnings (when you're hitting performance tax). Use this when production benchmarks look good but real traffic behaves differently.
  • Prometheus Go Metrics Reference — Complete listing of Go runtime metrics actually exposed as Prometheus metrics by prometheus/client_golang. Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between runtime/metrics (Go internal data) and Prometheus metrics (what you scrape from /metrics). Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.

Cross-References


  • → See samber/cc-skills-golang@golang-performance skill for optimization patterns to apply after measuring ("if X bottleneck, apply Y")
  • → See samber/cc-skills-golang@golang-troubleshooting skill for pprof setup on running services (enable, secure, capture), Delve debugger, GODEBUG flags, root cause methodology
  • → See samber/cc-skills-golang@golang-observability skill for everyday always-on monitoring, continuous profiling (Pyroscope), distributed tracing (OpenTelemetry)
  • → See samber/cc-skills-golang@golang-testing skill for general testing practices