golang-benchmark
Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run; statistical rigor and controlled conditions are prerequisites before any optimization decision.
Thinking mode: Use for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.
ultrathink
Go Benchmarking & Performance Measurement
Performance improvement does not exist without measurement: if you can measure it, you can improve it.
This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, → See `samber/cc-skills-golang@golang-performance` skill. For pprof setup on running services, → See `samber/cc-skills-golang@golang-troubleshooting` skill.
Writing Benchmarks
`b.Loop()` (Go 1.24+): preferred
```go
func BenchmarkParse(b *testing.B) {
	data := loadFixture("large.json") // setup, excluded from timing
	for b.Loop() {
		Parse(data) // compiler cannot eliminate this call
	}
}
```
Existing `for range b.N` benchmarks still work but should migrate to `b.Loop()`: the old pattern requires a manual `b.ResetTimer()` and a package-level sink variable to prevent dead code elimination.
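To make the migration concrete, here is a minimal sketch that puts the two patterns side by side. The function `countFields` and the input string are hypothetical stand-ins for real code under test, and `testing.Benchmark` is used only so the sketch runs as a plain program; in a project these would be ordinary `BenchmarkXxx` functions in a `_test.go` file.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is a package-level variable. Storing results in it keeps the
// compiler from discarding the call in the b.N-style loop.
var sink int

// countFields is a hypothetical function under test.
func countFields(s string) int {
	return len(strings.Fields(s))
}

// benchOld is the older pattern: manual timer reset plus a sink variable.
func benchOld(b *testing.B) {
	input := strings.Repeat("alpha beta gamma ", 100) // setup
	b.ResetTimer()                                    // exclude setup from timing
	for i := 0; i < b.N; i++ {
		sink = countFields(input)
	}
}

// benchNew is the same benchmark written with b.Loop (requires Go 1.24+).
// No ResetTimer and no sink are needed.
func benchNew(b *testing.B) {
	input := strings.Repeat("alpha beta gamma ", 100) // setup, not timed
	for b.Loop() {
		countFields(input)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("b.N pattern:   ", testing.Benchmark(benchOld))
	fmt.Println("b.Loop pattern:", testing.Benchmark(benchNew))
}
```

The two variants should report similar timings; the difference is in how much bookkeeping the benchmark author has to remember.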
Memory tracking
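```go
func BenchmarkAlloc(b *testing.B) {
	b.ReportAllocs() // or run with -benchmem flag
	for b.Loop() {
		_ = make([]byte, 1024)
	}
}
```
`b.ReportMetric()` reports a custom metric:
```go
// totalBytes is the number of bytes processed inside the benchmark loop.
b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
```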
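The `b.ReportMetric()` fragment above assumes a `totalBytes` counter maintained by the benchmark. A minimal, self-contained sketch of how that could look follows; the SHA-256 workload and the 4096-byte payload are illustrative assumptions, and `testing.Benchmark` is used only so the example runs as a standalone program.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// benchHash hashes a fixed payload and reports throughput as a custom metric.
func benchHash(b *testing.B) {
	payload := make([]byte, 4096) // illustrative payload size
	b.ReportAllocs()

	var totalBytes int
	for b.Loop() { // requires Go 1.24+
		sha256.Sum256(payload)
		totalBytes += len(payload)
	}

	// After the loop the timer is stopped, so Elapsed covers only the loop.
	b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
}

func main() {
	r := testing.Benchmark(benchHash)
	fmt.Println(r)                           // timing plus the custom metric
	fmt.Println(r.MemString())               // B/op and allocs/op
	fmt.Println("raw:", r.Extra["bytes/s"]) // custom metrics are stored in Extra
}
```

For plain throughput, `b.SetBytes(n)` is a simpler option and makes the standard output include MB/s; `b.ReportMetric` is more useful for domain-specific units.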
Sub-benchmarks and table-driven
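```go
import (
	"fmt"
	"testing"
)

func BenchmarkEncode(b *testing.B) {
	for _, size := range []int{64, 256, 4096} {
		// Each size runs as its own sub-benchmark, e.g. BenchmarkEncode/size=64.
		b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
			data := make([]byte, size) // per-case setup, excluded from timing
			for b.Loop() {
				Encode(data)
			}
		})
	}
}
```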
Running Benchmarks
```bash
go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt
```

| Flag | Purpose |
|---|---|
| `-bench=.` | Run all benchmarks (regexp filter) |
| `-benchmem` | Report allocations (B/op, allocs/op) |
| `-count=10` | Run 10 times for statistical significance |
| `-benchtime` | Minimum time per benchmark (default 1s) |
| `-cpu` | Run with different GOMAXPROCS values |
| `-cpuprofile` | Write CPU profile |
| `-memprofile` | Write memory profile |
| `-trace` | Write execution trace |

Output format: `BenchmarkEncode/size=64-8    5000000    230.5 ns/op    128 B/op    2 allocs/op`. The `-8` suffix is GOMAXPROCS, the second column is the iteration count, `ns/op` is time per operation, `B/op` is bytes allocated per op, and `allocs/op` is heap allocation count per op.
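To make the column layout concrete, the following sketch splits one result line into its parts. The `Result` type and `parseLine` function are hypothetical names introduced for illustration, and the suffix handling is a simplification: the GOMAXPROCS suffix is omitted when it equals 1, so a sub-benchmark name that itself ends in `-<number>` would be misread. For real comparisons, benchstat remains the tool to use.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Result holds the fields of one benchmark output line (hypothetical type).
type Result struct {
	Name       string             // benchmark name without the GOMAXPROCS suffix
	Procs      int                // GOMAXPROCS suffix, 1 if absent
	Iterations int                // number of iterations that were timed
	Metrics    map[string]float64 // unit -> value, e.g. "ns/op" -> 230.5
}

// parseLine parses a line such as:
//   BenchmarkEncode/size=64-8  5000000  230.5 ns/op  128 B/op  2 allocs/op
func parseLine(line string) (Result, error) {
	fields := strings.Fields(line)
	// Expect: name, iterations, then (value, unit) pairs.
	if len(fields) < 4 || len(fields)%2 != 0 || !strings.HasPrefix(fields[0], "Benchmark") {
		return Result{}, fmt.Errorf("not a benchmark result line: %q", line)
	}

	res := Result{Name: fields[0], Procs: 1, Metrics: make(map[string]float64)}
	// Treat a trailing "-<int>" as the GOMAXPROCS suffix (simplification).
	if i := strings.LastIndex(fields[0], "-"); i >= 0 {
		if p, err := strconv.Atoi(fields[0][i+1:]); err == nil {
			res.Name, res.Procs = fields[0][:i], p
		}
	}

	n, err := strconv.Atoi(fields[1])
	if err != nil {
		return Result{}, fmt.Errorf("iteration count: %w", err)
	}
	res.Iterations = n

	for i := 2; i < len(fields); i += 2 {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return Result{}, fmt.Errorf("metric value: %w", err)
		}
		res.Metrics[fields[i+1]] = v
	}
	return res, nil
}

func main() {
	line := "BenchmarkEncode/size=64-8    5000000    230.5 ns/op    128 B/op    2 allocs/op"
	res, err := parseLine(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", res)
}
```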
Profiling from Benchmarks
Generate profiles directly from benchmark runs; no HTTP server needed:
```bash
# CPU profile
go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser
go tool pprof cpu.prof

# Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)
go test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser
go tool pprof -alloc_objects mem.prof

# Execution trace
go test -bench=BenchmarkParse -trace=trace.out ./pkg/parser
go tool trace trace.out
```
For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see [pprof Reference](./references/pprof.md). For execution trace interpretation, see [Trace Reference](./references/trace.md). For statistical comparison, see [benchstat Reference](./references/benchstat.md).
Reference Files
- pprof Reference: Interactive and non-interactive analysis of CPU, memory, and goroutine profiles. Full CLI commands, profile types (CPU vs alloc_objects vs inuse_space), web UI navigation, and interpretation patterns. Use this to dive deep into where time and memory are being spent in your code.
- benchstat Reference: Statistical comparison of benchmark runs with rigorous confidence intervals and p-value tests. Covers output reading, filtering old benchmarks, interleaving results for visual clarity, and regression detection. Use this when you need to prove a change made a meaningful performance difference, not just a lucky run.
- Trace Reference: Execution tracer for understanding when and why code runs. Visualizes goroutine scheduling, garbage collection phases, network blocking, and custom span annotations. Use this when pprof (which shows where CPU goes) isn't enough and you need to see the timeline of what happened.
- Diagnostic Tools: Quick reference for ancillary tools: fieldalignment (struct padding waste), GODEBUG (runtime logging flags), fgprof (wall-clock profiling covering on-CPU and off-CPU time), race detector (concurrency bugs), and others. Use this when you have a specific symptom and need a focused diagnostic; don't reach for pprof if a simpler tool already answers your question.
- Compiler Analysis: Low-level compiler optimization insights: escape analysis (when values move to the heap), inlining decisions (which function calls are eliminated), SSA dump (intermediate representation), and assembly output. Use this when benchmarks show allocations you didn't expect, or when you want to verify the compiler did what you intended.
- CI Regression Detection: Automated performance regression gating in CI pipelines. Covers three tools (benchdiff for quick PR comparisons, cob for strict threshold-based gating, gobenchdata for long-term trend dashboards), noisy neighbor mitigation strategies (why cloud CI benchmarks vary 5-10% even on quiet machines), and self-hosted runner tuning to make benchmarks reproducible. Use this when you want to ensure pull requests don't silently slow down your codebase; detecting regressions early prevents shipping performance debt.
- Investigation Session: Production performance troubleshooting workflow combining Prometheus runtime metrics (heap size, GC frequency, goroutine counts), PromQL queries to correlate metrics with code changes, runtime configuration flags (GODEBUG env vars to enable GC logging), and cost warnings (when a diagnostic itself adds a performance tax). Use this when benchmarks look good but real production traffic behaves differently.
- Prometheus Go Metrics Reference: Complete listing of Go runtime metrics actually exposed as Prometheus metrics by `prometheus/client_golang`. Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between `runtime/metrics` (Go internal data) and Prometheus metrics (what you scrape from `/metrics`). Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.
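As a small illustration of the "struct padding waste" that the Diagnostic Tools entry mentions, the sketch below compares two orderings of the same fields. The type names `Padded` and `Compact` are hypothetical, and exact sizes depend on the target architecture, so the program prints them instead of assuming values.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Padded places a large field between two small ones, which forces
// alignment padding after A and after C.
type Padded struct {
	A bool
	B int64
	C bool
}

// Compact holds the same fields ordered largest first, so the two bools
// share what would otherwise be padding.
type Compact struct {
	B int64
	A bool
	C bool
}

// sizes returns the in-memory size of each layout in bytes.
func sizes() (padded, compact uintptr) {
	return unsafe.Sizeof(Padded{}), unsafe.Sizeof(Compact{})
}

func main() {
	p, c := sizes()
	fmt.Printf("Padded: %d bytes, Compact: %d bytes\n", p, c)
}
```

On common 64-bit platforms the padded layout is typically 24 bytes and the compact one 16. The saving only matters for types allocated in large numbers, which is why it is worth confirming with a benchmark before reordering fields.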
Cross-References
- → See `samber/cc-skills-golang@golang-performance` skill for optimization patterns to apply after measuring ("if X bottleneck, apply Y")
- → See `samber/cc-skills-golang@golang-troubleshooting` skill for pprof setup on running services (enable, secure, capture), Delve debugger, GODEBUG flags, root cause methodology
- → See `samber/cc-skills-golang@golang-observability` skill for everyday always-on monitoring, continuous profiling (Pyroscope), distributed tracing (OpenTelemetry)
- → See `samber/cc-skills-golang@golang-testing` skill for general testing practices