go-performance-review
Go Performance Review
Profile first, optimize second. Never optimize without a benchmark proving the problem.
1. Allocation Reduction
Prefer `strconv` over `fmt` for primitive conversions:

```go
// ✅ Good: no format-string parsing or interface boxing, fewer allocations
s := strconv.Itoa(42)
f := strconv.FormatFloat(3.14, 'f', 2, 64)

// ❌ Bad: fmt.Sprintf boxes its arguments and allocates more
t := fmt.Sprintf("%d", 42)
```
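One way to check the allocation claim on your own inputs is `testing.AllocsPerRun`, which also works outside `go test`. The following is a minimal sketch; the helper names `allocsFor` and `compareConversions` and the package-level `sink` are made up for illustration, and the exact counts depend on the Go version and the value being converted.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink keeps results alive so the compiler cannot discard the conversions.
var sink string

// allocsFor reports the average number of heap allocations per call of f.
func allocsFor(f func()) float64 {
	return testing.AllocsPerRun(1000, f)
}

// compareConversions returns allocations per call for fmt.Sprintf and
// strconv.Itoa when converting n.
func compareConversions(n int) (sprintf, itoa float64) {
	sprintf = allocsFor(func() { sink = fmt.Sprintf("%d", n) })
	itoa = allocsFor(func() { sink = strconv.Itoa(n) })
	return sprintf, itoa
}

func main() {
	sprintf, itoa := compareConversions(123456)
	fmt.Printf("fmt.Sprintf: %.0f allocs/op, strconv.Itoa: %.0f allocs/op\n", sprintf, itoa)
}
```

Treat the printed numbers as a measurement of your toolchain rather than a guarantee.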
Avoid repeated string concatenation (and unnecessary string-to-byte conversions, which generally copy the data):

```go
// ✅ Good: use strings.Builder for concatenation
var b strings.Builder
for _, s := range parts {
	b.WriteString(s)
}
result := b.String()

// ❌ Bad: each += allocates a new string and copies the old one
slow := ""
for _, s := range parts {
	slow += s
}
```
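When the total length can be computed up front, `strings.Builder.Grow` lets the builder reserve its buffer once. A small sketch, where `concat` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// concat joins parts using a single buffer reservation: it sums the
// lengths first and passes the total to Builder.Grow.
func concat(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}

	var b strings.Builder
	b.Grow(total) // reserve capacity once instead of growing repeatedly
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	fmt.Println(concat([]string{"go", "-", "perf"})) // prints "go-perf"
}
```

The extra pass over `parts` only pays off for larger inputs, so it is worth benchmarking before adopting.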
Preallocate slices and maps when size is known:

```go
// ✅ Good: single allocation
users := make([]User, 0, len(ids))
for _, id := range ids {
	users = append(users, getUser(id))
}

// ✅ Good: map with capacity hint
lookup := make(map[string]User, len(users))

// ❌ Bad: repeated growing
var grown []User // starts at capacity 0; append reallocates and copies as it grows
```
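To see the effect of preallocation, one can count how often `append` moves to a larger backing array. This is an illustrative sketch; `countReallocs` is a made-up helper, and the number of reallocations without preallocation depends on the runtime's growth policy.

```go
package main

import "fmt"

// countReallocs appends n ints to a slice created with the given initial
// capacity and counts how often append switched to a larger backing array.
func countReallocs(n, initialCap int) int {
	s := make([]int, 0, initialCap)
	reallocs := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	const n = 10000
	fmt.Println("without preallocation:", countReallocs(n, 0))
	fmt.Println("with preallocation:   ", countReallocs(n, n))
}
```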
Use `sync.Pool` for frequently allocated, short-lived objects:

```go
var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

func process(data []byte) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // clear contents before returning the buffer to the pool
		bufPool.Put(buf)
	}()
	buf.Write(data)
	return buf.String() // String copies, so the result outlives the pooled buffer
}
```
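A common refinement, not part of the pattern above, is to avoid returning unusually large buffers to the pool so that one big request does not pin memory indefinitely. The sketch below assumes a threshold named `maxPooledCap`; both the name and the 64 KiB value are placeholders to tune from measurements.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// maxPooledCap is an assumed threshold, not a recommended constant.
const maxPooledCap = 64 << 10

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// putBuffer resets buf and returns it to the pool unless it has grown
// beyond maxPooledCap. It reports whether the buffer was pooled.
func putBuffer(buf *bytes.Buffer) bool {
	if buf.Cap() > maxPooledCap {
		return false // drop oversized buffers and let the GC reclaim them
	}
	buf.Reset()
	bufPool.Put(buf)
	return true
}

// render copies data through a pooled buffer and returns it as a string.
func render(data []byte) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer putBuffer(buf)
	buf.Write(data)
	return buf.String()
}

func main() {
	fmt.Println(render([]byte("hello")))
	fmt.Println(render([]byte("hi"))) // a reused buffer must not leak "hello"
}
```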
2. Hot Path Optimizations

Avoid interface conversions in tight loops:

```go
// ✅ Good: concrete type in loop
func sum(vals []int64) int64 {
	var total int64
	for _, v := range vals {
		total += v
	}
	return total
}

// ❌ Bad: interface{} causes boxing/unboxing
func sumBoxed(vals []interface{}) int64 { ... }
```
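If the same loop is needed for several numeric types, generics (Go 1.18 and later) are one way to keep concrete element types without falling back to `interface{}`. This is a sketch under that assumption; the `Number` constraint and both function names are invented for the example, and `SumBoxed` is included only to show the per-element type assertion the interface version needs.

```go
package main

import "fmt"

// Number lists the element types this sketch supports.
type Number interface {
	~int | ~int64 | ~float64
}

// Sum adds values of one concrete numeric type; elements are not boxed.
func Sum[T Number](vals []T) T {
	var total T
	for _, v := range vals {
		total += v
	}
	return total
}

// SumBoxed is the interface-based variant: every element needs a type
// assertion, and values that are not int64 are skipped.
func SumBoxed(vals []interface{}) int64 {
	var total int64
	for _, v := range vals {
		if n, ok := v.(int64); ok {
			total += n
		}
	}
	return total
}

func main() {
	fmt.Println(Sum([]int64{1, 2, 3}))
	fmt.Println(Sum([]float64{0.5, 1.5}))
	fmt.Println(SumBoxed([]interface{}{int64(1), int64(2), "skipped"}))
}
```

Whether the generic version matches a hand-written concrete function depends on the compiler, so a benchmark should decide.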
Avoid `reflect` in performance-critical paths. If you need reflection-like behavior at scale, use code generation (`go generate`, `stringer`, protocol buffers).
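As a rough picture of what generated code buys you, here is a hand-written `String` method of the kind a generator would produce for an enum, so no reflection is needed at run time. `Color` is a hypothetical type, and the method is a simplification: real `stringer` output uses an index table rather than a switch.

```go
package main

import "fmt"

// Color is a hypothetical enum used only for this illustration.
type Color int

// With the stringer tool installed, a directive such as
//   //go:generate stringer -type=Color
// would generate the String method instead of the hand-written one below.
const (
	Red Color = iota
	Green
	Blue
)

// String maps each constant to its name without using reflection.
func (c Color) String() string {
	switch c {
	case Red:
		return "Red"
	case Green:
		return "Green"
	case Blue:
		return "Blue"
	default:
		return fmt.Sprintf("Color(%d)", int(c))
	}
}

func main() {
	fmt.Println(Green)    // fmt uses the String method
	fmt.Println(Color(7)) // out-of-range values fall back to a numeric form
}
```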
Reduce pointer chasing:

```go
// ✅ Good: contiguous memory, cache-friendly
type Points struct {
	X []float64
	Y []float64
}

// ❌ Slower: pointer chasing per element
type PointPtrs []*Point
```
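The two layouts can be compared on a simple reduction. In this sketch, `Point`, `PointsSoA`, `FromPointers`, and the two sum functions are illustrative names; the code shows only the access pattern and makes no claim about the size of the speedup.

```go
package main

import "fmt"

// Point is the per-element struct used by the pointer-based layout.
type Point struct{ X, Y float64 }

// PointsSoA stores coordinates in two contiguous slices (struct of arrays).
type PointsSoA struct {
	X []float64
	Y []float64
}

// FromPointers copies a pointer-based collection into the contiguous layout.
func FromPointers(ps []*Point) PointsSoA {
	out := PointsSoA{
		X: make([]float64, 0, len(ps)),
		Y: make([]float64, 0, len(ps)),
	}
	for _, p := range ps {
		out.X = append(out.X, p.X)
		out.Y = append(out.Y, p.Y)
	}
	return out
}

// SumX walks a single contiguous slice.
func (p PointsSoA) SumX() float64 {
	var total float64
	for _, x := range p.X {
		total += x
	}
	return total
}

// SumXPtrs dereferences one pointer per element.
func SumXPtrs(ps []*Point) float64 {
	var total float64
	for _, p := range ps {
		total += p.X
	}
	return total
}

func main() {
	ptrs := []*Point{{1, 10}, {2, 20}, {3, 30}}
	fmt.Println(SumXPtrs(ptrs), FromPointers(ptrs).SumX())
}
```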
3. Map Performance

```go
// ✅ Use capacity hints
m := make(map[string]int, expectedSize)

// ✅ For read-heavy concurrent access, use sync.Map
// But ONLY when keys are stable: sync.Map has higher overhead
// for writes than a mutex-protected map.

// ✅ For fixed key sets, consider using a slice with index mapping
// instead of a map.
```
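For the fixed-key-set case, the "slice with index mapping" idea can be sketched with an integer enum indexing a fixed-size array. `Level`, `Counter`, and the constant names are assumptions made for the example.

```go
package main

import "fmt"

// Level is a hypothetical fixed key set.
type Level int

const (
	Debug Level = iota
	Info
	Warn
	Error
	numLevels // number of keys; must stay last
)

// Counter tallies events per level in an array indexed by Level,
// standing in for a map[Level]int.
type Counter struct {
	counts [numLevels]int
}

// Add increments the tally for l; out-of-range levels are ignored.
func (c *Counter) Add(l Level) {
	if l >= 0 && l < numLevels {
		c.counts[l]++
	}
}

// Get returns the tally for l, or 0 for out-of-range levels.
func (c *Counter) Get(l Level) int {
	if l < 0 || l >= numLevels {
		return 0
	}
	return c.counts[l]
}

func main() {
	var c Counter
	c.Add(Warn)
	c.Add(Warn)
	c.Add(Error)
	fmt.Println(c.Get(Warn), c.Get(Error), c.Get(Debug))
}
```

Lookups become a bounds check and an array index, with no hashing involved.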
4. Benchmarking

ALWAYS write benchmarks before and after optimization:

```go
// Package-level var prevents the compiler from eliminating the call
var result string

func BenchmarkFoo(b *testing.B) {
	// Setup outside the loop
	input := generateInput()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		result = Foo(input) // assign to package-level var to prevent elision
	}
}
```

Run benchmarks with memory allocation statistics:

```bash
go test -bench=BenchmarkFoo -benchmem -count=5 ./...
```

Compare before/after with `benchstat`:

```bash
go test -bench=. -count=10 > old.txt
# make changes
go test -bench=. -count=10 > new.txt
benchstat old.txt new.txt
```
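For a quick before/after comparison without a `_test.go` file, `testing.Benchmark` can drive a function from an ordinary program. This is only a sketch: `concatPlus` and `concatBuilder` are stand-ins for your own "before" and "after" code, `measure` is a made-up helper, and a single run like this is noisier than the `-count=10` plus `benchstat` workflow above.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is a package-level variable that keeps results observable.
var sink string

// concatPlus is the "before" version: repeated += concatenation.
func concatPlus(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// concatBuilder is the "after" version using strings.Builder.
func concatBuilder(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

// measure runs f under the testing package's benchmark driver.
func measure(f func([]string) string, parts []string) testing.BenchmarkResult {
	testing.Init() // registers testing flags; needed when not run via "go test"
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = f(parts)
		}
	})
}

func main() {
	parts := strings.Split(strings.Repeat("x,", 200), ",")
	before := measure(concatPlus, parts)
	after := measure(concatBuilder, parts)
	fmt.Println("before:", before.String(), before.MemString())
	fmt.Println("after: ", after.String(), after.MemString())
}
```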
5. Profiling

CPU profiling:

```bash
go test -cpuprofile=cpu.prof -bench=BenchmarkFoo .
go tool pprof cpu.prof
```
Memory profiling:

```bash
go test -memprofile=mem.prof -bench=BenchmarkFoo .
go tool pprof -alloc_space mem.prof
```
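Outside of `go test`, the `runtime/pprof` package offers programmatic equivalents of these flags. The sketch below writes profiles into in-memory buffers to stay self-contained; in practice you would likely pass an `*os.File` so that `go tool pprof` can read the result. `busyWork` and the two capture helpers are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// busyWork is a placeholder workload to profile.
func busyWork() int {
	total := 0
	for i := 0; i < 5000000; i++ {
		total += i % 7
	}
	return total
}

// captureCPUProfile profiles work and returns the size in bytes of the
// encoded CPU profile.
func captureCPUProfile(work func()) (int, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return 0, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Len(), nil
}

// captureHeapProfile writes the current heap profile and returns its size.
func captureHeapProfile() (int, error) {
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	cpuBytes, err := captureCPUProfile(func() { busyWork() })
	if err != nil {
		fmt.Println("cpu profile failed:", err)
		return
	}
	heapBytes, err := captureHeapProfile()
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}
	fmt.Println("cpu profile bytes:", cpuBytes, "heap profile bytes:", heapBytes)
}
```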
HTTP server profiling (import `net/http/pprof`):

```go
import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
)

// Start during program startup; access at http://localhost:6060/debug/pprof/
go func() {
	log.Println(http.ListenAndServe("localhost:6060", nil))
}()
```
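If you would rather not serve profiles from `http.DefaultServeMux`, the handlers exported by `net/http/pprof` can be mounted on a separate mux that is served only on the debug address. This is a sketch of that variation, not something the text above prescribes; `newDebugMux` and `indexStatus` are invented names. Note that importing the package still registers the handlers on the default mux as a side effect, so the application server should use its own mux as well.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux mounts the pprof handlers on a private mux.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// indexStatus issues an in-process request for the pprof index page and
// returns the HTTP status code.
func indexStatus() int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
	newDebugMux().ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("pprof index status:", indexStatus())
	// In a service, the mux would be served on a loopback address, e.g.:
	//   go func() { log.Println(http.ListenAndServe("localhost:6060", newDebugMux())) }()
}
```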
6. Common Anti-Patterns

| Anti-Pattern | Fix |
|---|---|
| `fmt.Sprintf` for simple conversions | Use `strconv` |
| String concatenation in loop | Use `strings.Builder` |
| Slice without preallocation | Use `make([]T, 0, n)` |
| Map without capacity hint | Use `make(map[K]V, n)` |
| Regexp compiled inside a function | Compile once at package level |
| `reflect` in hot path | Use code generation |
| Logging in tight loop | Batch or sample |
| `defer` in a very tight inner loop | Manual cleanup (rare, benchmark first) |
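Two of the fixes in the table, compiling once at package level and sampling log output, might look like the sketch below. It assumes the compile-once row refers to regular expressions; the pattern, `isHexColor`, `shouldLog`, and `countLogged` are all illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// hexColor is compiled once at package initialization instead of on
// every call. The pattern itself is only an example.
var hexColor = regexp.MustCompile(`^#[0-9a-fA-F]{6}$`)

// isHexColor reuses the precompiled expression.
func isHexColor(s string) bool {
	return hexColor.MatchString(s)
}

// shouldLog reports whether iteration i should be logged when sampling
// one message out of every `every` iterations.
func shouldLog(i, every int) bool {
	return every > 0 && i%every == 0
}

// countLogged returns how many of n iterations would be logged.
func countLogged(n, every int) int {
	logged := 0
	for i := 0; i < n; i++ {
		if shouldLog(i, every) {
			logged++
		}
	}
	return logged
}

func main() {
	fmt.Println(isHexColor("#1a2B3c"), isHexColor("#12"))
	fmt.Println("messages logged:", countLogged(10000, 1000))
}
```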
Important Caveat
Most Go code is not performance-critical. Readability and correctness ALWAYS
take priority over micro-optimizations. Only apply these patterns when:
- A benchmark proves this code path is a bottleneck
- The optimization is significant (>10% improvement)
- The resulting code remains readable and maintainable
Premature optimization is still the root of all evil, even in Go.