go-performance-review

Go Performance Review

Profile first, optimize second. Never optimize without a benchmark proving the problem.

1. Allocation Reduction


Prefer `strconv` over `fmt` for primitive conversions:

```go
// ✅ Good — zero allocations for simple conversions
s := strconv.Itoa(42)
s := strconv.FormatFloat(3.14, 'f', 2, 64)

// ❌ Bad — fmt.Sprintf allocates
s := fmt.Sprintf("%d", 42)
```

Avoid repeated string concatenation in loops:


```go
// ✅ Good — use strings.Builder for concatenation
var b strings.Builder
for _, s := range parts {
    b.WriteString(s)
}
result := b.String()

// ❌ Bad — repeated concatenation allocates on every +
result := ""
for _, s := range parts {
    result += s
}
```

Preallocate slices and maps when size is known:


```go
// ✅ Good — single allocation
users := make([]User, 0, len(ids))
for _, id := range ids {
    users = append(users, getUser(id))
}

// ✅ Good — map with capacity hint
lookup := make(map[string]User, len(users))

// ❌ Bad — repeated growing
var users []User // starts at 0, grows via doubling
```

Use `sync.Pool` for frequently allocated, short-lived objects:

```go
var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func process(data []byte) string {
    buf := bufPool.Get().(*bytes.Buffer)
    defer func() {
        buf.Reset()
        bufPool.Put(buf)
    }()

    buf.Write(data)
    return buf.String()
}
```

2. Hot Path Optimizations


Avoid interface conversions in tight loops:


```go
// ✅ Good — concrete type in loop
func sum(vals []int64) int64 {
    var total int64
    for _, v := range vals {
        total += v
    }
    return total
}

// ❌ Bad — interface{} causes boxing/unboxing
func sum(vals []interface{}) int64 { ... }
```

Avoid `reflect` in performance-critical paths:

If you need reflection-like behavior at scale, use code generation (`go generate`, `stringer`, protocol buffers).

Reduce pointer chasing:


```go
// ✅ Good — contiguous memory, cache-friendly
type Points struct {
    X []float64
    Y []float64
}

// ❌ Slower — pointer chasing per element
type Points []*Point
```

3. Map Performance


```go
// ✅ Use capacity hints
m := make(map[string]int, expectedSize)

// ✅ For read-heavy concurrent access, use sync.Map
// But ONLY when keys are stable — sync.Map has higher overhead
// for writes than a mutex-protected map.

// ✅ For fixed key sets, consider using a slice with index mapping
// instead of a map.
```

4. Benchmarking


ALWAYS write benchmarks before and after optimization:

```go
func BenchmarkFoo(b *testing.B) {
    // Setup outside the loop
    input := generateInput()

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        result = Foo(input) // assign to package-level var to prevent elision
    }
}

// Package-level var prevents compiler from eliminating the call
var result string
```

Run benchmarks with memory profiling:

```bash
go test -bench=BenchmarkFoo -benchmem -count=5 ./...
```

Compare before/after with `benchstat`:

```bash
go test -bench=. -count=10 > old.txt
# make changes
go test -bench=. -count=10 > new.txt
benchstat old.txt new.txt
```

5. Profiling


CPU profiling:


```bash
go test -cpuprofile=cpu.prof -bench=BenchmarkFoo .
go tool pprof cpu.prof
```

Memory profiling:


```bash
go test -memprofile=mem.prof -bench=BenchmarkFoo .
go tool pprof -alloc_space mem.prof
```

HTTP server profiling (import `net/http/pprof`):


```go
import _ "net/http/pprof"

// Access at http://localhost:6060/debug/pprof/
go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
}()
```

6. Common Anti-Patterns


| Anti-Pattern | Fix |
| --- | --- |
| `fmt.Sprintf` for simple int→string | `strconv.Itoa` |
| String concatenation in loop | `strings.Builder` |
| Slice without preallocation | `make([]T, 0, n)` |
| Map without capacity hint | `make(map[K]V, n)` |
| `regexp.Compile` inside function | Compile once at package level |
| `json.Marshal` in hot path | Use code-gen (`easyjson`, `sonic`) |
| Logging in tight loop | Batch or sample |
| `defer` in very tight inner loop | Manual cleanup (rare, benchmark first) |

Important Caveat


Most Go code is not performance-critical. Readability and correctness ALWAYS take priority over micro-optimizations. Only apply these patterns when:
  1. A benchmark proves this code path is a bottleneck
  2. The optimization is significant (>10% improvement)
  3. The resulting code remains readable and maintainable
Premature optimization is still the root of all evil, even in Go.