go-performance-review

Go Performance Review

Profile first, optimize second. Never optimize without a benchmark proving the problem.

1. Allocation Reduction


Prefer `strconv` over `fmt` for primitive conversions:

```go
// ✅ Good — zero allocations for simple conversions
s := strconv.Itoa(42)
s := strconv.FormatFloat(3.14, 'f', 2, 64)

// ❌ Bad — fmt.Sprintf allocates
s := fmt.Sprintf("%d", 42)
```

Avoid repeated string concatenation in loops:


```go
// ✅ Good — use strings.Builder for concatenation
var b strings.Builder
for _, s := range parts {
    b.WriteString(s)
}
result := b.String()

// ❌ Bad — repeated concatenation allocates on every +
result := ""
for _, s := range parts {
    result += s
}
```

Preallocate slices and maps when size is known:


```go
// ✅ Good — single allocation
users := make([]User, 0, len(ids))
for _, id := range ids {
    users = append(users, getUser(id))
}

// ✅ Good — map with capacity hint
lookup := make(map[string]User, len(users))

// ❌ Bad — repeated growing
var users []User // starts at 0, grows via doubling
```

Use `sync.Pool` for frequently allocated, short-lived objects:

```go
var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func process(data []byte) string {
    buf := bufPool.Get().(*bytes.Buffer)
    defer func() {
        buf.Reset()
        bufPool.Put(buf)
    }()

    buf.Write(data)
    return buf.String()
}
```

2. Hot Path Optimizations


Avoid interface conversions in tight loops:


```go
// ✅ Good — concrete type in loop
func sum(vals []int64) int64 {
    var total int64
    for _, v := range vals {
        total += v
    }
    return total
}

// ❌ Bad — interface{} causes boxing/unboxing
func sum(vals []interface{}) int64 { ... }
```

Avoid `reflect` in performance-critical paths:

If you need reflection-like behavior at scale, use code generation (`go generate`, `stringer`, protocol buffers).

Reduce pointer chasing:


```go
// ✅ Good — contiguous memory, cache-friendly
type Points struct {
    X []float64
    Y []float64
}

// ❌ Slower — pointer chasing per element
type Points []*Point
```

3. Map Performance


```go
// ✅ Use capacity hints
m := make(map[string]int, expectedSize)

// ✅ For read-heavy concurrent access, use sync.Map
// But ONLY when keys are stable — sync.Map has higher overhead
// for writes than a mutex-protected map.

// ✅ For fixed key sets, consider using a slice with index mapping
// instead of a map.
```

4. Benchmarking


ALWAYS write benchmarks before and after optimization:

```go
func BenchmarkFoo(b *testing.B) {
    // Setup outside the loop
    input := generateInput()

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        result = Foo(input) // assign to package-level var to prevent elision
    }
}

// Package-level var prevents compiler from eliminating the call
var result string
```

Run benchmarks with memory profiling:

```bash
go test -bench=BenchmarkFoo -benchmem -count=5 ./...
```

Compare before/after with `benchstat`:

```bash
go test -bench=. -count=10 > old.txt
# make changes
go test -bench=. -count=10 > new.txt
benchstat old.txt new.txt
```

5. Profiling


CPU profiling:


```bash
go test -cpuprofile=cpu.prof -bench=BenchmarkFoo .
go tool pprof cpu.prof
```

Memory profiling:


```bash
go test -memprofile=mem.prof -bench=BenchmarkFoo .
go tool pprof -alloc_space mem.prof
```

HTTP server profiling (import `net/http/pprof`):


```go
import _ "net/http/pprof"

// Access at http://localhost:6060/debug/pprof/
go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
}()
```

6. Common Anti-Patterns


| Anti-Pattern | Fix |
| --- | --- |
| `fmt.Sprintf` for simple int→string | `strconv.Itoa` |
| String concatenation in loop | `strings.Builder` |
| Slice without preallocation | `make([]T, 0, n)` |
| Map without capacity hint | `make(map[K]V, n)` |
| `regexp.Compile` inside function | Compile once at package level |
| `json.Marshal` in hot path | Use code-gen (`easyjson`, `sonic`) |
| Logging in tight loop | Batch or sample |
| `defer` in very tight inner loop | Manual cleanup (rare, benchmark first) |

Important Caveat


Most Go code is not performance-critical. Readability and correctness ALWAYS take priority over micro-optimizations. Only apply these patterns when:
  1. A benchmark proves this code path is a bottleneck
  2. The optimization is significant (>10% improvement)
  3. The resulting code remains readable and maintainable
Premature optimization is still the root of all evil, even in Go.