rust-build-times

Rust Build Times

Purpose

Guide agents through diagnosing and improving Rust compilation speed: `cargo build --timings` for build profiling, `sccache` for caching, the Cranelift codegen backend for faster dev builds, workspace crate splitting, LTO configuration trade-offs, and fast linkers (mold/lld).

Triggers

- "My Rust project takes too long to compile"
- "How do I profile which crates are slow to build?"
- "How do I set up sccache for Rust?"
- "What is the Cranelift backend and how does it help?"
- "Should I use thin LTO or fat LTO?"
- "How do I use the mold linker with Rust?"

Workflow

1. Diagnose with cargo-timings

```bash
# Build with timing report
cargo build --timings
# Writes target/cargo-timings/cargo-timing.html
# Shows: crate compilation timeline, parallelism, bottlenecks

# For release builds
cargo build --release --timings
```

Key things to look for in the timing report:
- Long sequential chains (no parallelism)
- Individual crates taking > 10s (candidates for optimization)
- Proc-macro crates blocking everything downstream

```bash
# cargo-llvm-lines: count LLVM IR lines per function (monomorphization)
cargo install cargo-llvm-lines
cargo llvm-lines --release | head -20
# Shows functions generating the most LLVM IR (monomorphization bloat)
```
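
The profiling step can be wrapped in a small script so the report location is always printed. This is a minimal sketch: the function names `timing_report_path` and `build_with_timings` are hypothetical, and it assumes the report lands under the target directory (`target/` by default, or `CARGO_TARGET_DIR` when that variable is set).

```bash
# Hypothetical helpers; the function names are illustrative, not part of Cargo.

# Print where `cargo build --timings` is expected to write its HTML report.
# Assumes the layout <target-dir>/cargo-timings/cargo-timing.html.
timing_report_path() {
    printf '%s/cargo-timings/cargo-timing.html\n' "${CARGO_TARGET_DIR:-target}"
}

# Run a timed build and report where the HTML ended up.
# Extra arguments (e.g. --release) are passed through to cargo.
build_with_timings() {
    if ! command -v cargo >/dev/null 2>&1; then
        echo "cargo not found on PATH" >&2
        return 1
    fi
    cargo build --timings "$@" || return 1
    report=$(timing_report_path)
    if [ -f "$report" ]; then
        echo "timing report: $report"
    else
        echo "build finished, but no report at $report" >&2
    fi
}
```

Calling `build_with_timings --release` from the project root would profile a release build; the report path is printed only if the file actually exists.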

2. sccache: compilation caching for Rust

```bash
# Install
cargo install sccache
# or: brew install sccache

# Configure for Rust builds
export RUSTC_WRAPPER=sccache

# Or add to .cargo/config.toml (project or global)
# ~/.cargo/config.toml
# [build]
# rustc-wrapper = "sccache"

# Check cache stats
sccache --show-stats

# S3 backend for CI teams
export SCCACHE_BUCKET=my-rust-cache
export SCCACHE_REGION=us-east-1
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=yyy
sccache --start-server

# GitHub Actions with sccache (workflow YAML step, not a shell command)
# - uses: mozilla-actions/sccache-action@v0.0.4
```
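
On machines where sccache may or may not be installed (for example a shared setup script for laptops and CI), the wrapper can be enabled conditionally. The sketch below is an illustration, not part of sccache: `use_rustc_wrapper` is a hypothetical function name. It also sets `CARGO_INCREMENTAL=0`, on the grounds that sccache cannot cache incrementally compiled crates; drop that line if incremental dev builds matter more than cache hits.

```bash
# Hypothetical helper: use a compiler cache only when it is installed.
# $1 is the wrapper binary to look for (defaults to sccache).
use_rustc_wrapper() {
    wrapper="${1:-sccache}"
    if command -v "$wrapper" >/dev/null 2>&1; then
        export RUSTC_WRAPPER="$wrapper"
        # sccache cannot cache incrementally compiled crates,
        # so incremental compilation is switched off alongside it.
        export CARGO_INCREMENTAL=0
    else
        # Fall back to plain rustc rather than failing the build.
        unset RUSTC_WRAPPER
        echo "$wrapper not found; building without a compiler cache" >&2
    fi
}
```

A typical use is `use_rustc_wrapper && cargo build`, followed by `sccache --show-stats` to confirm that cache hits are being recorded.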

3. Cranelift codegen backend

Cranelift is a fast codegen backend (an alternative to LLVM): it produces slower code but compiles much faster, which makes it well suited to development builds:

```bash
# Install nightly (Cranelift requires nightly for now)
rustup toolchain install nightly
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
```

```toml
# Use Cranelift for dev builds only
# .cargo/config.toml
[unstable]
codegen-backend = true

[profile.dev]
codegen-backend = "cranelift"
```

```bash
# Use per-build
CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift \
  cargo +nightly build -Zcodegen-backend
```

Cranelift vs LLVM trade-off:
- Dev builds: 20–40% faster compilation with Cranelift
- Runtime performance: LLVM-compiled code is faster (Cranelift skips many optimizations)
- Release builds: always use LLVM
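
The trade-off above amounts to a simple rule: Cranelift for the `dev` profile, LLVM for everything else. A minimal sketch of that rule as a build wrapper follows; `codegen_backend_for` and `build_profile` are hypothetical names, and the nightly toolchain with the `rustc-codegen-cranelift-preview` component is assumed to be installed already.

```bash
# Hypothetical helper: choose a codegen backend from the Cargo profile name.
codegen_backend_for() {
    case "$1" in
        dev) echo cranelift ;;   # fast compiles, slower code
        *)   echo llvm ;;        # release and every other profile
    esac
}

# Build with the backend chosen above. Cranelift builds go through nightly;
# LLVM builds use the default toolchain.
build_profile() {
    profile="${1:-dev}"
    if [ "$(codegen_backend_for "$profile")" = cranelift ]; then
        CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift \
            cargo +nightly build -Zcodegen-backend
    else
        cargo build --profile "$profile"
    fi
}
```

With this in place, `build_profile dev` uses Cranelift and `build_profile release` stays on LLVM, so release artifacts are never affected by the experiment.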

4. Workspace splitting for parallelism

Cargo parallelizes across crates, so a single large crate is a serial bottleneck: its front end (parsing, type checking, borrow checking) runs largely on one thread. Split it into smaller crates so independent parts can build at the same time:

```toml
# Before: one giant crate
[package]
name = "monolith"   # everything in one crate = sequential compile

# After: workspace with parallel crates
[workspace]
members = [
    "core",         # no internal deps → built first
    "networking",   # no deps on ui → parallel with ui
    "ui",           # no deps on networking → parallel
    "server",       # depends on core + networking
    "cli",          # depends on core + ui
]
```

```bash
# Visualize dependency graph
cargo tree | head -30
# cargo tree has no Graphviz output; cargo-depgraph is one third-party option
cargo install cargo-depgraph
cargo depgraph | dot -Tsvg > deps.svg   # visual graph

# Check how many crates compile in parallel
cargo build -j$(nproc) --timings   # explicit job count (Cargo already defaults to all logical CPUs)
```

Rules for effective workspace splitting:
- Break circular dependencies between modules first (crates cannot depend on each other cyclically)
- Separate proc-macros into their own crate (they block everything downstream)
- Keep frequently-changed code isolated (less invalidation)
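
`nproc` comes from GNU coreutils and is usually missing on macOS, so a script that passes an explicit job count benefits from a portable fallback. The sketch below uses a hypothetical `cpu_count` function; since Cargo already defaults to the number of logical CPUs, an explicit `-j` is mainly useful for capping parallelism (for example on memory-constrained CI runners).

```bash
# Hypothetical helper: print the number of logical CPUs, or 1 if unknown.
cpu_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc                        # GNU coreutils (Linux)
    elif command -v sysctl >/dev/null 2>&1 && sysctl -n hw.ncpu >/dev/null 2>&1; then
        sysctl -n hw.ncpu            # macOS and the BSDs
    elif command -v getconf >/dev/null 2>&1 && getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN    # widely supported fallback
    else
        echo 1
    fi
}
```

Usage: `cargo build -j"$(cpu_count)" --timings`, or `-j"$(( $(cpu_count) / 2 ))"` to leave headroom for other work on the machine.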

5. LTO configuration

LTO improves runtime performance but increases link time:

```toml
# Cargo.toml profile configuration
[profile.release]
lto = "thin"          # thin LTO: good performance, much faster than "fat"
codegen-units = 1     # best optimization, but disables parallel codegen within each crate

[profile.release-fast]
inherits = "release"
lto = "fat"           # full LTO: maximum performance, very slow link

[profile.dev]
lto = "off"           # never use LTO in dev (compilation speed)
codegen-units = 256   # Cargo's default for incremental dev builds; lower values reduce parallel codegen
```

LTO comparison:

| Setting | Link time | Runtime perf (approx.) | Use when |
|---------|-----------|-------------|---------|
| `lto = false` | Fast | Baseline | Dev builds |
| `lto = "thin"` | Moderate | +5–15% | Most release builds |
| `lto = "fat"` | Slow | +15–30% | Maximum performance |
| `codegen-units = 1` | Slowest | Best | With LTO for release |
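
The figures in the table vary by project, so it is worth measuring build time per LTO mode directly. The sketch below is one way to do that, under stated assumptions: `compare_lto` is a hypothetical function, it overrides the release profile through the `CARGO_PROFILE_RELEASE_LTO` environment variable, and it gives each mode its own target directory so the runs do not share artifacts. The timings are whole-second wall-clock figures from `date`, which is coarse but enough to rank the modes.

```bash
# Hypothetical helper: time a release build under each LTO mode.
# Optional arguments replace the default build command.
compare_lto() {
    if [ "$#" -eq 0 ]; then
        set -- cargo build --release
    fi
    for mode in off thin fat; do
        start=$(date +%s)
        # Separate target dirs keep one mode's artifacts from speeding up the next.
        CARGO_PROFILE_RELEASE_LTO="$mode" CARGO_TARGET_DIR="target/lto-$mode" \
            "$@" >/dev/null || return 1
        end=$(date +%s)
        echo "lto=$mode $((end - start))s"
    done
}
```

Run `compare_lto` from the workspace root; the first run of each mode is a full build, so the numbers reflect clean-build cost rather than incremental cost.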

6. Fast linkers

The linker is often the bottleneck for large Rust projects:

```bash
# mold: fastest general-purpose linker (Linux)
sudo apt-get install mold

# Or use cargo-zigbuild (uses zig cc as linker)
cargo install cargo-zigbuild
cargo zigbuild --release
```

```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]

# lld: LLVM's linker (faster than GNU ld, available everywhere)
# Alternative to the mold entry above; a target table can be defined only once.
# [target.x86_64-unknown-linux-gnu]
# rustflags = ["-C", "link-arg=-fuse-ld=lld"]

# On macOS: zld (no longer maintained) or Apple's default linker
[target.x86_64-apple-darwin]
rustflags = ["-C", "link-arg=-fuse-ld=/usr/local/bin/zld"]
```

Linker speed comparison (large project, typical, approximate):
- GNU ld: baseline
- gold: ~1.5× faster
- lld: ~2× faster
- mold: ~5–10× faster
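
When the same repository is built on machines with different linkers installed, the flag can be chosen at build time instead of being hard-coded in `.cargo/config.toml`. A minimal sketch follows; `first_available` and `fast_linker_rustflags` are hypothetical names, and it assumes the C compiler driver used for linking accepts `-fuse-ld=mold` or `-fuse-ld=lld` (recent clang and gcc do, older gcc may not accept mold).

```bash
# Hypothetical helper: echo the first of the given commands found on PATH.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Print rustc flags selecting the fastest linker installed, or nothing
# when neither mold nor lld is present (the system default is then used).
fast_linker_rustflags() {
    case "$(first_available mold ld.lld)" in
        mold)   echo "-C link-arg=-fuse-ld=mold" ;;
        ld.lld) echo "-C link-arg=-fuse-ld=lld" ;;
    esac
}
```

Usage: `RUSTFLAGS="$(fast_linker_rustflags)" cargo build`. Note that setting `RUSTFLAGS` in the environment takes precedence over `rustflags` entries in the config file, so use one mechanism or the other.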

7. Other quick wins

```toml
# Cargo.toml
[profile.dev]
# Reduce debug info level (faster but less debuggable)
debug = 1                      # 0 = none, 1 = limited (no type or variable info), 2 = full (default)
# debug = 1 can save 20-40% of debug build time, depending on the project

# Split debug info (reduces linker input)
split-debuginfo = "unpacked"   # similar to -gsplit-dwarf on Linux; already the macOS dev default
```

```bash
# Disable incremental compilation (sometimes faster for full rebuilds)
CARGO_INCREMENTAL=0 cargo build

# Reduce proc-macro compile time (pin heavy proc-macro deps)
# Crates that pull in heavy proc-macros (serde, tokio, axum): keep versions stable
# so they are not rebuilt after every dependency update
```