rust-build-times
Rust Build Times
Purpose
Guide agents through diagnosing and improving Rust compilation speed: `cargo-timings` for build profiling, `sccache` for caching, the Cranelift codegen backend for faster dev builds, workspace crate splitting, LTO configuration trade-offs, and fast linkers (mold/lld).
Triggers
- "My Rust project takes too long to compile"
- "How do I profile which crates are slow to build?"
- "How do I set up sccache for Rust?"
- "What is the Cranelift backend and how does it help?"
- "Should I use thin LTO or fat LTO?"
- "How do I use the mold linker with Rust?"
Workflow
1. Diagnose with cargo-timings
```bash
# Build with timing report
cargo build --timings
# Writes target/cargo-timings/cargo-timing.html
# Shows: crate compilation timeline, parallelism, bottlenecks

# For release builds
cargo build --release --timings
```
Key things to look for in the timing report:
- Long sequential chains (no parallelism)
- Individual crates taking > 10s (candidates for optimization)
- Proc-macro crates blocking everything downstream
```bash
# cargo-llvm-lines: count LLVM IR lines per function (monomorphization)
cargo install cargo-llvm-lines
cargo llvm-lines --release | head -20
# Shows functions generating the most LLVM IR (generic-instantiation bloat)
```
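When `cargo llvm-lines` shows one generic function instantiated for many types, a common remedy is to move the body into a non-generic inner function so that only a thin shim is duplicated per type. The sketch below is illustrative only: `component_count` is a hypothetical function, not taken from any particular codebase.

```rust
use std::path::{Path, PathBuf};

/// Generic entry point: instantiated once per caller type `P`,
/// but only this thin shim is duplicated.
pub fn component_count<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic inner function: compiled once, regardless of `P`.
    fn inner(path: &Path) -> usize {
        path.components().count()
    }
    inner(path.as_ref())
}

fn main() {
    // Three different argument types share a single copy of `inner`.
    assert_eq!(component_count("a/b/c"), 3);
    assert_eq!(component_count(String::from("a/b")), 2);
    assert_eq!(component_count(PathBuf::from("a")), 1);
}
```

Re-running `cargo llvm-lines` after a change like this is the way to check whether the instantiation count actually dropped; the gain depends on how large the moved body is.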
2. sccache: compilation caching for Rust
```bash
# Install
cargo install sccache
# or: brew install sccache

# Configure for Rust builds
export RUSTC_WRAPPER=sccache
```
```toml
# Or add to .cargo/config.toml (project-level, or ~/.cargo/config.toml for global)
[build]
rustc-wrapper = "sccache"
```
```bash
# Check cache stats
sccache --show-stats

# S3 backend for CI teams
export SCCACHE_BUCKET=my-rust-cache
export SCCACHE_REGION=us-east-1
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=yyy
sccache --start-server
```
```yaml
# GitHub Actions with sccache
- uses: mozilla-actions/sccache-action@v0.0.4
```
3. Cranelift codegen backend
Cranelift is an alternative codegen backend to LLVM. It produces slower code but compiles faster, which makes it a good fit for development builds:
```bash
# Install nightly (Cranelift requires nightly for now)
rustup toolchain install nightly
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
```
```toml
# Use Cranelift for dev builds only
# .cargo/config.toml
[unstable]
codegen-backend = true

[profile.dev]
codegen-backend = "cranelift"
```
```bash
# Use per-build (the -Z flag enables the unstable profile setting)
CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift \
  cargo +nightly build -Zcodegen-backend
```
Cranelift vs LLVM trade-off:
- Dev builds: roughly 20–40% faster compilation with Cranelift (measure on your own project)
- Runtime performance: LLVM-compiled code is faster (Cranelift skips many optimizations)
- Release builds: always use LLVM
4. Workspace splitting for parallelism
Cargo parallelizes across crates, so a single large crate compiles mostly sequentially. Split it into smaller crates so that independent ones can build in parallel:
```toml
# Before: one giant crate
[package]
name = "monolith"  # everything in one crate = sequential compile

# After: workspace with parallel crates
[workspace]
members = [
    "core",        # compiled in parallel
    "networking",  # no deps on ui → parallel with ui
    "ui",          # no deps on networking → parallel
    "server",      # depends on core + networking
    "cli",         # depends on core + ui
]
```
```bash
# Visualize dependency graph
cargo tree | head -30
# cargo tree has no graph output; cargo-depgraph emits Graphviz dot
cargo install cargo-depgraph
cargo depgraph | dot -Tsvg > deps.svg  # visual graph

# Check how many crates compile in parallel
cargo build -j$(nproc) --timings  # -j already defaults to the number of logical CPUs
```
Rules for effective workspace splitting:
- Break circular dependencies between modules first (crates cannot depend on each other in a cycle)
- Separate proc-macros into their own crate (they block everything downstream)
- Keep frequently-changed code isolated (less invalidation)
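To see why splitting helps, it can be useful to compare the sum of all crate build times with the longest dependency chain. The sketch below does that arithmetic for the five-crate workspace listed above, using only the dependencies stated there. The per-crate seconds are made-up numbers, and the model assumes unlimited cores and ignores Cargo's pipelining.

```rust
/// (crate name, hypothetical build seconds, indices of dependencies earlier in the list)
type Crate = (&'static str, u32, &'static [usize]);

const CRATES: &[Crate] = &[
    ("core", 20, &[]),
    ("networking", 30, &[]),
    ("ui", 25, &[]),
    ("server", 15, &[0, 1]), // depends on core + networking
    ("cli", 10, &[0, 2]),    // depends on core + ui
];

/// Wall-clock time if every crate starts as soon as its dependencies finish.
/// Assumes `crates` is listed in dependency (topological) order.
fn critical_path(crates: &[Crate]) -> u32 {
    let mut finish = vec![0u32; crates.len()];
    for (i, (_, secs, deps)) in crates.iter().enumerate() {
        // A crate becomes ready when its slowest dependency has finished.
        let ready = deps.iter().map(|&d| finish[d]).max().unwrap_or(0);
        finish[i] = ready + secs;
    }
    finish.into_iter().max().unwrap_or(0)
}

/// Wall-clock time if everything is built one after another.
fn sequential(crates: &[Crate]) -> u32 {
    crates.iter().map(|(_, secs, _)| secs).sum()
}

fn main() {
    println!(
        "sequential: {}s, critical path: {}s",
        sequential(CRATES),
        critical_path(CRATES)
    );
}
```

With these illustrative numbers the longest chain is networking followed by server, so that pair is where further splitting or optimization would pay off first. The same reading applies to the long bars in a `--timings` report.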
5. LTO configuration
LTO improves runtime performance but increases link time:
```toml
# Cargo.toml profile configuration
[profile.release]
lto = "thin"          # thin LTO: good performance, much faster than "fat"
codegen-units = 1     # best optimization, but no parallel codegen within a crate

[profile.release-fast]
inherits = "release"
lto = "fat"           # full LTO: maximum performance, very slow link

[profile.dev]
lto = "off"           # never use LTO in dev (compilation speed)
codegen-units = 256   # the dev default; maximizes parallel codegen
```
LTO comparison (runtime gains are rough and workload-dependent):
| Setting | Link time | Runtime perf | Use when |
|---------|-----------|-------------|---------|
| `lto = false` | Fast | Baseline | Dev builds |
| `lto = "thin"` | Moderate | +5–15% | Most release builds |
| `lto = "fat"` | Slow | +15–30% | Maximum performance |
| `codegen-units = 1` | Slowest | Best | With LTO for release |
6. Fast linkers
The linker is often the bottleneck for large Rust projects:
```bash
# mold: fastest general-purpose linker (Linux)
sudo apt-get install mold

# Or use cargo-zigbuild (uses zig cc as linker)
cargo install cargo-zigbuild
cargo zigbuild --release
```
```toml
# .cargo/config.toml

# mold (Linux), driven through clang
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]

# lld: LLVM's linker (faster than GNU ld, widely available)
# Alternative to the mold table above; TOML allows only one table per target.
# [target.x86_64-unknown-linux-gnu]
# rustflags = ["-C", "link-arg=-fuse-ld=lld"]

# On macOS: zld (the system default is Apple's ld, not lld)
[target.x86_64-apple-darwin]
rustflags = ["-C", "link-arg=-fuse-ld=/usr/local/bin/zld"]
```
Linker speed comparison (large project, typical; approximate):
- GNU ld: baseline
- gold: ~1.5× faster
- lld: ~2× faster
- mold: ~5–10× faster
7. Other quick wins
```toml
# Cargo.toml
[profile.dev]
# Reduce debug info level (faster but less debuggable)
debug = 1  # 0 = off, 1 = limited (no type/variable info), 2 = full (default)
# debug = 1 can save roughly 20-40% of debug build time

# Split debug info (reduces linker input)
split-debuginfo = "unpacked"  # Linux: like -gsplit-dwarf; already the dev default on macOS
```
```bash
# Disable incremental compilation (sometimes faster for full rebuilds)
CARGO_INCREMENTAL=0 cargo build

# Reduce proc-macro compile time (pin heavy proc-macro deps)
# Crates with heavy proc-macro dependencies: serde (serde_derive), tokio (tokio-macros),
# axum (axum-macros); keep versions stable to avoid rebuilding them
```
Related skills
- Use `skills/rust/cargo-workflows` for Cargo workspace and profile configuration
- Use `skills/build-systems/build-acceleration` for C/C++ equivalent build acceleration
- Use `skills/debuggers/dwarf-debug-format` for debug info size/split-dwarf tradeoffs
- Use `skills/binaries/linkers-lto` for LTO internals