rust-build-times

Rust Build Times


Purpose


Guide agents through diagnosing and improving Rust compilation speed: `cargo build --timings` for build profiling, `sccache` for caching, the Cranelift codegen backend for faster dev builds, workspace crate splitting, LTO configuration trade-offs, and fast linkers (mold/lld).

Triggers


  • "My Rust project takes too long to compile"
  • "How do I profile which crates are slow to build?"
  • "How do I set up sccache for Rust?"
  • "What is the Cranelift backend and how does it help?"
  • "Should I use thin LTO or fat LTO?"
  • "How do I use the mold linker with Rust?"

Workflow


1. Diagnose with cargo-timings

```bash
# Build with timing report
cargo build --timings
# Writes target/cargo-timings/cargo-timing.html
# Shows: crate compilation timeline, parallelism, bottlenecks

# For release builds
cargo build --release --timings
```

Key things to look for in the timing report:

- Long sequential chains (no parallelism)
- Individual crates taking > 10s (candidates for optimization)
- Proc-macro crates blocking everything downstream
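
To make "long sequential chains" concrete, the sketch below computes the critical path of a small dependency graph: the longest chain of crates that must build one after another, which bounds wall-clock time no matter how many cores are available. The crate names and per-crate durations are hypothetical and not taken from a real timing report; `my-macros` stands in for a proc-macro crate at the root of the graph.

```rust
use std::collections::HashMap;

/// A crate with a hypothetical build time (seconds) and its direct dependencies.
struct Unit {
    secs: f64,
    deps: Vec<&'static str>,
}

/// Earliest finish time of `name`, assuming unlimited parallel jobs:
/// a crate starts as soon as all of its dependencies have finished.
fn finish_time(
    name: &'static str,
    graph: &HashMap<&'static str, Unit>,
    memo: &mut HashMap<&'static str, f64>,
) -> f64 {
    if let Some(&t) = memo.get(name) {
        return t;
    }
    let unit = &graph[name];
    let mut start = 0.0_f64;
    for &dep in &unit.deps {
        start = start.max(finish_time(dep, graph, memo));
    }
    let t = start + unit.secs;
    memo.insert(name, t);
    t
}

/// Returns (sum of all build times, critical path length).
fn summarize(graph: &HashMap<&'static str, Unit>) -> (f64, f64) {
    let mut memo = HashMap::new();
    let serial: f64 = graph.values().map(|u| u.secs).sum();
    let mut critical = 0.0_f64;
    for &name in graph.keys() {
        critical = critical.max(finish_time(name, graph, &mut memo));
    }
    (serial, critical)
}

/// Hypothetical graph: every crate sits downstream of one proc-macro crate.
fn example_graph() -> HashMap<&'static str, Unit> {
    let mut g = HashMap::new();
    g.insert("my-macros", Unit { secs: 12.0, deps: vec![] });
    g.insert("core", Unit { secs: 8.0, deps: vec!["my-macros"] });
    g.insert("networking", Unit { secs: 10.0, deps: vec!["core"] });
    g.insert("ui", Unit { secs: 6.0, deps: vec!["core"] });
    g.insert("server", Unit { secs: 4.0, deps: vec!["core", "networking"] });
    g
}

fn main() {
    let (serial, critical) = summarize(&example_graph());
    println!("serial: {serial}s, critical path: {critical}s");
}
```

With these made-up numbers the crates sum to 40 s of work, but the chain `my-macros` → `core` → `networking` → `server` alone takes 34 s, so extra cores can recover at most 6 s. Shortening that chain is what actually moves the wall-clock time.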


```bash
# cargo-llvm-lines: count LLVM IR lines per function (monomorphization)
cargo install cargo-llvm-lines
cargo llvm-lines --release | head -20
# Shows functions generating the most LLVM IR (generic instantiation bloat)
```
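
When a generic function dominates the `cargo llvm-lines` output, one common remedy is to keep the generic part thin and move the real work into a non-generic inner function. The sketch below illustrates that pattern; `count_words` is a hypothetical function invented for the example, not something from this skill or any library.

```rust
/// Generic entry point: instantiated once per caller type `S`,
/// but it only converts the argument and forwards it.
pub fn count_words<S: AsRef<str>>(text: S) -> usize {
    // Non-generic inner function: compiled once, however many
    // different `S` types call `count_words`.
    fn inner(text: &str) -> usize {
        text.split_whitespace().count()
    }
    inner(text.as_ref())
}

fn main() {
    // Three instantiations of the thin outer shim (&str, &String, String),
    // all sharing a single copy of `inner`.
    let owned = String::from("thin lto or fat lto");
    println!("{}", count_words("cargo build --timings"));
    println!("{}", count_words(&owned));
    println!("{}", count_words(owned));
}
```

The larger the body of `inner`, the more IR this saves per extra instantiation. Re-running `cargo llvm-lines` before and after is the way to check whether the change paid off in a given codebase.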

2. sccache: compilation caching for Rust

```bash
# Install
cargo install sccache
# or: brew install sccache

# Configure for Rust builds
export RUSTC_WRAPPER=sccache
```

Or add it to `.cargo/config.toml` (project or global):

```toml
# ~/.cargo/config.toml
[build]
rustc-wrapper = "sccache"
```

```bash
# Check cache stats
sccache --show-stats

# S3 backend for CI teams
export SCCACHE_BUCKET=my-rust-cache
export SCCACHE_REGION=us-east-1
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=yyy
sccache --start-server
```

```yaml
# GitHub Actions with sccache
- uses: mozilla-actions/sccache-action@v0.0.4
```

3. Cranelift codegen backend

Cranelift is a fast codegen backend (an alternative to LLVM): it produces slower code but compiles much faster, which makes it a good fit for development builds:

```bash
# Install nightly (Cranelift requires nightly for now)
rustup toolchain install nightly
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
```

Use Cranelift for dev builds only:

```toml
# .cargo/config.toml
[unstable]
codegen-backend = true

[profile.dev]
codegen-backend = "cranelift"
```

```bash
# Use per-build
CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift \
  cargo +nightly build -Zcodegen-backend
```

Cranelift vs LLVM trade-off:
- Dev builds: roughly 20–40% faster compilation with Cranelift (varies by project)
- Runtime performance: LLVM-compiled code is faster (Cranelift skips many optimizations)
- Release builds: always use LLVM

4. Workspace splitting for parallelism

Cargo parallelizes across crates, but on stable toolchains much of the compiler front end runs sequentially within a single large crate. Splitting into smaller crates lets Cargo build independent crates in parallel:

```toml
# Before: one giant crate
[package]
name = "monolith"   # everything in one crate = sequential compile

# After: workspace with parallel crates
[workspace]
members = [
    "core",         # compiled in parallel
    "networking",   # no deps on ui → parallel with ui
    "ui",           # no deps on networking → parallel
    "server",       # depends on core + networking
    "cli",          # depends on core + ui
]
```

```bash
# Visualize dependency graph
cargo tree | head -30
# Graphviz output (requires: cargo install cargo-depgraph)
cargo depgraph | dot -Tsvg > deps.svg   # visual graph

# Check how many crates compile in parallel
cargo build -j$(nproc) --timings        # all cores (also Cargo's default)
```

Rules for effective workspace splitting:
- Break circular dependencies between modules first (crates cannot depend on each other cyclically)
- Separate proc-macros into their own crate (they block everything downstream)
- Keep frequently-changed code isolated (less invalidation)
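
The "less invalidation" rule can be sketched as a reverse-dependency walk: editing a crate forces a rebuild of that crate and of everything that depends on it, directly or transitively. The code below is a minimal model of that idea using the example workspace layout above. It is not how Cargo tracks freshness internally, and the function names are made up for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Direct dependencies of each workspace member (hypothetical layout
/// mirroring the core/networking/ui/server/cli example).
fn workspace() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("core", vec![]),
        ("networking", vec!["core"]),
        ("ui", vec!["core"]),
        ("server", vec!["core", "networking"]),
        ("cli", vec!["core", "ui"]),
    ])
}

/// Crates to rebuild when `changed` is edited: the crate itself plus
/// everything that depends on it, directly or transitively.
fn rebuild_set(
    deps: &HashMap<&'static str, Vec<&'static str>>,
    changed: &'static str,
) -> BTreeSet<&'static str> {
    let mut dirty = BTreeSet::from([changed]);
    let mut stack = vec![changed];
    while let Some(current) = stack.pop() {
        for (&krate, krate_deps) in deps {
            // `insert` returns true only the first time a crate is marked dirty.
            if krate_deps.contains(&current) && dirty.insert(krate) {
                stack.push(krate);
            }
        }
    }
    dirty
}

fn main() {
    let ws = workspace();
    for changed in ["core", "ui", "cli"] {
        println!("edit {changed}: rebuild {:?}", rebuild_set(&ws, changed));
    }
}
```

In this model an edit to `core` dirties all five crates, an edit to `ui` dirties only `ui` and `cli`, and an edit to `cli` dirties only itself. That is the argument for keeping frequently edited code in leaf crates.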

5. LTO configuration

LTO improves runtime performance but increases link time:

```toml
# Cargo.toml profile configuration
[profile.release]
lto = "thin"          # thin LTO: good performance, much faster than "fat"
codegen-units = 1     # needed for best optimization (but disables parallel codegen)

[profile.release-fast]
inherits = "release"
lto = "fat"           # full LTO: maximum performance, very slow link

[profile.dev]
lto = "off"           # never use LTO in dev (compilation speed)
codegen-units = 256   # Cargo's dev default; keeps codegen parallel
```

LTO comparison (figures are rough and workload-dependent; measure on your own project):

| Setting | Link time | Runtime perf | Use when |
|---------|-----------|-------------|---------|
| `lto = false` | Fast | Baseline | Dev builds |
| `lto = "thin"` | Moderate | +5–15% | Most release builds |
| `lto = "fat"` | Slow | +15–30% | Maximum performance |
| `codegen-units = 1` | Slowest | Best | With LTO for release |

6. Fast linkers

The linker is often the bottleneck for large Rust projects:

```bash
# mold: fastest general-purpose linker (Linux)
sudo apt-get install mold
```

```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

```bash
# Or use cargo-zigbuild (uses zig cc as linker)
cargo install cargo-zigbuild
cargo zigbuild --release
```

```toml
# lld: LLVM's linker (faster than GNU ld, available on most platforms)
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]

# On macOS: zld (or lld) instead of the default Apple linker
[target.x86_64-apple-darwin]
rustflags = ["-C", "link-arg=-fuse-ld=/usr/local/bin/zld"]
```

Linker speed comparison (large project, typical; rough figures):
- GNU ld: baseline
- gold: ~1.5× faster
- lld: ~2× faster
- mold: ~5–10× faster

7. Other quick wins

```toml
# Cargo.toml
[profile.dev]
# Reduce debug info level (faster but less debuggable)
debug = 1   # 0 = off, 1 = limited (no type or variable info), 2 = full (default)
# debug = 1 can save roughly 20-40% of debug build time, depending on the project

# Split debug info (reduces linker input)
split-debuginfo = "unpacked"   # macOS: equivalent of gsplit-dwarf
```

```bash
# Disable incremental compilation (sometimes faster for full rebuilds)
CARGO_INCREMENTAL=0 cargo build

# Reduce proc-macro compile time (pin heavy proc-macro deps)
# Crates with heavy proc-macros: serde, tokio, axum. Keep their versions stable.
```

Related skills


  • Use `skills/rust/cargo-workflows` for Cargo workspace and profile configuration
  • Use `skills/build-systems/build-acceleration` for C/C++ equivalent build acceleration
  • Use `skills/debuggers/dwarf-debug-format` for debug info size/split-dwarf tradeoffs
  • Use `skills/binaries/linkers-lto` for LTO internals