threejs-perf
Three.js Performance Optimization
Performance patterns for Three.js games, backed by measured before/after numbers on Three.js r183 (headless Chromium via Playwright, Apple M1 Pro, software WebGL).
Reference Files
- instancing-static.md: InstancedMesh for large static repeated objects (19,600 → 1 draw call)
- instancing-moving.md: Flat state buffer + batched InstancedMesh writes for moving entities (8,000 entities)
- templates/: Baseline vs optimized reference implementations for each pattern
When to Use This Skill
- Scene has 100+ repeated objects sharing geometry/material
- Draw calls exceed 500 and frame time is unstable
- Thousands of moving entities need per-frame transform updates
- Profile shows scene-graph traversal as a bottleneck
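One way to check the first criterion is to count how many meshes share a geometry and material. The sketch below is a hypothetical helper, not part of Three.js or of this skill's templates. It assumes each mesh has a single (non-array) material and relies on the `uuid` property that Three.js geometries and materials carry. The default threshold of 100 simply mirrors the guideline above.

```js
// Hypothetical helper: group meshes by (geometry, material) identity and
// return the groups large enough to be worth instancing.
// Assumes mesh.material is a single material, not an array.
function findInstancingCandidates(meshes, minCount = 100) {
  const groups = new Map();
  for (const mesh of meshes) {
    // Geometries and materials in Three.js each expose a unique `uuid`.
    const key = `${mesh.geometry.uuid}|${mesh.material.uuid}`;
    let group = groups.get(key);
    if (!group) {
      group = { geometry: mesh.geometry, material: mesh.material, meshes: [] };
      groups.set(key, group);
    }
    group.meshes.push(mesh);
  }
  // Keep only groups that meet the repetition threshold.
  return [...groups.values()].filter((g) => g.meshes.length >= minCount);
}
```

In a real scene, the `meshes` array could be collected with `scene.traverse()`, keeping only objects whose `isMesh` flag is set.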
When NOT to Use
- Object count is low (<50 unique meshes) — simpler code wins
- Every object needs unique materials/shaders that defeat batching
- Geometry differs enough that instancing provides no batching benefit
Pattern 1: Instancing Large Static Object Sets
Problem: Forests, debris, decorations as individual Meshes = unnecessary draw calls.

Solution: One InstancedMesh per shared geometry+material combo.

Evidence: ~19,365 → 2 draw calls. Render CPU p95: 28.5ms → 0.5ms (~57× faster). Build: 39.4ms → 3.9ms. See instancing-static.md.
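```js
import * as THREE from 'three';

// Assumes `scene`, `geometry`, and `material` already exist.
const COUNT = 19600;
const SIDE = 140; // 140 × 140 grid = 19,600 props (illustrative layout)
const SPACING = 2; // illustrative distance between props
const gridX = (i) => (i % SIDE) * SPACING;
const gridZ = (i) => Math.floor(i / SIDE) * SPACING;

// Anti-pattern: one Mesh per prop
for (let i = 0; i < COUNT; i++) {
  const mesh = new THREE.Mesh(geometry, material);
  mesh.position.set(gridX(i), 0, gridZ(i));
  scene.add(mesh); // up to 19,600 draw calls (one per visible Mesh)
}

// Correct: one InstancedMesh
const im = new THREE.InstancedMesh(geometry, material, COUNT);
const mat = new THREE.Matrix4(); // reused for every instance
for (let i = 0; i < COUNT; i++) {
  mat.makeTranslation(gridX(i), 0, gridZ(i));
  im.setMatrixAt(i, mat);
}
im.instanceMatrix.needsUpdate = true; // upload the matrices to the GPU
scene.add(im); // 1 draw call
```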
Pattern 2: Moving Entity Update Loops
Problem: Thousands of moving actors as individual Meshes = scene-graph churn + transform propagation.

Solution: Flat entity state buffer + batched InstancedMesh.setMatrixAt() writes.

Evidence: 8,000 → 1 draw call. Render CPU p95: 9.9ms → 0.5ms (~20× faster). Update loop p95: 1.4ms → 0.3ms. See instancing-moving.md.
```js
import * as THREE from 'three';

// Assumes `meshes`, `instancedMesh`, `count`, and `tick` already exist, and
// that computeX/computeY/computeZ are the application's own motion functions.

// Anti-pattern: per-entity Mesh position writes
meshes.forEach((mesh, i) => {
  mesh.position.x = computeX(i, tick);
  mesh.position.y = computeY(i, tick);
});

// Correct: batched instance matrix writes
const mat = new THREE.Matrix4(); // allocate once, reuse every frame
for (let i = 0; i < count; i++) {
  mat.makeTranslation(computeX(i, tick), computeY(i, tick), computeZ(i, tick));
  instancedMesh.setMatrixAt(i, mat);
}
instancedMesh.instanceMatrix.needsUpdate = true; // one upload per frame
```
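The snippet above shows the batched writes but not the flat state buffer itself. The following is a minimal sketch of what one could look like, assuming translation-only entities (no rotation or scale). The function names, the `STRIDE` layout, and the wave motion are all hypothetical and chosen for illustration. The matrix layout (16 floats per instance, column-major, translation in elements 12 to 14) matches what `Matrix4` stores and what `setMatrixAt()` copies into the instance matrix array.

```js
// Flat entity state: 3 floats per entity (x, y, z), no per-entity objects.
const STRIDE = 3;

function createEntityState(count) {
  return new Float32Array(count * STRIDE);
}

// Hypothetical wave-field motion, used only for illustration.
function stepEntities(state, count, tick) {
  for (let i = 0; i < count; i++) {
    const o = i * STRIDE;
    state[o] = i % 100; // x: column in a 100-wide grid
    state[o + 1] = Math.sin(tick * 0.05 + i * 0.1); // y: wave height
    state[o + 2] = Math.floor(i / 100); // z: row in the grid
  }
}

// Write translation-only matrices into a Float32Array laid out as
// 16 floats per instance, column-major, translation in elements 12..14.
function writeTranslationMatrices(state, matrices, count) {
  for (let i = 0; i < count; i++) {
    const s = i * STRIDE;
    const m = i * 16;
    matrices.fill(0, m, m + 16);
    matrices[m] = 1; // identity diagonal
    matrices[m + 5] = 1;
    matrices[m + 10] = 1;
    matrices[m + 15] = 1;
    matrices[m + 12] = state[s]; // translation x
    matrices[m + 13] = state[s + 1]; // translation y
    matrices[m + 14] = state[s + 2]; // translation z
  }
}
```

With Three.js, `matrices` would be `instancedMesh.instanceMatrix.array`, followed by `instancedMesh.instanceMatrix.needsUpdate = true` once per frame. Whether writing the array directly beats `setMatrixAt()` in a given app is something to measure rather than assume.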
Decision Tree
```
Is the object repeated 50+ times with the same geometry+material?
├── YES → Is it static (no per-frame movement)?
│   ├── YES → Pattern 1: Static InstancedMesh (instancing-static.md)
│   └── NO → Pattern 2: Moving InstancedMesh with batched writes (instancing-moving.md)
└── NO → Standard Mesh is fine. Focus on material/geometry reuse.
```
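The same decision can be written as a small function, which may be handy in tooling or asset-pipeline checks. This is only a sketch: the function name, the parameter names, and the returned labels are hypothetical, and the threshold of 50 is the one stated in the tree.

```js
// Hypothetical encoding of the decision tree. Returns one of:
// "standard-mesh", "instanced-static", "instanced-moving".
function choosePattern({ count, sharedGeometryAndMaterial, movesEveryFrame }) {
  // Fewer than 50 repeats, or no shared geometry+material: keep plain Meshes.
  if (count < 50 || !sharedGeometryAndMaterial) return "standard-mesh";
  // Repeated and moving every frame: Pattern 2. Otherwise: Pattern 1.
  return movesEveryFrame ? "instanced-moving" : "instanced-static";
}
```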
Measured Results
Headless Chromium 147 via Playwright, Three.js r183, Apple M1 Pro, 30 warmup + 180 sample frames, median of 3 runs.
| Scenario | Metric | Baseline | Optimized | Improvement |
|---|---|---|---|---|
| Static World (19.6k cubes) | Draw calls | ~19,365 | 2 | ~9,682× |
| Static World (19.6k cubes) | Render CPU p95 | 28.5ms | 0.5ms | ~57× |
| Static World (19.6k cubes) | Build | 39.4ms | 3.9ms | ~10× |
| Moving Entities (8k wave-field) | Draw calls | 8,000 | 1 | 8,000× |
| Moving Entities (8k wave-field) | Render CPU p95 | 9.9ms | 0.5ms | ~20× |
| Moving Entities (8k wave-field) | Update loop p95 | 1.4ms | 0.3ms | ~4.7× |
Methodology notes
- CPU-side metrics are the trustworthy signal. Draw calls, render CPU p95, update loop, and build time reliably show the win: ~4.7× to ~57× on the CPU timings and roughly 8,000× or more on draw calls.
- FPS and frame-time p95 are unreliable in headless Chromium. Playwright's bundled Chromium uses SwiftShader (software WebGL), which bottlenecks on fragment shading of ~90 MB of visible geometry regardless of draw-call count. On real hardware WebGL, the FPS gap would be substantially larger — baseline would drop to single-digit FPS under real fill, and optimized would hit vsync cleanly.
- A benchmark passes if draw calls decreased and render CPU p95 did not regress.
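The sampling scheme and pass criterion described above can be sketched as follows. This is not the actual benchmark harness. The function names and the shape of the result objects are assumptions, and the nearest-rank percentile method is one common choice that the source does not specify.

```js
// Nearest-rank 95th percentile (an assumed method; others exist).
function p95(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Each run is an array of per-frame timings in ms. Warmup frames are
// dropped, p95 is taken per run, and the median across runs is reported.
function summarizeRuns(runs, warmupFrames = 30) {
  return median(runs.map((frames) => p95(frames.slice(warmupFrames))));
}

// Pass criterion: draw calls decreased and render CPU p95 did not regress.
function benchmarkPasses(baseline, optimized) {
  return (
    optimized.drawCalls < baseline.drawCalls &&
    optimized.renderCpuP95 <= baseline.renderCpuP95
  );
}
```

For the static-world scenario, `benchmarkPasses({ drawCalls: 19365, renderCpuP95: 28.5 }, { drawCalls: 2, renderCpuP95: 0.5 })` evaluates to `true`.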