Soak Test

A soak test (also called an endurance test) is an extended play session run with specific observation goals. Unlike a smoke check (broad critical path, ~10 min) or a single-feature playtest (~30 min), a soak test runs for 30 minutes to several hours to surface:

Memory leaks — gradual heap growth that only appears after scene transitions
Performance drift — frame time degradation that worsens over time
State accumulation bugs — issues that only appear after N repetitions of a mechanic (inventory full, score overflow, AI state corruption)
Fun fatigue — mechanics that feel good in a first session but grow repetitive over extended play
Content exhaustion — the point where players run out of novel content

This skill generates the observation protocol and analysis harness — the human does the actual playing.

Output:

production/qa/soak-test-[date]-[duration].md

When to run:

Polish phase: before `/gate-check release`
After fixing a memory or stability issue (regression soak)
When extended play has not been formally tracked

1. Parse Arguments

Duration (default: `1h`):

`30m`: short soak; suitable for testing a single mechanic or scene
`1h`: standard soak; covers most common leak categories
`2h`: extended soak; recommended for first full Polish soak
`4h`: deep soak; required for games with long session design (RPGs, sims)

Focus (default: `all`):

`memory`: focus on heap size, object count, leak patterns
`stability`: focus on crash/freeze/hang detection
`balance`: focus on fun fatigue, content exhaustion, difficulty perception
`all`: all of the above
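
The argument handling above can be sketched roughly as follows. This is an illustration only, not the skill's actual implementation: the names `SoakArgs` and `parse_soak_args` are hypothetical, and it assumes the arguments arrive as whitespace-separated tokens in any order.

```python
from dataclasses import dataclass

# Allowed values and defaults are taken from the lists above.
DURATION_MINUTES = {"30m": 30, "1h": 60, "2h": 120, "4h": 240}
FOCUS_VALUES = {"memory", "stability", "balance", "all"}


@dataclass(frozen=True)
class SoakArgs:
    duration: str = "1h"  # documented default
    focus: str = "all"    # documented default

    @property
    def minutes(self) -> int:
        return DURATION_MINUTES[self.duration]


def parse_soak_args(raw: str) -> SoakArgs:
    """Parse a hypothetical argument string such as "2h memory"."""
    duration, focus = "1h", "all"
    for token in raw.split():
        token = token.lower()
        if token in DURATION_MINUTES:
            duration = token
        elif token in FOCUS_VALUES:
            focus = token
        else:
            # Assumption: unknown tokens are rejected rather than ignored.
            raise ValueError(f"unrecognised soak argument: {token!r}")
    return SoakArgs(duration=duration, focus=focus)
```

With this sketch, an empty argument string yields the documented defaults (`1h`, `all`), and `parse_soak_args("2h memory").minutes` evaluates to 120.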

2. Load Context

Read:

`.claude/docs/technical-preferences.md`: engine (for engine-specific memory monitoring guidance), performance budgets (memory ceiling, target FPS)
`design/gdd/game-concept.md`: intended session length (for comparison against soak duration), core loop description
Most recent file in `production/playtests/`: prior playtest findings (to avoid re-documenting known issues)
Most recent file matching `production/qa/qa-plan-*.md`: current sprint test coverage (to understand what has been formally tested vs. what the soak covers)

Note any performance budget targets from technical-preferences.md:

Memory ceiling: [N MB, or "not set"]
Target FPS: [N, or "not set"]
Frame budget: [N ms, or "not set"]
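
A minimal sketch of the "most recent file" lookup and of the relationship between target FPS and frame budget. Both helper names are hypothetical, and "most recent" is assumed here to mean latest modification time; a project that encodes dates in filenames might sort by name instead.

```python
from pathlib import Path
from typing import Optional


def most_recent(pattern: str, root: Path = Path(".")) -> Optional[Path]:
    """Return the most recently modified file matching a glob pattern, or None."""
    candidates = [p for p in root.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def frame_budget_ms(target_fps: Optional[float]) -> Optional[float]:
    """Derive a per-frame budget in milliseconds from a target FPS."""
    if not target_fps:
        return None  # corresponds to "not set" in the notes above
    return 1000.0 / target_fps
```

For example, `most_recent("production/qa/qa-plan-*.md")` would pick the latest QA plan, and a 60 FPS target corresponds to a budget of roughly 16.7 ms per frame, which is a useful cross-check when only one of the two values is set.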

3. Define Observation Checkpoints

Based on duration, generate timed checkpoints:

30m soak: T+0, T+10, T+20, T+30
1h soak: T+0, T+15, T+30, T+45, T+60
2h soak: T+0, T+20, T+40, T+60, T+80, T+100, T+120
4h soak: T+0, T+30, T+60, T+90, T+120, T+180, T+240

At each checkpoint, the observer records the observation items defined in Phase 4.
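
The schedule above can be captured as a simple lookup, sketched below. A table is used rather than a fixed interval because the 4h schedule is not evenly spaced (30-minute steps up to T+120, then 60-minute steps). The names `CHECKPOINTS` and `checkpoint_labels` are illustrative, not part of the skill.

```python
# Checkpoint offsets in minutes, copied from the schedule above.
CHECKPOINTS = {
    "30m": [0, 10, 20, 30],
    "1h": [0, 15, 30, 45, 60],
    "2h": [0, 20, 40, 60, 80, 100, 120],
    "4h": [0, 30, 60, 90, 120, 180, 240],
}


def checkpoint_labels(duration: str) -> list[str]:
    """Return labels such as "T+15" for a supported soak duration."""
    return [f"T+{minutes}" for minutes in CHECKPOINTS[duration]]
```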

4. Generate the Soak Test Protocol

Memory / Stability observation items (if focus = memory or all)

Engine-specific monitoring guidance:

Godot 4:

Open Debugger → Monitors tab; track `Memory → Static Memory` and `Object Count → Objects` across checkpoints
Record: Static Memory (KB), Object Count, Orphan Nodes count
Alert threshold: Memory growth > 20% from T+0 after the first 15 minutes (some growth on load is expected; sustained growth indicates a leak)

Note: `Performance.get_monitor(Performance.MEMORY_STATIC)` returns bytes in Godot 4.6, so divide by 1024 before recording the value in KB.
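
The Godot alert threshold can be checked with a few lines of arithmetic when analysing the checkpoint log. The sketch below is written in Python for offline analysis, not GDScript; the function names are hypothetical, and it assumes that readings taken at or before the 15-minute mark fall inside the warm-up window.

```python
GROWTH_LIMIT = 0.20   # 20% over the T+0 baseline
WARMUP_MINUTES = 15   # growth while content is still loading is expected


def bytes_to_kb(value_bytes: float) -> float:
    """Convert a byte count (as returned by the monitor) to KB."""
    return value_bytes / 1024.0


def memory_alert(baseline_kb: float, current_kb: float, minutes_elapsed: int) -> bool:
    """True when growth from baseline exceeds 20% after the warm-up window."""
    # Assumption: a reading at exactly T+15 still counts as warm-up.
    if minutes_elapsed <= WARMUP_MINUTES or baseline_kb <= 0:
        return False
    growth = (current_kb - baseline_kb) / baseline_kb
    return growth > GROWTH_LIMIT
```

For instance, a baseline of 100,000 KB and a reading of 125,000 KB at T+30 is 25% growth and would raise the alert; the same reading at T+10 would not.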

Unity:

Open Memory Profiler (Window → Analysis → Memory Profiler)
Record: Total Reserved Memory (MB), GC Allocated (MB), Object Count at each checkpoint
Alert threshold: GC Allocated growing monotonically across 3+ checkpoints
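
One way to read the Unity threshold is "the last three or more readings are each higher than the one before". The sketch below implements that interpretation; the function name is hypothetical, and treating "monotonically" as strictly increasing over the most recent readings is an assumption.

```python
def grows_monotonically(readings_mb: list[float], min_checkpoints: int = 3) -> bool:
    """True if the most recent `min_checkpoints` readings are strictly increasing."""
    if len(readings_mb) < min_checkpoints:
        return False
    tail = readings_mb[-min_checkpoints:]
    # Compare each reading with its successor.
    return all(earlier < later for earlier, later in zip(tail, tail[1:]))
```

A sawtooth pattern, where GC Allocated rises and then drops after a collection, does not trigger this check; a steady climb across three consecutive checkpoints does.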

Unreal Engine:

Use the `stat memory` console command at each checkpoint
Record: Physical Memory Used (MB), Physical Memory Available
Alert threshold: Physical Memory Used growth > 50 MB over the full soak

Stability observation items (if focus = stability or all)

At each checkpoint, note:

No crash, hang, or freeze occurred since last checkpoint
Frame rate still within target budget ([target FPS] fps)
Audio still playing correctly (no desync or silence)
All HUD elements still rendering correctly
Input responding as expected (no input loss or lag spike)

Balance / fatigue observation items (if focus = balance or all)

Collect subjective observations at each checkpoint:

Core mechanic still feels rewarding (Y/N)
Perceived difficulty level: [too easy / appropriate / too hard]
Any "I've seen this before" moments since last checkpoint? (novel content exhaustion)
Any moment of frustration since last checkpoint? Note cause.
Any moment of peak engagement since last checkpoint? Note cause.

5. Generate the Protocol Document

Soak Test Protocol

Date: [date]
Duration: [duration]
Focus: [memory | stability | balance | all]
Engine: [engine]
Generated by: /soak-test

Pre-Session Setup

Before starting the soak:

Game is running from a fresh launch (not resumed from a prior session)
All background applications closed (minimise OS memory interference)
Performance monitoring tool open and recording:
- Godot: Debugger → Monitors tab → Memory section visible
- Unity: Memory Profiler window open
- Unreal: `stat memory` ready in console
Soak target confirmed: [session design intent from game concept]
Prior known issues to watch for: [from most recent playtest / qa-plan]

Baseline (T+0) — Record Before Playing

| Metric | Baseline Value |
|--------|----------------|
| Memory / Heap | [record before first frame of gameplay] |
| Object Count | [record] |
| FPS (first 30 seconds) | [record] |
| [Engine-specific metric] | [record] |

Checkpoint Log

T+[N] minutes

Memory / Stability (if applicable):

| Metric | Value | Δ from Baseline | Alert? |
|--------|-------|-----------------|--------|
| Memory / Heap | | | |
| Object Count | | | |
| FPS | | | |
| Crashes / Hangs | | | |

Stability checks:

No crash or hang since last checkpoint
Frame rate within budget ([N] fps target)
Audio correct
HUD rendering correctly
Input responding correctly

Balance / Fatigue (if applicable):

Core mechanic still rewarding: Y / N
Difficulty perception: too easy / appropriate / too hard
Notable moments: [note any peak engagement or frustration]
Content exhaustion signs: Y / N — [describe]

Free observations: (Note anything unexpected observed since the last checkpoint)

[Repeat Checkpoint Log section for each timed checkpoint]

Post-Session Analysis

Memory Trend

| Checkpoint | Memory | Δ/hr extrapolated |
|------------|--------|-------------------|
| T+0 | | |
| [T+N] | | |

Leak detected? Y / N
Estimated time to OOM at current rate: [N hours / not applicable]

Stability Summary

Total crashes: [N]
Total hangs: [N]
Worst FPS observed: [N] fps at [checkpoint]
Performance degradation: stable / mild / severe

Balance / Fatigue Summary

Fun curve: [engaged throughout / fatigue onset at T+N / repetitive from start]
Content exhaustion point: [never / at T+N / early]
Difficulty arc: [appropriate / too easy throughout / difficulty spike at T+N]

Issues Found

| ID | Severity | Checkpoint | Description |
|----|----------|------------|-------------|
| SOAK-001 | S[1-4] | T+[N] | [description] |

Verdict: PASS / PASS WITH CONCERNS / FAIL

PASS: No leaks detected, stability maintained, fun factor consistent
PASS WITH CONCERNS: Minor drift or fatigue noted; addressable in Polish
FAIL: Memory leak confirmed, stability breach, or severe fun fatigue

Sign-Off

Tester: [name] — [date]
QA Lead review: [name] — [date]

---
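
The "Δ/hr extrapolated" column and the "Estimated time to OOM" field in the Memory Trend section of the protocol imply a linear extrapolation from the baseline. A minimal sketch of that arithmetic follows. The function names are hypothetical, the memory ceiling is assumed to come from the performance budgets noted in Phase 2, and real leaks are often not linear, so the result is a rough estimate at best.

```python
from typing import Optional


def growth_per_hour(baseline_mb: float, current_mb: float, minutes_elapsed: float) -> float:
    """Linear growth rate in MB/hour between T+0 and the given checkpoint."""
    if minutes_elapsed <= 0:
        raise ValueError("minutes_elapsed must be positive")
    return (current_mb - baseline_mb) * 60.0 / minutes_elapsed


def hours_to_oom(current_mb: float, ceiling_mb: float, rate_mb_per_hour: float) -> Optional[float]:
    """Hours until the memory ceiling is reached, or None if memory is not growing."""
    if rate_mb_per_hour <= 0:
        return None  # "not applicable" in the protocol
    return max(ceiling_mb - current_mb, 0.0) / rate_mb_per_hour
```

As a worked example, a session that grows from 500 MB at T+0 to 560 MB at T+30 is growing at 120 MB/hour; against a 2000 MB ceiling, that leaves about 12 hours before the ceiling is reached.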

6. Write Output

Present the protocol summary in conversation, then ask:

"May I write this soak test protocol to `production/qa/soak-test-[date]-[duration].md`?"

Write only after approval.

After writing:

"Protocol written. To run the soak:

Open the file and follow the Pre-Session Setup checklist
Record each checkpoint as you play
Complete the Post-Session Analysis section when done
File bugs from 'Issues Found' to `production/qa/bugs/`
Run `/bug-triage sprint` after the session to integrate any S1/S2 issues

If the verdict is FAIL, run `/smoke-check` again after fixing the issues."

Collaborative Protocol

This skill generates a protocol; humans run it. Never attempt to run a soak test automatically, because the observations require a human observer.
Duration should match the game's session design — a 5-minute game doesn't need a 4h soak; a city-builder might. Use judgment and ask if unclear.
First soak should use `all` focus: narrow focus (memory-only) is for regression soaks after a specific fix, not the first pass
Ask before writing — always confirm before creating the protocol file
