openclaw-test-heap-leaks

OpenClaw Test Heap Leaks

Use this skill for test-memory investigations. Do not guess from RSS alone when heap snapshots are available. Treat snapshot-name deltas as triage evidence, not proof, until retainers or dominators support the call.
For runtime fixes (e.g., closure leaks in long-running services like the gateway), see "Validating runtime fixes" below; that workflow uses a dedicated harness, not the test-parallel snapshot machinery.

Workflow

  1. Reproduce the failing shape first.
    • Match the real entrypoint if possible. For Linux CI-style unit failures, start with:
      • pnpm canvas:a2ui:bundle && OPENCLAW_TEST_MEMORY_TRACE=1 OPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS=60000 OPENCLAW_TEST_HEAPSNAPSHOT_DIR=.tmp/heapsnap OPENCLAW_TEST_WORKERS=2 OPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 pnpm test
    • Keep OPENCLAW_TEST_MEMORY_TRACE=1 enabled so the wrapper prints per-file RSS summaries alongside the snapshots.
    • If the report is about a specific shard or worker budget, preserve that shape.
    • Before you analyze snapshots, identify the real lane names from the [test-parallel] start ... lines or the pnpm test --plan output. Do not assume a single unit-fast lane; local plans often split into unit-fast-batch-* lanes.
  2. Wait for repeated snapshots before concluding anything.
    • Capture at least two interval snapshots from the same lane.
    • Compare snapshots from the same PID inside the real lane directory, such as .tmp/heapsnap/unit-fast-batch-2/.
    • Use .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs to compare either two files directly or the earliest/latest pair per PID in one lane directory.
    • If the helper suggests transformed-module retention, confirm the top entries in DevTools retainers/dominators before calling it solved.
  3. Classify the growth before choosing a fix.
    • If growth is dominated by Vite/Vitest transformed source strings, "Module" or "system / Context" entries, bytecode, descriptor arrays, or property maps, treat it as likely retained module-graph growth in long-lived workers.
    • If growth is dominated by app objects, caches, buffers, server handles, timers, mock state, sqlite state, or similar runtime objects, treat it as a likely cleanup or lifecycle leak.
    • If the names are ambiguous, stop short of a confident label and inspect retainers/dominators in DevTools for the top deltas.
  4. Fix the right layer.
    • For likely retained transformed-module growth in shared workers:
      • Prefer timing- and hotspot-driven scheduling fixes first. Check whether the file is already represented in test/fixtures/test-timings.unit.json and whether scripts/test-update-memory-hotspots.mjs should refresh the measured hotspot manifest before you hand-edit behavior overrides.
      • Move hotspot files out of the real shared lane by updating test/fixtures/test-parallel.behavior.json only when timing-driven peeling is insufficient.
      • Prefer singletonIsolated for files that are safe alone but inflate shared worker heaps.
      • If the file should already have been peeled out by timings but is absent from test/fixtures/test-timings.unit.json, call that out explicitly. Missing timings are a scheduling blind spot.
    • For real leaks:
      • Patch the implicated test or runtime cleanup path.
      • Look for missing afterEach/afterAll hooks, module-reset gaps, retained global state, unreleased DB handles, or listeners/timers that survive the file.
  5. Verify with the most direct proof.
    • Re-run the targeted lane or file with heap snapshots enabled if the suite still finishes in reasonable time.
    • If snapshot overhead pushes tests past Vitest timeouts, fall back to the same lane without snapshots and confirm that RSS growth is reduced or the OOM no longer occurs.
    • For wrapper-only changes, at minimum verify the expected lanes start and the snapshot files are written.
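Step 2's "earliest/latest pair per PID" selection can be sketched as follows. This is a minimal sketch, not the helper's actual code. It assumes the snapshot files in a lane directory have already been parsed into { pid, takenAt, file } records. The real file-naming scheme is not specified here, and pairEarliestLatestByPid is a hypothetical name.

```javascript
// Hypothetical helper: pick the earliest and latest snapshot for each worker PID,
// so a comparison never mixes two different processes.
function pairEarliestLatestByPid(records) {
  const byPid = new Map();
  for (const rec of records) {
    const entry = byPid.get(rec.pid);
    if (!entry) {
      byPid.set(rec.pid, { earliest: rec, latest: rec, count: 1 });
      continue;
    }
    entry.count += 1;
    if (rec.takenAt < entry.earliest.takenAt) entry.earliest = rec;
    if (rec.takenAt > entry.latest.takenAt) entry.latest = rec;
  }
  const pairs = [];
  for (const [pid, { earliest, latest, count }] of byPid) {
    // One snapshot per PID gives no delta; step 2 needs repeated snapshots.
    if (count < 2) continue;
    pairs.push({ pid, before: earliest.file, after: latest.file });
  }
  return pairs;
}

// Made-up records standing in for one lane directory's contents.
const pairs = pairEarliestLatestByPid([
  { pid: 16133, takenAt: 60_000, file: "a.heapsnapshot" },
  { pid: 16133, takenAt: 180_000, file: "c.heapsnapshot" },
  { pid: 16133, takenAt: 120_000, file: "b.heapsnapshot" },
  { pid: 16140, takenAt: 60_000, file: "d.heapsnapshot" },
]);
console.log(pairs); // one pair for PID 16133 (a -> c); PID 16140 has a single snapshot
```

Skipping single-snapshot PIDs is deliberate. A worker that was only captured once cannot show growth.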
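The classification rule in step 3 can be expressed as a small triage function. This is a hedged sketch:

- The substring patterns are assumptions drawn from the families step 3 names. The __vite_ssr_ prefix refers to identifiers injected by Vite's SSR transform.
- The 0.6 dominance threshold is a placeholder.
- classifyGrowth is a hypothetical name.

Its output is a first guess that still needs retainer/dominator confirmation.

```javascript
// Illustrative lowercase substring patterns; not an exhaustive list of
// V8 or Vitest heap node names.
const MODULE_GRAPH_PATTERNS = [
  "__vite_ssr_", // identifiers Vite's SSR transform injects into transformed source
  "module",
  "system / context",
  "bytecode",
  "descriptorarray",
  "system / map", // hidden classes (property maps)
];
const RUNTIME_PATTERNS = ["buffer", "timeout", "server", "socket", "eventemitter", "database", "cache"];

// Returns a triage label, never a final diagnosis.
function classifyGrowth(deltas, { dominance = 0.6 } = {}) {
  let moduleBytes = 0;
  let runtimeBytes = 0;
  let otherBytes = 0;
  for (const { name, deltaBytes } of deltas) {
    if (deltaBytes <= 0) continue; // only growth matters for triage
    const lower = name.toLowerCase();
    if (MODULE_GRAPH_PATTERNS.some((p) => lower.includes(p))) moduleBytes += deltaBytes;
    else if (RUNTIME_PATTERNS.some((p) => lower.includes(p))) runtimeBytes += deltaBytes;
    else otherBytes += deltaBytes;
  }
  const total = moduleBytes + runtimeBytes + otherBytes;
  if (total === 0) return "no-growth";
  if (moduleBytes / total >= dominance) return "likely-retained-module-graph";
  if (runtimeBytes / total >= dominance) return "likely-cleanup-or-lifecycle-leak";
  return "ambiguous: inspect retainers/dominators";
}

console.log(classifyGrowth([
  { name: "system / Context", deltaBytes: 800_000 },
  { name: "Timeout", deltaBytes: 100_000 },
]));
```

The ambiguous branch mirrors the last bullet of step 3. When neither family clearly dominates, stop and inspect the top rows in DevTools.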
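Step 4's "real leaks" branch often comes down to timers and listeners that outlive the file. One common pattern is to pair every acquired resource with a disposer and release them all in one hook. The sketch below is a hypothetical, framework-free tracker; createCleanupTracker is not an existing helper in this repo. In a Vitest file you would call tracker.disposeAll() from afterEach or afterAll.

```javascript
// Hypothetical per-file cleanup tracker: everything a test opens is paired
// with a disposer so a single afterEach/afterAll call can release it.
function createCleanupTracker() {
  const disposers = [];
  return {
    setInterval(fn, ms) {
      const handle = setInterval(fn, ms);
      disposers.push(() => clearInterval(handle));
      return handle;
    },
    setTimeout(fn, ms) {
      const handle = setTimeout(fn, ms);
      disposers.push(() => clearTimeout(handle));
      return handle;
    },
    // EventTarget-style listeners; a Node EventEmitter would use on/off instead.
    listen(target, event, handler) {
      target.addEventListener(event, handler);
      disposers.push(() => target.removeEventListener(event, handler));
    },
    // DB handles, servers, global-state resets: the caller supplies the release step.
    defer(disposer) {
      disposers.push(disposer);
    },
    get pending() {
      return disposers.length;
    },
    disposeAll() {
      // Release in reverse order of acquisition, like nested try/finally blocks.
      while (disposers.length > 0) disposers.pop()();
    },
  };
}

// Usage: what an afterEach(() => tracker.disposeAll()) hook would do.
const tracker = createCleanupTracker();
let closed = false;
tracker.setInterval(() => {}, 1_000);
tracker.listen(new EventTarget(), "ping", () => {});
tracker.defer(() => { closed = true; }); // stand-in for something like db.close()
tracker.disposeAll();
console.log(tracker.pending, closed); // 0 true
```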

Heuristics

  • Do not call everything a leak. In this repo, large unit-fast or unit-fast-batch-* growth can be a worker-lifetime problem rather than an application object leak.
  • scripts/test-parallel.mjs and scripts/test-parallel-memory.mjs are the primary control points for wrapper diagnostics.
  • The lane names printed by [test-parallel] start ... and [test-parallel][mem] summary ... tell you where to focus.
  • When one or two files account for most of the delta and they are missing from timings, reducing impact by isolating them is usually the first pragmatic fix.
  • When the same retained object families grow across multiple intervals in the same worker PID, trust the snapshots over intuition, then confirm ambiguous calls with retainer evidence.
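The last heuristic, trusting families that grow across multiple intervals in the same PID, can be made mechanical. This is a minimal sketch. It assumes each snapshot for one worker PID has already been summarized into a Map from object-family name to bytes, ordered oldest to newest. persistentGrowers is a hypothetical name, and its output is a shortlist for retainer inspection, not a verdict.

```javascript
// Hypothetical: families whose byte count rises at every interval for one PID.
// `summaries` is ordered oldest -> newest; each entry is a Map(family -> bytes).
function persistentGrowers(summaries, minTotalBytes = 0) {
  // Two snapshots give one interval; "multiple intervals" needs three or more.
  if (summaries.length < 3) return [];
  const names = new Set(summaries.flatMap((s) => [...s.keys()]));
  const results = [];
  for (const name of names) {
    const series = summaries.map((s) => s.get(name) ?? 0); // absent counts as 0 bytes
    const grewEveryTime = series.every((v, i) => i === 0 || v > series[i - 1]);
    const totalBytes = series[series.length - 1] - series[0];
    if (grewEveryTime && totalBytes >= minTotalBytes) results.push({ name, totalBytes });
  }
  return results.sort((a, b) => b.totalBytes - a.totalBytes);
}

const growers = persistentGrowers([
  new Map([["system / Context", 100], ["Buffer", 50]]),
  new Map([["system / Context", 180], ["Buffer", 40]]),
  new Map([["system / Context", 260], ["Buffer", 70], ["Timeout", 10]]),
]);
console.log(growers); // only "system / Context" grew at every interval
```

Requiring strict growth at every step filters out families that merely fluctuate, such as Buffer above. That is the signal that intuition tends to over-read.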
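The "one or two files account for most of the delta and are missing from timings" check can be sketched the same way. This sketch makes two input assumptions:

- Per-file growth numbers are already available, for example read off the [test-parallel][mem] summary ... lines.
- The timings file has been loaded into a Set of file paths.

Neither reflects the actual format of test/fixtures/test-timings.unit.json. The 0.7 share and the two-file cap are placeholders, and isolationCandidates is a hypothetical name.

```javascript
// Hypothetical: take the top growth files until they cover `share` of the total
// (at most `maxFiles`), and flag the ones absent from the timings manifest.
function isolationCandidates(perFileBytes, timedFiles, { share = 0.7, maxFiles = 2 } = {}) {
  const entries = [...perFileBytes.entries()]
    .filter(([, bytes]) => bytes > 0)
    .sort((a, b) => b[1] - a[1]);
  const total = entries.reduce((sum, [, bytes]) => sum + bytes, 0);
  if (total === 0) return { concentrated: false, files: [] };
  const files = [];
  let covered = 0;
  for (const [file, bytes] of entries) {
    if (covered / total >= share || files.length >= maxFiles) break;
    files.push({ file, bytes, missingFromTimings: !timedFiles.has(file) });
    covered += bytes;
  }
  // "concentrated" means a few files really do explain most of the growth.
  return { concentrated: covered / total >= share, files };
}

const result = isolationCandidates(
  new Map([["src/a.test.ts", 800], ["src/b.test.ts", 150], ["src/c.test.ts", 50]]),
  new Set(["src/b.test.ts"]),
);
console.log(result);
```

If concentrated is false, isolation is unlikely to be the pragmatic first fix, and the growth should be investigated as a broader problem.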

Snapshot Comparison

  • Direct comparison:
    • node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs before.heapsnapshot after.heapsnapshot
  • Auto-select earliest/latest snapshots per PID within one lane:
    • node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs --lane-dir .tmp/heapsnap/unit-fast-batch-2
  • Useful flags:
    • --top 40
    • --min-kb 32
    • --pid 16133
Read the top positive deltas first. Large positive growth in module-transform artifacts suggests lane isolation; large positive growth in runtime objects suggests a real leak. If the names alone do not settle it, open the same snapshot pair in DevTools and inspect retainers/dominators for the top rows before declaring root cause.
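This sketch shows how a per-name delta can be derived from two parsed .heapsnapshot files and then filtered the way --top and --min-kb suggest. It is not the implementation of heapsnapshot-delta.mjs. It relies on the standard V8 heap snapshot JSON layout: snapshot.meta.node_fields, snapshot.meta.node_types, a flat nodes array, and a strings table. Grouping by type plus name, and counting self_size rather than retained size, are assumptions.

```javascript
// Sum self_size per "type:name" key from a parsed V8 .heapsnapshot object.
function aggregateBySelfSize(snapshot) {
  const { node_fields: fields, node_types: types } = snapshot.snapshot.meta;
  const stride = fields.length; // each node occupies `stride` consecutive numbers
  const typeIdx = fields.indexOf("type");
  const nameIdx = fields.indexOf("name");
  const sizeIdx = fields.indexOf("self_size");
  const typeNames = types[typeIdx]; // the "type" field is an enum of names
  const { nodes, strings } = snapshot;
  const totals = new Map();
  for (let i = 0; i < nodes.length; i += stride) {
    const key = `${typeNames[nodes[i + typeIdx]]}:${strings[nodes[i + nameIdx]]}`;
    totals.set(key, (totals.get(key) ?? 0) + nodes[i + sizeIdx]);
  }
  return totals;
}

// Positive deltas only, largest first, mirroring --top / --min-kb filtering.
function topDeltas(before, after, { top = 40, minKb = 32 } = {}) {
  const rows = [];
  for (const [key, bytes] of after) {
    const delta = bytes - (before.get(key) ?? 0);
    if (delta >= minKb * 1024) rows.push({ key, deltaKb: delta / 1024 });
  }
  return rows.sort((a, b) => b.deltaKb - a.deltaKb).slice(0, top);
}

// Tiny synthetic snapshots in the same layout (fields trimmed for brevity).
const meta = {
  node_fields: ["type", "name", "id", "self_size", "edge_count"],
  node_types: [["hidden", "object", "closure"], "string", "number", "number", "number"],
};
const strings = ["Module", "Timeout"];
const beforeSnap = { snapshot: { meta }, nodes: [1, 0, 1, 40960, 0, 1, 1, 3, 1024, 0], strings };
const afterSnap = {
  snapshot: { meta },
  nodes: [1, 0, 1, 40960, 0, 1, 0, 5, 81920, 0, 1, 1, 7, 2048, 0],
  strings,
};
const rows = topDeltas(aggregateBySelfSize(beforeSnap), aggregateBySelfSize(afterSnap));
console.log(rows);
```

For real files, you would load each snapshot with JSON.parse(readFileSync(path, "utf8")) from node:fs. Very large snapshots may be uncomfortable to parse in a single call.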

Validating runtime fixes (not test-memory)

The workflow above is for diagnosing Vitest worker memory growth. For validating that a runtime/closure fix actually releases captured state, use the dedicated harness:
  • pnpm leak:embedded-run runs scripts/embedded-run-abort-leak.ts. It loops N aborted runs in a function-shaped scope mimicking runEmbeddedAttempt, writes heap snapshots, and reports a PASS/FAIL verdict on retention growth, using FinalizationRegistry for tracked-instance counting plus the RSS delta.
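FinalizationRegistry-based instance counting can be illustrated as follows. This is a minimal sketch, not scripts/embedded-run-abort-leak.ts itself. createInstanceCounter and verdict are hypothetical names, and the thresholds are placeholders.

A real run would start Node with --expose-gc, call global.gc(), and await several turns before reading the count. Finalization callbacks run asynchronously after collection and are never guaranteed to run.

```javascript
// Hypothetical live-instance counter: +1 when tracked, -1 when V8 finalizes it.
function createInstanceCounter() {
  let live = 0;
  const registry = new FinalizationRegistry(() => {
    live -= 1;
  });
  return {
    track(obj) {
      live += 1;
      registry.register(obj, "tracked");
    },
    get live() {
      return live;
    },
  };
}

// Hypothetical PASS/FAIL rule: both signals must stay under placeholder budgets.
function verdict({ trackedAlive, runs, rssDeltaBytes }, { maxAliveRatio = 0.1, maxRssDeltaMb = 50 } = {}) {
  const aliveOk = trackedAlive <= runs * maxAliveRatio;
  const rssOk = rssDeltaBytes <= maxRssDeltaMb * 1024 * 1024;
  return aliveOk && rssOk ? "PASS" : "FAIL";
}

// Simulate a leak: every run's state stays referenced, so nothing can be finalized.
const rssBefore = process.memoryUsage().rss;
const counter = createInstanceCounter();
const kept = [];
for (let i = 0; i < 3; i++) {
  const state = { run: i };
  counter.track(state);
  kept.push(state);
}
const outcome = verdict({
  trackedAlive: counter.live,
  runs: 3,
  rssDeltaBytes: process.memoryUsage().rss - rssBefore,
});
console.log(counter.live, outcome); // 3 FAIL
```

Combining the two signals guards against each one's blind spot. RSS alone is noisy, and the instance count only covers objects you chose to track.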
Modes:
  • closure-extracted (default): the production fix shape (helper at module scope).
  • closure-inline: the pre-fix shape (closure inside the runner scope). Use it as a sensitivity check: if it passes, you've broken the harness, not fixed a bug.
  • synthetic-leak: deliberately retains state via a module-level bucket. Use it to confirm the harness can detect leaks before trusting a PASS on a real fix.
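The three modes form a control experiment. The fix mode should pass, and both control modes should fail. Only then does the PASS mean anything. A hedged sketch of that decision rule follows; the mode names come from the list above, while trustFixResult and the shape of its input are hypothetical.

```javascript
// Decide whether a PASS for the production fix can be believed, given the
// verdicts from all three harness modes.
function trustFixResult(results) {
  const problems = [];
  if (results["synthetic-leak"] !== "FAIL") {
    problems.push("synthetic-leak passed: the harness cannot detect a known leak");
  }
  if (results["closure-inline"] !== "FAIL") {
    problems.push("closure-inline passed: the fixture no longer reproduces the pre-fix shape");
  }
  if (results["closure-extracted"] !== "PASS") {
    problems.push("closure-extracted failed: the fix does not release captured state");
  }
  return { fixConfirmed: problems.length === 0, problems };
}

const outcome = trustFixResult({
  "closure-extracted": "PASS",
  "closure-inline": "PASS",
  "synthetic-leak": "FAIL",
});
console.log(outcome.fixConfirmed, outcome.problems);
```

In this example, the fix's PASS is not trusted, because the inline control also passed.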
Snapshots land in .tmp/embedded-run-abort-leak/. Diff them with the same script as above:
node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs \
  .tmp/embedded-run-abort-leak/baseline-*.heapsnapshot \
  .tmp/embedded-run-abort-leak/batch-N-*.heapsnapshot --top 30
When fixing a different runtime leak, add a new harness alongside this one rather than retrofitting it. The fixture function should mimic the lexical scope of the function where the leak lives, not be a generic abort-loop.
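The fixture must mimic the lexical scope because V8 gives all closures created in one scope a shared context. If any inner closure references a large variable, every closure from that scope keeps it alive, including an abort listener that outlives the run.

The sketch below is hypothetical. It only illustrates the inline versus extracted shapes and does not reproduce the actual runEmbeddedAttempt leak, whose details are not described here.

```javascript
// Pre-fix shape (closure-inline): onAbort is created inside the runner scope.
// `transcript` is captured by summarize, so it lives in the shared context,
// and the listener registered on the longer-lived signal keeps that context alive.
function runInline(signal, transcript) {
  const summarize = () => transcript.length;
  const onAbort = () => console.log("run aborted"); // never mentions transcript
  signal.addEventListener("abort", onAbort);
  return summarize();
}

// Fixed shape (closure-extracted): the listener is built by a module-scope
// helper, so its context holds only what the helper was given.
function makeAbortHandler(runId) {
  return () => console.log(`run ${runId} aborted`);
}

function runExtracted(signal, transcript, runId) {
  signal.addEventListener("abort", makeAbortHandler(runId));
  return transcript.length;
}

const controller = new AbortController();
const inlineLength = runInline(controller.signal, "x".repeat(1024));
const extractedLength = runExtracted(controller.signal, "x".repeat(1024), 1);
console.log(inlineLength, extractedLength); // 1024 1024
```

Both shapes behave identically, which is exactly why the difference only shows up in retainers. In a heap snapshot of the inline shape, the listener's context retains the transcript string; in the extracted shape, it does not.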

Output Expectations

When using this skill, report:
  • The exact reproduce command.
  • Which lane and PID were compared.
  • The dominant retained object families from the snapshot delta.
  • Whether the issue is a likely real leak or likely shared-worker retained module growth, plus whether retainers/dominators confirmed it.
  • The concrete fix or impact-reduction patch.
  • What you verified, and what snapshot overhead prevented you from verifying.
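A small formatter can enforce the checklist so a report cannot silently skip an item. This is a hypothetical sketch; formatLeakReport and its field names are illustrative. It labels the classification tentative whenever retainer/dominator evidence is missing, matching the rule that snapshot-name deltas are triage evidence, not proof.

```javascript
// Hypothetical report formatter; throws if a required checklist item is absent.
function formatLeakReport(r) {
  const required = ["reproduceCommand", "lane", "pid", "dominantFamilies", "classification", "fix", "verified"];
  const missing = required.filter((k) => r[k] === undefined);
  if (missing.length > 0) throw new Error(`report is missing: ${missing.join(", ")}`);
  const confidence = r.retainersConfirmed ? "confirmed by retainers/dominators" : "tentative (names only)";
  return [
    `Reproduce: ${r.reproduceCommand}`,
    `Compared: lane ${r.lane}, PID ${r.pid}`,
    `Dominant families: ${r.dominantFamilies.join(", ")}`,
    `Classification: ${r.classification}, ${confidence}`,
    `Fix: ${r.fix}`,
    `Verified: ${r.verified}`,
    `Not verified: ${r.notVerified ?? "nothing"}`,
  ].join("\n");
}

const report = formatLeakReport({
  reproduceCommand: "<exact command you ran>",
  lane: "unit-fast-batch-2",
  pid: 16133,
  dominantFamilies: ["system / Context", "Module"],
  classification: "likely shared-worker retained module growth",
  retainersConfirmed: false,
  fix: "<patch summary>",
  verified: "<what you re-ran>",
  notVerified: "<what snapshot overhead prevented>",
});
console.log(report);
```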