meticulous-review
To review a Meticulous test run, follow the workflow below step by step, using the CLI commands as described.
Before starting, run the `meticulous-cli-update` skill to ensure the Meticulous CLI is up to date. If it has already run earlier in this conversation, skip it.
Prerequisites
You need a `testRunId` for a completed Meticulous test run. Resolve it from whatever you were given:
- A test-run ID: a 20+ character alphanumeric string (e.g. `aB3xK9LmN7QrStUvWxYz12`), as opposed to a short PR number. Use it directly and skip the rest of this section.
- A PR number: a short integer. Resolve it against the local repo by passing it to `gh pr checks <pr-number>` below.
- Nothing: omit the argument. `gh pr checks` then resolves the PR associated with the current branch in the local repo.
To infer the `testRunId` from a PR's GitHub checks, pass the PR number if you have one. Omit it to use the current branch's PR:
```bash
gh pr checks <pr-number> --json name,link,state,bucket \
  --jq '.[] | select(.name | startswith("Meticulous Tests")) | select(.bucket != "pending") | select((.link // "") | contains("/test-runs/")) | {name, state, testRunId: (.link | capture("/test-runs/(?<id>[^/?#]+)").id)}'
```
The `testRunId` is the final path segment of the check's details URL (`app.meticulous.ai/.../test-runs/<id>`).
- Only completed runs: `select(.bucket != "pending")` drops checks that are still queued or in progress. This prevents a half-finished check from handing you a `testRunId` for an incomplete run. It keeps both passing and failing runs. A `fail` bucket is a completed run with visual diffs, which is exactly what you are usually here to review. Don't narrow this to passing checks only.
- Check not finished yet: the `/test-runs/` filter also excludes checks whose `link` is still `null` or empty. The `// ""` guard keeps jq from erroring on those rows. If the command returns nothing, wait and re-run, or ask the user.
- New commit pushed: no extra steps are needed. `gh pr checks` always reflects the PR's current head, so the same command returns the latest run's `testRunId`.
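The resolution rules above can be sketched as a small bash helper. This is a sketch under stated assumptions, not part of the Meticulous CLI. The name `resolve_test_run_id` is hypothetical. It assumes that any all-alphanumeric string of 20+ characters is a test-run ID and any all-digit string is a PR number. It reuses the jq filter above but prints only the ID, one per line if a PR has several Meticulous checks.

```bash
# Hypothetical helper (not a Meticulous CLI command): print the testRunId for
# a test-run ID, a PR number, or no argument (the current branch's PR).
resolve_test_run_id() {
  local input="${1:-}"

  # A 20+ character alphanumeric string is already a test-run ID.
  if [[ "$input" =~ ^[A-Za-z0-9]{20,}$ ]]; then
    printf '%s\n' "$input"
    return 0
  fi

  # Otherwise expect a short integer PR number, or no argument at all.
  local -a pr_args=()
  if [[ "$input" =~ ^[0-9]+$ ]]; then
    pr_args=("$input")
  elif [[ -n "$input" ]]; then
    echo "not a test-run ID or PR number: $input" >&2
    return 1
  fi

  # Same filter as above (completed Meticulous checks with a /test-runs/
  # link), reduced to the final path segment of that link.
  gh pr checks "${pr_args[@]}" --json name,link,state,bucket \
    --jq '.[] | select(.name | startswith("Meticulous Tests")) | select(.bucket != "pending") | select((.link // "") | contains("/test-runs/")) | .link | capture("/test-runs/(?<id>[^/?#]+)").id'
}
```

Usage: run `resolve_test_run_id <pr-number>`, or run `resolve_test_run_id` with no argument to use the current branch's PR. An empty result means no completed run exists yet.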
Assess visual frontend changes
Get an overview of all diffs, then visually inspect representative screenshots to cover all changes. The `domDiffIds` from Step 1 tell you which screenshots share the same structural DOM change. To cover all changes efficiently, pick one representative per unique diff ID. For each representative, always look at the screenshot images first (Step 2), because the diff image is the most informative way to understand what actually changed. Use the DOM diff (Step 3) for additional structural detail. Use the timeline (Step 4) only when a diff is unexpected and is not explained by the DOM or the images. The final report should cover all significant visual changes, with a separate explanation for each one.
Step 1 -- Get the replay diff summary
```bash
meticulous agent test-run-diffs --testRunId <testRunId>
```
Output format: TSV on stdout, metadata on stderr.
stdout columns:
```
replayDiffId screenshotName index total outcome mismatch domDiffIds
```
Example output:
```
CqctwLpPC7 after-event-0 1 5 diff 0.00234 1;3
RRMGQft7PD after-event-174 3 8 diff 0.01050 1;2
CLkCJ8WLrJ after-event-8 2 4 diff 0.00100 none
Ct8HwmJNzM end-state 5 5 flake 0.00010 error
Ab3xKLmN9Q after-event-12 3 6 missing-base 0.00000 n/a
```
Each row represents a screenshot where a visual pixel difference was detected between the base (before) and head (after) replay. Rows with `outcome=diff` are confirmed visual differences. Other outcomes (`flake`, `error`, `warning`, `missing-base`, `missing-head`) are informational.
- `outcome`: `diff` (visual pixel difference), `flake`, `error`, `warning`, `missing-base`, or `missing-head`.
- `mismatch`: the pixel mismatch fraction, from 0 to 1 with 5 decimal places.
- `domDiffIds`: a semicolon-separated, ordered list of diff IDs, one per independent DOM change in the screenshot. Each ID groups structurally identical DOM changes across screenshots, so the same ID means the same structural change. For example, `1;3` means two independent DOM changes with IDs 1 and 3. Special values:
  - `none`: no DOM changes were found. Either the diff was computed successfully with no differences, or this is a matching screenshot where no diff is expected. The visual difference is purely pixel-level (e.g. anti-aliasing or rendering differences), so you must inspect the screenshot images to understand the change.
  - `n/a`: the DOM diff is not applicable (e.g. error/warning outcomes where no comparison is possible).
  - `error`: the DOM diff was attempted but failed (e.g. the metadata was unavailable or could not be retrieved).
stderr shows total counts, unique diff counts, and a timing breakdown. Proceed to Steps 2-3 for rows with `outcome=diff`.
Use `domDiffIds` to decide which subset of diffs to inspect. Screenshots that share the same ID contain the same structural DOM change, so pick one representative per unique ID for efficient coverage.
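As a rough sketch of the "one representative per unique ID" selection, the awk helper below reads the stdout TSV. It prints `replayDiffId` and `screenshotName` for each `diff` row that contains at least one `domDiffIds` entry not seen in an earlier row. The name `pick_representatives` is hypothetical, and the helper makes two assumptions:
- Columns are tab-separated, in the order listed above.
- Rows whose `domDiffIds` is `none`, `n/a`, or `error` have no ID to group by, so each of those rows is kept for individual inspection.

It is a greedy pass in row order. It covers every ID but may pick more screenshots than strictly necessary.

```bash
# Hypothetical helper: choose representative screenshots from
# `meticulous agent test-run-diffs` stdout (TSV) read on stdin.
pick_representatives() {
  awk -F '\t' '
    # Only confirmed visual differences; this also skips a header row, if any.
    $5 != "diff" { next }
    # No DOM diff ID to group by: keep the row for individual inspection.
    $7 == "none" || $7 == "n/a" || $7 == "error" { print $1 "\t" $2; next }
    {
      # Keep the row if it contains at least one ID not seen before.
      n = split($7, ids, ";")
      covers_new = 0
      for (i = 1; i <= n; i++) {
        if (!(ids[i] in seen)) { seen[ids[i]] = 1; covers_new = 1 }
      }
      if (covers_new) print $1 "\t" $2
    }
  '
}
```

Usage: `meticulous agent test-run-diffs --testRunId <testRunId> | pick_representatives`. Only stdout is piped, so the stderr metadata still appears in the terminal.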
Step 2 -- Get screenshot images
For each representative screenshot:
```bash
meticulous agent image-files --replayDiffId <replayDiffId> --screenshotName <screenshotName>
```
This downloads the screenshot images to `~/.meticulous/agent-images/` and prints the local file paths.
Output format:
```
outcome: <outcome>
screenshot: <path> # present for missing-base/missing-head
before: <path> # present for diff/no-diff
after: <path>
diffImage: <path>
```
Open the `before`, `after`, and `diffImage` files to visually inspect the change. The `diffImage` is usually the most informative, because it highlights exactly which pixels changed. Always inspect the images to understand the actual visual impact of a change, even when the DOM diff is clear.
Alternative: use `image-urls` instead of `image-files` to get URLs to the images rather than downloading them locally.
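To pull a single path out of that output (for example the `diffImage`, which you usually want to open first), a small filter over the `key: path` lines is enough. This is a sketch that assumes each line has the form `<key>: <path>`, as in the output format above. The name `image_path` is hypothetical.

```bash
# Hypothetical helper: print the path for one key (screenshot, before, after,
# or diffImage) from `meticulous agent image-files` output read on stdin.
image_path() {
  awk -v key="$1" '
    # Match lines that start with "<key>: " exactly, then drop that prefix.
    index($0, key ": ") == 1 { print substr($0, length(key) + 3) }
  '
}
```

Usage: `meticulous agent image-files --replayDiffId <replayDiffId> --screenshotName <screenshotName> | image_path diffImage`.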
Step 3 -- Inspect the DOM diff (for structural detail)
```bash
meticulous agent dom-diff --replayDiffId <replayDiffId> --screenshotName <screenshotName>
```
Optional: pass `--context <N|full>` to control how many context lines surround each hunk (default 3). Use `--context 0` for no context, or `--context full` for a single unified diff with full file context.
Output format: a unified diff (`+`/`-` format) with leading indentation stripped. Diff blocks are separated by `[diff 0]`, `[diff 1]`, etc. headers. Example:
```
[diff 0]
<span class="text-zinc-400">#7687</span>
-<span class="min-w-0 flex-1 truncate transition-colors">Use divergence-aware comparison</span>
+<span class="min-w-0 flex-1 truncate transition-colors" data-tooltip-id=":r1h:">Use divergence-aware comparison</span>
[diff 1]
<span class="inline-flex items-center rounded-lg bg-zinc-800">Temporal Workflow</span></a>
+<a href="/projects/Foo/Bar/test-runs/abc123"><span class="inline-flex items-center rounded-lg bg-zinc-800">Original: abc123</span></a>
```
To view a single diff block, add `--index <0-based index>`. The index is the block's position in the `domDiffIds` list from Step 1.
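When a screenshot has several DOM changes, a per-block tally can help you decide which `[diff N]` block to read first. The sketch below assumes the output looks like the example above: `[diff N]` header lines, with added and removed lines prefixed by `+` and `-`. The name `summarize_dom_diff` is hypothetical. The helper does not special-case `---`/`+++` file headers, in case the tool emits them.

```bash
# Hypothetical helper: count added/removed lines per "[diff N]" block in
# `meticulous agent dom-diff` output read on stdin.
summarize_dom_diff() {
  awk '
    # Print a "[diff N] +added -removed" summary for the finished block.
    function flush() { if (block != "") print block, "+" add, "-" del }
    /^\[diff [0-9]+]$/ { flush(); block = $0; add = 0; del = 0; next }
    /^\+/ { add++ }
    /^-/  { del++ }
    END   { flush() }
  '
}
```

Usage: `meticulous agent dom-diff --replayDiffId <replayDiffId> --screenshotName <screenshotName> | summarize_dom_diff`.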
Step 4 -- Get the replay timeline (optional, for diagnosing unexpected diffs)
If a diff is unexpected and the images/DOM don't make it obvious why it happened:
```bash
meticulous agent timeline-diff --replayDiffId <replayDiffId>
```
Output format: TSV on stdout, replay IDs on stderr.
stdout columns:
```
diff timeMs event description
```
- `diff` column: blank (identical), `-` (removed), `+` (added), `!` (changed)
- `event` types: `user`, `screenshot`, `network`, `console`, `debug`, `urlChange`, `error`, `fatalError`, etc.
- `description`: a concise one-line summary of the event
Look for anomalies such as failed network requests, unexpected redirects, or timing-related differences that could explain a visual change.
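To narrow a long timeline down, you can keep only the rows that differ between base and head, plus error events. This is a sketch, not a CLI feature, and the name `timeline_anomalies` is hypothetical. It assumes:
- The column order listed above (`diff`, `timeMs`, `event`, `description`), separated by tabs.
- `-`, `+`, or `!` in the `diff` column for rows that differ.

It keeps `error` and `fatalError` events even when they are identical in both replays.

```bash
# Hypothetical helper: filter `meticulous agent timeline-diff` stdout (TSV)
# down to rows that differ between base and head, plus error events.
timeline_anomalies() {
  awk -F '\t' '
    # $1 = diff marker, $3 = event type; a pattern with no action prints the row.
    $1 ~ /^[-+!]$/ || $3 == "error" || $3 == "fatalError"
  '
}
```

Usage: `meticulous agent timeline-diff --replayDiffId <replayDiffId> | timeline_anomalies`.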
Decision guide
For each representative screenshot, classify the visual change as intended or unintended based on the diff image and DOM diff:
- Intended: The visual change is a desired outcome of the task you're working on. Confirm and move on.
- Unintended: The change was not a goal of the task. This includes both changes that are clearly unrelated to your code, and — often more importantly — side effects of your code changes that weren't meant to happen. A change being explainable by your code does not make it intended; if the task didn't call for that visual change, it's unintended.
For unintended changes:
- If the change is a side effect of your code, attempt to fix it so the code achieves the intended result without the unwanted visual change, then re-run the test.
- Use the timeline (Step 4) to check for failed network requests, redirects, or other anomalies that could explain diffs unrelated to your code.
- If you can confidently explain the cause (e.g. a flaky timestamp, a non-deterministic element), note the explanation.
- If you cannot explain or fix it, flag it to the user.
Final report
After investigating all diffs and attempting fixes for any fixable issues, produce a summary that covers all significant visual changes. Include at least as many explanation points as the number of unique `domDiffIds` you inspected, since each visual change deserves its own explanation.
- Intended changes: for each distinct visual change that is a desired outcome of the task, describe what changed visually (based on the diff image) and why it is intended.
- Unintended changes (if any): for each one, include:
  - A representative `replayDiffId` / `screenshotName`
  - What the visual change looks like (e.g. "new badge element added", "layout shift in header")
  - Whether it is a side effect of your code or unrelated, and your best assessment of the cause
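The report needs at least one explanation point per unique `domDiffIds` entry, so it can help to count those entries from the Step 1 output. This is a minimal sketch that assumes the tab-separated Step 1 columns. It counts only `outcome=diff` rows that carry real IDs, skipping `none`, `n/a`, and `error`. The name `count_unique_dom_diff_ids` is hypothetical. Pixel-only changes (rows marked `none`) are not included in this count.

```bash
# Hypothetical helper: count unique DOM diff IDs among outcome=diff rows of
# `meticulous agent test-run-diffs` stdout (TSV) read on stdin.
count_unique_dom_diff_ids() {
  awk -F '\t' '
    $5 == "diff" && $7 != "none" && $7 != "n/a" && $7 != "error" {
      n = split($7, ids, ";")
      for (i = 1; i <= n; i++) seen[ids[i]] = 1
    }
    END { count = 0; for (id in seen) count++; print count }
  '
}
```

Usage: `meticulous agent test-run-diffs --testRunId <testRunId> | count_unique_dom_diff_ids`.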