meticulous-review

To review a Meticulous test run, follow the workflow below step by step, using the CLI commands as described.
Before starting, run the `meticulous-cli-update` skill to ensure the Meticulous CLI is up to date, unless it has already run earlier in this conversation, in which case skip it.

Prerequisites

You need a `testRunId` for a completed Meticulous test run. Resolve it from whatever you were given:
  • A test-run ID: a 20+ character alphanumeric string (e.g. `aB3xK9LmN7QrStUvWxYz12`), as opposed to a short PR number. Use it directly and skip the rest of this section.
  • A PR number: a short integer. Resolve it against the local repo by passing it to `gh pr checks <pr-number>` below.
  • Nothing: omit the argument; `gh pr checks` resolves the PR associated with the current branch in the local repo.
To infer the `testRunId` from a PR's GitHub checks (pass the PR number when you have one, omit it to use the current branch's PR):
gh pr checks <pr-number> --json name,link,state,bucket \
  --jq '.[] | select(.name | startswith("Meticulous Tests")) | select(.bucket != "pending") | select((.link // "") | contains("/test-runs/")) | {name, state, testRunId: (.link | capture("/test-runs/(?<id>[^/?#]+)").id)}'
The `testRunId` is the final path segment of the check's details URL (`app.meticulous.ai/.../test-runs/<id>`).
  • Only completed runs: `select(.bucket != "pending")` drops checks that are still queued or in progress, so a half-finished check can't hand you a `testRunId` for an incomplete run. This keeps both passing and failing runs: a `fail` bucket is a completed run with visual diffs, which is exactly what you're usually here to review. Don't narrow this to passing checks only.
  • Check not finished yet: the `/test-runs/` filter further excludes checks whose `link` is still `null` or empty (the `// ""` guard keeps jq from erroring on those rows). If the command returns nothing, wait and re-run, or ask the user.
  • New commit pushed: no extra steps. `gh pr checks` always reflects the PR's current head, so the same command returns the latest run's `testRunId`.
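For readers who find the jq filter dense, the same selection logic can be sketched in Python. This is only an illustration under assumptions: the function name `completed_test_runs` and the sample check objects (including the full URL shape) are hypothetical, and the input is taken to be the JSON array that `gh pr checks --json name,link,state,bucket` prints.

```python
import re

# Mirrors the jq capture: the ID is everything after "/test-runs/"
# up to the next "/", "?" or "#".
TEST_RUN_RE = re.compile(r"/test-runs/(?P<id>[^/?#]+)")


def completed_test_runs(checks):
    """Return name/state/testRunId for finished Meticulous checks."""
    results = []
    for check in checks:
        if not check.get("name", "").startswith("Meticulous Tests"):
            continue  # not a Meticulous check
        if check.get("bucket") == "pending":
            continue  # still queued or in progress
        # "or ''" plays the role of the jq `// ""` guard for null links.
        match = TEST_RUN_RE.search(check.get("link") or "")
        if match is None:
            continue  # no test-run URL attached yet
        results.append(
            {
                "name": check["name"],
                "state": check.get("state"),
                "testRunId": match.group("id"),
            }
        )
    return results


# Hypothetical sample data, shaped like the fields requested via --json.
sample_checks = [
    {"name": "build", "bucket": "pass", "state": "SUCCESS",
     "link": "https://example.com/build/1"},
    {"name": "Meticulous Tests (queued)", "bucket": "pending",
     "state": "PENDING", "link": None},
    {"name": "Meticulous Tests", "bucket": "fail", "state": "FAILURE",
     "link": "https://app.meticulous.ai/projects/Foo/Bar/test-runs/aB3xK9LmN7QrStUvWxYz12?tab=diffs"},
]

runs = completed_test_runs(sample_checks)
print(runs)
```

Note that the failing check is kept on purpose: a `fail` bucket is a completed run with diffs to review.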

Assess visual frontend changes

Get an overview of all diffs, then visually inspect representative screenshots to cover all changes. The `domDiffIds` from Step 1 tell you which screenshots share the same structural DOM change; pick one representative per unique diff ID to cover all changes efficiently. For each representative, always look at the screenshot images first (Step 2), because the diff image is the most informative way to understand what actually changed. Use the DOM diff (Step 3) for additional structural detail, and the timeline (Step 4) only when a diff is unexpected and not explained by the DOM or images. The final report should cover all significant visual changes: each visual change deserves its own explanation.

Step 1 -- Get the replay diff summary

meticulous agent test-run-diffs --testRunId <testRunId>
Output format: TSV on stdout, metadata on stderr.
stdout columns:
replayDiffId	screenshotName	index	total	outcome	mismatch	domDiffIds
Example output:
CqctwLpPC7	after-event-0	1	5	diff	0.00234	1;3
RRMGQft7PD	after-event-174	3	8	diff	0.01050	1;2
CLkCJ8WLrJ	after-event-8	2	4	diff	0.00100	none
Ct8HwmJNzM	end-state	5	5	flake	0.00010	error
Ab3xKLmN9Q	after-event-12	3	6	missing-base	0.00000	n/a
Each row represents a screenshot where a visual pixel difference was detected between the base (before) and head (after) replay. Rows with `outcome=diff` are confirmed visual differences; other outcomes (`flake`, `error`, `warning`, `missing-base`, `missing-head`) are informational.
  • `outcome`: `diff` (visual pixel difference), `flake`, `error`, `warning`, `missing-base`, `missing-head`
  • `mismatch` (0-1, 5 decimal places) is the pixel mismatch fraction
  • `domDiffIds` is a semicolon-separated ordered list of diff IDs, one per independent DOM change in the screenshot. Each ID groups structurally identical DOM changes across screenshots (same ID = same structural change). Example: `1;3` means two independent DOM changes with IDs 1 and 3. Special values:
    • `none` means no DOM changes were found (either computed successfully with no differences, or a matching screenshot where no diff is expected). The visual difference is then purely pixel-level (e.g. anti-aliasing, rendering differences), so you must inspect the screenshot images to understand the change.
    • `n/a` means the DOM diff is not applicable (e.g. error/warning outcomes where no comparison is possible).
    • `error` means the DOM diff was attempted but failed (e.g. metadata unavailable or could not be retrieved).
stderr shows: total counts, unique diff counts, and timing breakdown. Proceed to Steps 2-3 for rows with `outcome=diff`.
Use `domDiffIds` to identify which subset of diffs to inspect. Screenshots sharing the same ID contain the same structural DOM change; pick one representative per unique ID for efficient coverage.
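The representative-picking rule can be sketched as a small Python helper. This is a minimal sketch, assuming the stdout of `test-run-diffs` has been captured as text with the seven tab-separated columns listed above; the function name `pick_representatives` and the "first row wins" choice are illustrative assumptions, not part of the CLI.

```python
COLUMNS = ["replayDiffId", "screenshotName", "index", "total",
           "outcome", "mismatch", "domDiffIds"]
SPECIAL_VALUES = {"none", "n/a", "error"}


def pick_representatives(tsv_text):
    """Pick one outcome=diff screenshot per unique DOM diff ID.

    Returns (representatives, pixel_only):
      representatives maps diff ID -> (replayDiffId, screenshotName, position),
        where position is the 0-based place of the ID in that row's list.
      pixel_only lists outcome=diff rows with domDiffIds == "none"; those have
        no DOM change, so each one needs its images inspected.
    """
    representatives = {}
    pixel_only = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        if row["outcome"] != "diff":
            continue  # other outcomes (and any header row) are informational
        if row["domDiffIds"] == "none":
            pixel_only.append(row)
            continue
        if row["domDiffIds"] in SPECIAL_VALUES:
            continue
        for position, diff_id in enumerate(row["domDiffIds"].split(";")):
            # setdefault keeps the first screenshot seen for each ID.
            representatives.setdefault(
                diff_id, (row["replayDiffId"], row["screenshotName"], position)
            )
    return representatives, pixel_only


# The example output from above, rebuilt with explicit tab separators.
example_rows = [
    ("CqctwLpPC7", "after-event-0", "1", "5", "diff", "0.00234", "1;3"),
    ("RRMGQft7PD", "after-event-174", "3", "8", "diff", "0.01050", "1;2"),
    ("CLkCJ8WLrJ", "after-event-8", "2", "4", "diff", "0.00100", "none"),
    ("Ct8HwmJNzM", "end-state", "5", "5", "flake", "0.00010", "error"),
    ("Ab3xKLmN9Q", "after-event-12", "3", "6", "missing-base", "0.00000", "n/a"),
]
example_tsv = "\n".join("\t".join(row) for row in example_rows)

representatives, pixel_only = pick_representatives(example_tsv)
for diff_id, rep in sorted(representatives.items()):
    print(diff_id, rep)
```

On the example data, IDs 1 and 3 are both covered by `after-event-0` and ID 2 by `after-event-174`, so two screenshots plus the one pixel-only row cover every confirmed diff.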

Step 2 -- Get screenshot images

For each representative screenshot:
meticulous agent image-files --replayDiffId <replayDiffId> --screenshotName <screenshotName>
This downloads the screenshot images to `~/.meticulous/agent-images/` and prints the local file paths.
Output format:
outcome: <outcome>
screenshot: <path>          # present for missing-base/missing-head
before: <path>              # present for diff/no-diff
after: <path>
diffImage: <path>
Open the `before`, `after`, and `diffImage` files to visually inspect the change. The `diffImage` is usually the most informative: it highlights exactly which pixels changed. Always inspect the images to understand the actual visual impact of a change, even when the DOM diff is clear.
Alternative: use `image-urls` instead of `image-files` to get URLs to the images rather than downloading them locally.
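If the `image-files` output is being consumed by a script, the `key: <path>` lines can be read into a dictionary. The sketch below assumes the output looks like the format shown above; the helper names and the sample file paths are hypothetical.

```python
# Order in which to open the images: the diff image first, since it is
# usually the most informative, then the before/after pair.
REVIEW_ORDER = ["diffImage", "before", "after", "screenshot"]


def parse_image_files_output(text):
    """Parse 'key: value' lines into a dict, ignoring anything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def review_order(fields):
    """Return (label, path) pairs for the images present, diff image first."""
    return [(label, fields[label]) for label in REVIEW_ORDER if label in fields]


# Hypothetical output for an outcome=diff screenshot.
sample_output = """outcome: diff
before: /home/user/.meticulous/agent-images/example-before.png
after: /home/user/.meticulous/agent-images/example-after.png
diffImage: /home/user/.meticulous/agent-images/example-diff.png
"""

fields = parse_image_files_output(sample_output)
ordered = review_order(fields)
for label, path in ordered:
    print(label, path)
```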

Step 3 -- Inspect the DOM diff (for structural detail)

meticulous agent dom-diff --replayDiffId <replayDiffId> --screenshotName <screenshotName>
Optional: pass `--context <N|full>` to control how many context lines surround each hunk (default 3). Use `--context 0` for no context, or `--context full` for a single unified diff with full file context.
Output format: Unified diff (`+`/`-` format) with leading indentation stripped. Diff blocks are separated by `[diff 0]`, `[diff 1]`, etc. headers. Example:
[diff 0]
 <span class="text-zinc-400">#7687</span>
-<span class="min-w-0 flex-1 truncate transition-colors">Use divergence-aware comparison</span>
+<span class="min-w-0 flex-1 truncate transition-colors" data-tooltip-id=":r1h:">Use divergence-aware comparison</span>
[diff 1]
 <span class="inline-flex items-center rounded-lg bg-zinc-800">Temporal Workflow</span></a>
+<a href="/projects/Foo/Bar/test-runs/abc123"><span class="inline-flex items-center rounded-lg bg-zinc-800">Original: abc123</span></a>
To view a single diff block, add `--index <0-based index>` (maps to the position in the `domDiffIds` list from Step 1).
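When the full output contains several blocks, it can help to split it on the `[diff N]` headers and count added and removed lines per block before reading them. The following is a rough sketch that assumes the output matches the example above; `split_dom_diff` and `count_changes` are hypothetical helper names.

```python
import re

HEADER_RE = re.compile(r"^\[diff (\d+)\]$")


def split_dom_diff(output):
    """Group diff lines under their 0-based [diff N] header index."""
    blocks = {}
    current = None
    for line in output.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = int(match.group(1))
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
    return blocks


def count_changes(lines):
    """Return (added, removed) line counts for one diff block."""
    added = sum(1 for line in lines if line.startswith("+"))
    removed = sum(1 for line in lines if line.startswith("-"))
    return added, removed


# The example output from above, one list entry per line.
example_output = "\n".join([
    '[diff 0]',
    ' <span class="text-zinc-400">#7687</span>',
    '-<span class="min-w-0 flex-1 truncate transition-colors">Use divergence-aware comparison</span>',
    '+<span class="min-w-0 flex-1 truncate transition-colors" data-tooltip-id=":r1h:">Use divergence-aware comparison</span>',
    '[diff 1]',
    ' <span class="inline-flex items-center rounded-lg bg-zinc-800">Temporal Workflow</span></a>',
    '+<a href="/projects/Foo/Bar/test-runs/abc123"><span class="inline-flex items-center rounded-lg bg-zinc-800">Original: abc123</span></a>',
])

blocks = split_dom_diff(example_output)
for index, lines in blocks.items():
    print(index, count_changes(lines))
```

Here block 0 is an attribute change on an existing element (one line removed, one added) and block 1 is a pure insertion, which matches the two independent changes a `domDiffIds` list with two entries would describe.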

Step 4 -- Get the replay timeline (optional, for diagnosing unexpected diffs)

If a diff is unexpected and the images/DOM don't make it obvious why it happened:
meticulous agent timeline-diff --replayDiffId <replayDiffId>
Output format: TSV on stdout, replay IDs on stderr.
stdout columns:
diff	timeMs	event	description
  • `diff` column: ` ` (identical), `-` (removed), `+` (added), `!` (changed)
  • `event` types: `user`, `screenshot`, `network`, `console`, `debug`, `urlChange`, `error`, `fatalError`, etc.
  • `description`: concise one-line summary of the event
Look for anomalies such as failed network requests, unexpected redirects, or timing-related differences that could explain a visual change.
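One way to narrow a long timeline to likely culprits is to keep only rows that differ between base and head and whose event type is one of the anomaly-prone kinds. This sketch assumes the four tab-separated columns above; the function name, the default set of event types, and the sample rows are all hypothetical.

```python
CHANGED_MARKERS = {"-", "+", "!"}
DEFAULT_EVENT_TYPES = {"network", "urlChange", "error", "fatalError"}


def suspicious_events(tsv_text, event_types=DEFAULT_EVENT_TYPES):
    """Return changed timeline rows whose event type is in event_types."""
    rows = []
    for line in tsv_text.splitlines():
        # Do not strip the line: an identical row starts with a space marker.
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        marker, time_ms, event, description = fields
        if marker not in CHANGED_MARKERS:
            continue  # identical rows (and any header row) are not interesting
        if event in event_types:
            rows.append({"diff": marker, "timeMs": time_ms,
                         "event": event, "description": description})
    return rows


# Hypothetical timeline rows; descriptions are invented for illustration.
sample_rows = [
    (" ", "120", "user", "click on button"),
    ("!", "450", "network", "GET /api/items returned a different status"),
    ("+", "460", "console", "failed to load items"),
    ("-", "900", "screenshot", "after-event-8"),
]
sample_timeline = "\n".join("\t".join(row) for row in sample_rows)

suspects = suspicious_events(sample_timeline)
for row in suspects:
    print(row["diff"], row["timeMs"], row["event"], row["description"])
```

Passing a wider `event_types` set (for example adding `console`) is a reasonable next step when the default filter returns nothing.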

Decision guide

For each representative screenshot, classify the visual change as intended or unintended based on the diff image and DOM diff:
  • Intended: The visual change is a desired outcome of the task you're working on. Confirm and move on.
  • Unintended: The change was not a goal of the task. This includes both changes that are clearly unrelated to your code, and — often more importantly — side effects of your code changes that weren't meant to happen. A change being explainable by your code does not make it intended; if the task didn't call for that visual change, it's unintended.
For unintended changes:
  1. If the change is a side effect of your code, attempt to fix it so the code achieves the intended result without the unwanted visual change, then re-run the test.
  2. Use the timeline (Step 4) to check for failed network requests, redirects, or other anomalies that could explain diffs unrelated to your code.
  3. If you can confidently explain the cause (e.g. a flaky timestamp, a non-deterministic element), note the explanation.
  4. If you cannot explain or fix it, flag it to the user.

Final report

After investigating all diffs and attempting fixes for any fixable issues, produce a summary that covers all significant visual changes. The number of explanation points should be at least as many as the number of unique domDiffIds you inspected — each visual change deserves its own explanation.
  1. Intended changes: For each distinct visual change that is a desired outcome of the task, describe what changed visually (based on the diff image) and why it's intended.
  2. Unintended changes (if any): For each, include:
    • A representative `replayDiffId` / `screenshotName`
    • What the visual change looks like (e.g. "new badge element added", "layout shift in header")
    • Whether it's a side effect of your code or unrelated, and your best assessment of the cause
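The coverage rule for the report (at least one explanation per unique diff ID inspected, and a representative for every unintended change) can be checked mechanically before writing the summary. The sketch below is only one possible bookkeeping structure; the `Finding` class, its fields, and the sample findings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    description: str            # what changed visually, based on the diff image
    intended: bool              # desired outcome of the task, or not
    dom_diff_ids: tuple = ()    # diff IDs this explanation covers
    representative: str = ""    # "replayDiffId/screenshotName"
    cause: str = ""             # best assessment of the cause


def report_gaps(findings, inspected_ids):
    """Return (uncovered diff IDs, unintended findings lacking a representative)."""
    covered = {diff_id for f in findings for diff_id in f.dom_diff_ids}
    uncovered = sorted(set(inspected_ids) - covered)
    incomplete = [f.description for f in findings
                  if not f.intended and not f.representative]
    return uncovered, incomplete


# Hypothetical findings for a run where diff IDs 1, 2 and 3 were inspected.
findings = [
    Finding("tooltip attribute added to title span", intended=True,
            dom_diff_ids=("1",)),
    Finding("new link element added next to badge", intended=False,
            dom_diff_ids=("3",), cause="side effect of the code change"),
]

uncovered, incomplete = report_gaps(findings, inspected_ids=["1", "2", "3"])
print("diff IDs without an explanation:", uncovered)
print("unintended findings without a representative:", incomplete)
```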