Triaging visual review runs
Visual Review is PostHog's screenshot-regression product: CI captures Storybook and Playwright screenshots,
diffs them against committed baseline hashes, and gates the PR until a human approves the visible changes.
A PR with visual changes carries a `visual-review` GitHub status check that stays red until each diffed
snapshot is approved or tolerated in the VR UI.

This skill teaches an agent how to answer the questions a human reviewer would actually ask by chaining
the read-only VR MCP tools, instead of reaching for `gh pr view` and tab-hopping to the VR web UI.

When this skill applies
Trigger this skill on any of:
- A PR number, branch name, or commit SHA paired with words like visual review, VR, snapshot, screenshot, storybook diff, playwright snapshot, baseline, approve, tolerated, quarantine.
- Questions about why a PR is blocked, what visually changed, or whether a diff is real.
- "Is my run done?" / "What's left to review?" / "Has this story flaked recently?"
- A failing `visual-review` GitHub check or a PR comment from the `posthog-bot` mentioning visual review.
When the user asks for the rendered diff image itself, the VR web UI
is faster — direct them there. This skill is for everything around the diff: status, scope, history, triage.
Tools
All read-only. None of these require write scopes; approval/toleration still happens in the web UI.
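| Tool | Purpose |
|---|---|
| `posthog:visual-review-runs-list` | List runs, with filters such as `pr_number` and `review_state`. |
| | Full detail for a single run (status, summary counts, supersession). |
| `posthog:visual-review-runs-snapshots-list` | Per-snapshot results inside a run: identifier, `result`, `classification_reason`, `review_state`, and artifact download URLs. |
| `posthog:visual-review-runs-snapshot-history-list` | A single story's last N runs across master/PRs: the flake check. |
| `posthog:visual-review-runs-counts-retrieve` | Aggregate counts for queue triage (how many runs are in each review state). |
| | Hashes the team has explicitly accepted as "known flake / acceptable variation". |
| | Repos (one per GitHub repo). Usually only one matters; useful for filtering. |
| | Repo metadata: baseline file paths, PR-comment configuration. |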
Vocabulary cheat sheet
These appear in tool output and matter for interpretation:
- Run `review_state`: `needs_review` (open, awaiting human), `clean` (zero diffs), `processing` (CI still uploading), `stale` (a newer run on the same PR has superseded this one; check `superseded_by_id`).
- Run `run_type`: `storybook` (component snapshots) or `playwright` (full-page e2e snapshots).
- Snapshot `result`: `unchanged`, `changed` (real diff), `new` (no baseline yet), `removed`.
- Snapshot `classification_reason`: `tolerated_hash` (matches a known-tolerated hash, no action needed), `below_threshold` (under the noise floor), `exact` (byte-identical), `""` (real diff requiring review).
- Snapshot `review_state`: `pending` or `approved`.
- Run `summary`: `total / changed / new / removed / unchanged / unresolved / tolerated_matched`. `unresolved` is what's actually blocking review.
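
As a rough illustration of how these fields combine, the sketch below decides whether a single snapshot is still blocking review. The dict shape and the helper name `needs_human_review` are assumptions made for illustration, not part of the VR tool contract; only the field names and values come from the list above. Treating `below_threshold` and `exact` as "no action needed" is one reading of the cheat sheet.

```python
# Classification reasons that, per the cheat sheet, do not call for a human decision.
# Grouping below_threshold and exact with tolerated_hash is an assumption.
NO_ACTION_REASONS = {"tolerated_hash", "below_threshold", "exact"}


def needs_human_review(snapshot: dict) -> bool:
    """Return True when a snapshot still needs a human decision.

    `snapshot` is assumed to be a dict carrying the `result`,
    `classification_reason`, and `review_state` fields described above.
    """
    if snapshot.get("review_state") == "approved":
        return False
    if snapshot.get("result") == "unchanged":
        return False
    # An empty classification_reason marks a real diff requiring review.
    return snapshot.get("classification_reason", "") not in NO_ACTION_REASONS
```

A caller could filter a run's snapshots with this predicate and report only the ones that return `True`.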
Workflows
"What's the VR status of this PR?"
The single most common job. Map a PR number to its run state in two calls.
- `posthog:visual-review-runs-list { pr_number: <n>, limit: 5 }`: sort by `created_at` desc, take the latest non-stale one.
- If the run has `summary.changed > 0` or `summary.unresolved > 0`, drill in: `posthog:visual-review-runs-snapshots-list { id: <run_id> }` and report the `changed` snapshots.

Report back: PR number, run UUID, `review_state`, summary counts, and the `_posthogUrl` deep link so the
user can click straight to the diff viewer.
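
The selection logic in the two calls above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the runs are plain dicts, `created_at` is an ISO 8601 string, and the helper names `latest_active_run` and `needs_drill_in` are hypothetical. The actual tool response may be shaped differently.

```python
from datetime import datetime
from typing import Optional


def _parse_created_at(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_active_run(runs: list) -> Optional[dict]:
    """Pick the newest run whose review_state is not `stale`, or None if there is none."""
    live = [run for run in runs if run.get("review_state") != "stale"]
    if not live:
        return None
    return max(live, key=lambda run: _parse_created_at(run["created_at"]))


def needs_drill_in(run: dict) -> bool:
    """True when the summary shows changed or unresolved snapshots worth listing."""
    summary = run.get("summary", {})
    return summary.get("changed", 0) > 0 or summary.get("unresolved", 0) > 0
```

Only when `needs_drill_in` returns `True` is the second, per-snapshot call worth making.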
"Is the diff real or unrelated?"
The most useful judgment a code-aware agent can add. Combine three signals: scope match, flake history,
and the actual rendered images. The agent should look at the screenshots — not just describe metadata.
- Scope check: `git diff master...HEAD --stat` (or against the PR's base branch) → list of touched paths. Cross-reference with `posthog:visual-review-runs-snapshots-list { id }` filtered to `result: changed` → story identifiers. Stories are namespaced like `<area>-<scene>--<story>--<theme>`; e.g. `scenes-app-settings-user--settings-user-profile--dark` maps to `frontend/src/scenes/settings/user/...`. Use this to translate story id → likely source path.
- Visual inspection: for each `changed` snapshot, the tool result contains `current_artifact.download_url` and `baseline_artifact.download_url`. These are pre-signed S3 URLs to PNG files; pull them and look:

  ```bash
  curl -s -o /tmp/vr-baseline.png "<baseline_artifact.download_url>"
  curl -s -o /tmp/vr-current.png "<current_artifact.download_url>"
  ```

  Then `Read` both files (the Read tool renders images visually) and compare. Things to call out:
  - The actual visible delta (text changed, button moved, layout shift, color drift, missing element).
  - Whether the change is consistent with the `diff_pixel_count` and `diff_percentage` in the metadata (e.g. 54% diff but the images look near-identical → screenshot framing changed, not the UI).
  - Whether the baseline and current have different dimensions (`width` / `height` fields). Mismatched dimensions usually mean the story rendered to a different viewport or didn't fully render before the screenshot: a flake signal, not a regression.
- Flake history: run the flake check below for any story that looks suspect.
- Verdict: combine all three:
  - Scope plausible + visible change matches the code change → real diff, recommend approval.
  - Scope mismatch + dimensions mismatch + frequent prior changes → flake, recommend tolerating the hash.
  - Scope plausible + visible regression looks unintended → push a fix; do not approve.

Always include a one-line description of what you saw in the images. The user relies on it to decide whether to
trust your verdict without opening the VR UI themselves.
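
To make the dimension check and the verdict rules concrete, here is a hedged sketch. `png_dimensions` reads width and height from the PNG header, which is an alternative to the `width` / `height` metadata fields rather than something the VR tools provide. `triage_verdict` encodes the three rules above. Both function names, the boolean inputs, and the returned strings are illustrative assumptions, and the `looks_intended` flag stands in for the agent's own reading of the images.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers right after the chunk type.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def triage_verdict(scope_matches: bool, same_dimensions: bool,
                   frequently_changed: bool, looks_intended: bool) -> str:
    """Combine the three signals into one of the verdicts listed above."""
    if not scope_matches and not same_dimensions and frequently_changed:
        return "flake: recommend tolerating the hash"
    if scope_matches and looks_intended:
        return "real diff: recommend approval"
    if scope_matches and not looks_intended:
        return "unintended regression: push a fix, do not approve"
    # Any other mix of signals is left to the human reviewer.
    return "inconclusive: describe what you saw and defer to the reviewer"
```

For the files downloaded above, `png_dimensions(open("/tmp/vr-baseline.png", "rb").read())` can be compared against the same call on `/tmp/vr-current.png`.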
Flake check: "Has this story been changing?"
Once you have a suspect snapshot identifier:

```
posthog:visual-review-runs-snapshot-history-list { id: <snapshot_id> }
```

Verdicts:
- Mostly `unchanged` and this run's diff is the outlier → likely a real regression caused by this PR.
- Frequent `changed` across unrelated branches/master → flaky story; recommend tolerating the hash via the UI.
- Recent `removed` or large-jump dimension change → baseline likely stale; recommend re-baselining on master.
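
The three verdicts can be approximated from a story's prior results. The sketch below assumes the history has already been reduced to a list of `result` strings, newest first. The function name, the 30% threshold, and the "last three runs" window are arbitrary illustrative choices, not values defined by Visual Review, and the dimension-jump signal is left out.

```python
from collections import Counter


def flake_verdict(prior_results: list, flaky_share: float = 0.3, recent_window: int = 3) -> str:
    """Classify a story from its prior `result` values (newest first)."""
    if not prior_results:
        return "no_history"
    # A recent `removed` suggests the baseline itself is out of date.
    if "removed" in prior_results[:recent_window]:
        return "stale_baseline"
    counts = Counter(prior_results)
    changed_share = counts["changed"] / len(prior_results)
    if changed_share >= flaky_share:
        return "flaky"
    # Mostly unchanged history makes the current diff the outlier.
    return "real_regression"
```

Whether the prior `changed` results come from unrelated branches still needs a look at the history entries themselves; the share alone cannot tell.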
Triaging the queue
When the user is doing housekeeping rather than asking about a specific PR:
- `posthog:visual-review-runs-counts-retrieve` → total queue size.
- `posthog:visual-review-runs-list { review_state: needs_review, limit: 50 }` (paginate if needed).
- Group by author, `branch`, or `run_type` to surface clusters (e.g., "12 PRs blocked on the same shared component change" usually means a single underlying root cause to address).
- Prefer surfacing runs whose `summary.changed > 0` over runs that are only `new`: `new` means no baseline yet, which is usually trivial to approve; `changed` is the real review work.
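
The grouping and ordering steps above could look like the following sketch. It assumes each run is a dict with `branch`, `run_type`, and `summary` keys; the helper names `cluster_runs` and `prioritize` are hypothetical.

```python
from collections import defaultdict


def cluster_runs(runs: list, key: str = "branch") -> list:
    """Group runs by a field and return (value, runs) pairs, largest cluster first."""
    clusters = defaultdict(list)
    for run in runs:
        clusters[run.get(key, "unknown")].append(run)
    return sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True)


def prioritize(runs: list) -> list:
    """Order runs so those with the most `changed` snapshots come first.

    Runs that only contain `new` snapshots have changed == 0 and sink to the end.
    """
    return sorted(runs, key=lambda run: run.get("summary", {}).get("changed", 0), reverse=True)
```

Passing `key="run_type"` to `cluster_runs` gives the Storybook versus Playwright split instead of the per-branch one.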
Output expectations
For PR-status questions, lead with the verdict in one line, then 2-4 bullets of supporting context. Always
include the `_posthogUrl` deep link to the run: humans need to see the rendered images to make the call;
the agent can only describe the metadata.

For triage / aggregate questions, a short table beats prose. Group by what the user is going to act on.
What NOT to do
- Do not approve or tolerate snapshots from this skill. Those endpoints are intentionally not exposed as MCP tools yet. Direct the user to the run's `_posthogUrl`.
- Do not assume the failing GitHub check on a PR is unrelated to VR. If a `visual-review` check is red on a PR you're working on, that's the trigger to run this skill.
- Do not declare a verdict from metadata alone when `result: changed`. Pull the baseline and current PNGs and look at them; metadata can only say "something changed", not whether the change is intended.